A dumb question about Distinct - I posted this in the wrong forum earlier by mistake

Question

Post reply

A dumb question about Distinct - I posted this in the wrong forum earlier by mistake

meichner

SSCertifiable

Points: 5563
More actions
January 25, 2010 at 9:52 am

#386174

I have a table called DGUsers. This table has a primary key of UserID. I have another table called Orders that has a field called UserID. I need to get a list of the DG Users that have placed orders. It seemed pretty easy. I therefore wrote the following query which seems to work. Something was bothering me about the query so I started to poke around the internet. I read a little about multi column query problems using distinct. Now I am not sure if the query is correct. Can anyone tell me if my query is correct or not?
Select Distinct(DG.UserID)
, DG.UserName
, DG.LastName
, DG.FirstName
From DGUsers DG
Join Orders O
On O.UserID = DG.UserID
Order by DG.UserID
Thanks
ps. I originally posted this under the Sql server 2000 newbies forum by mistake. Sorry.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

The_SQL_DBA SSChampion Points: 10494 More actions · Answer 1

Try this...

Select Distinct(DG.UserID)

, DG.UserName

, DG.LastName

, DG.FirstName

From DGUsers DG

where DG.UserID in (select UserId from Orders)

Order by DG.UserID

Gail, I think I missed it completely. I removed the distinct from the second query now..

The_SQL_DBA
MCTS

"Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives."

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 2

Distinct is not a function, it does not take parameters. What you have written is completely equivalent to this

Select Distinct

DG.UserID

, DG.UserName

, DG.LastName

, (DG.FirstName)

From DGUsers DG

Inner Join Orders O On O.UserID = DG.UserID

Order by DG.UserID

Brackets are allowed around any column name, which is why your original syntax does not give any errors. Distinct applies to the sll columns and ensures that there are no duplicate rows.

If all you're trying to do is find customers with orders, don't join the orders table in, just use EXISTS or IN.

SELECT DG.UserID, DG.UserName, DG.LastName, DG.FirstName

FROM DGUsers DG

WHERE DG.UserID IN (SELECT O.UserID FROM Orders O)

Or

SELECT DG.UserID, DG.UserName, DG.LastName, DG.FirstName

FROM DGUsers DG

WHERE EXISTS (SELECT 1 FROM Orders O WHERE O.UserID = DG.UserID)

Those two are completely equivalent.

http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/

http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 3

The_SQL_DBA (1/25/2010)
Select Distinct(DG.UserID) , DG.UserName , DG.LastName , DG.FirstName From DGUsers DG where DG.UserID in (select distinct UserId from Orders)
Order by DG.UserID

Huh? That's senseless and worse than the original post.

Firstly, you haven't fixed the 'distinct is not a function problem'.

Secondly IN ignores duplicates. There's no difference between IN (1,2,3) and IN (1,1,1,1,2,3,3,3,3,3,3), so why force SQL to expend the cost of the Distinct?

Third, once you remove the INNER JOIN, there's no reason for distinct in the outer query

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

meichner SSCertifiable Points: 5563 More actions · Answer 4

GilaMonster (1/25/2010)
Distinct is not a function, it does not take parameters. What you have written is completely equivalent to this
Select Distinct
DG.UserID
, DG.UserName
, DG.LastName
, (DG.FirstName)
From DGUsers DG
Inner Join Orders O On O.UserID = DG.UserID
Order by DG.UserID
Brackets are allowed around any column name, which is why your original syntax does not give any errors. Distinct applies to the sll columns and ensures that there are no duplicate rows.
If all you're trying to do is find customers with orders, don't join the orders table in, just use EXISTS or IN.
SELECT DG.UserID, DG.UserName, DG.LastName, DG.FirstName
FROM DGUsers DG
WHERE DG.UserID IN (SELECT O.UserID FROM Orders O)
Or
SELECT DG.UserID, DG.UserName, DG.LastName, DG.FirstName
FROM DGUsers DG
WHERE EXISTS (SELECT 1 FROM Orders O WHERE O.UserID = DG.UserID)
Those two are completely equivalent.
http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/
http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/

I actually thought my original post was not so bad. Boy was I wrong.

I am not sure from the rest of the discussion which is the correct method to use. Since I am trying to find the DG Users who have placed orders is this the correct method to use?

SELECT DG.UserID, DG.UserName, DG.LastName, DG.FirstName

FROM DGUsers DG

WHERE DG.UserID IN (SELECT O.UserID FROM Orders O)

Can you explain to someone who is not a DBA why my original method is no good.

Thanks

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 5

Either of the examples I gave you will do that, they are (as written) identical in operation and performance.

My comments to The_SQL_DBA was not related to his use of IN, but to the other bad advice and practices he was giving

Your original method creates 'duplicate' rows (by joining in the Orders table) and then uses distinct to get rid of the duplicates. If you wanted customers with their order details then what you originally had (excluding the Distinct) would be perfect. Since you only want customer data, but you want to limit it to only the ones that have orders, then you want to use one of the two where clause operators that checks that. Those would be EXISTS and IN

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 6

The_SQL_DBA (1/25/2010)
Gail, I think I missed it completely. I removed the distinct from the second query now..

Nope. You've fixed one out of 3.

Why have you left the brackets around the first column? Sure, it's valid SQL, but it gives the incorrect impression that DISTINCT is a function. It is not.

Why is the distinct still there? The duplicate customers would have been caused by joining to the orders table (assuming that customers can make more than one order). With the join gone, the need for the distinct is gone

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

meichner SSCertifiable Points: 5563 More actions · Answer 7

GilaMonster (1/25/2010)
Either of the examples I gave you will do that, they are (as written) identical in operation and performance.
My comments to The_SQL_DBA was not related to his use of IN, but to the other bad advice and practices he was giving
Your original method creates 'duplicate' rows (by joining in the Orders table) and then uses distinct to get rid of the duplicates. If you wanted customers with their order details then what you originally had (excluding the Distinct) would be perfect. Since you only want customer data, but you want to limit it to only the ones that have orders, then you want to use one of the two where clause operators that checks that. Those would be EXISTS and IN

That makes perfect sense. Thanks so much.