January 25, 2010 at 9:52 am
I have a table called DGUsers. This table has a primary key of UserID. I have another table called Orders that has a field called UserID. I need to get a list of the DG Users that have placed orders. It seemed pretty easy. I therefore wrote the following query which seems to work. Something was bothering me about the query so I started to poke around the internet. I read a little about multi column query problems using distinct. Now I am not sure if the query is correct. Can anyone tell me if my query is correct or not?
Select Distinct(DG.UserID)
, DG.UserName
, DG.LastName
, DG.FirstName
From DGUsers DG
Join Orders O
On O.UserID = DG.UserID
Order by DG.UserID
Thanks
ps. I originally posted this under the Sql server 2000 newbies forum by mistake. Sorry.
January 25, 2010 at 10:02 am
Try this...
Select Distinct(DG.UserID)
, DG.UserName
, DG.LastName
, DG.FirstName
From DGUsers DG
where DG.UserID in (select UserId from Orders)
Order by DG.UserID
Gail, I think I missed it completely. I removed the distinct from the second query now..
The_SQL_DBA
MCTS
"Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives."
January 25, 2010 at 10:02 am
Distinct is not a function, it does not take parameters. What you have written is completely equivalent to this
Select Distinct
DG.UserID
, DG.UserName
, DG.LastName
, (DG.FirstName)
From DGUsers DG
Inner Join Orders O On O.UserID = DG.UserID
Order by DG.UserID
Brackets are allowed around any column name, which is why your original syntax does not give any errors. Distinct applies to the sll columns and ensures that there are no duplicate rows.
If all you're trying to do is find customers with orders, don't join the orders table in, just use EXISTS or IN.
SELECT DG.UserID, DG.UserName, DG.LastName, DG.FirstName
FROM DGUsers DG
WHERE DG.UserID IN (SELECT O.UserID FROM Orders O)
Or
SELECT DG.UserID, DG.UserName, DG.LastName, DG.FirstName
FROM DGUsers DG
WHERE EXISTS (SELECT 1 FROM Orders O WHERE O.UserID = DG.UserID)
Those two are completely equivalent.
http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/
http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 25, 2010 at 10:04 am
The_SQL_DBA (1/25/2010)
Select Distinct(DG.UserID)
, DG.UserName
, DG.LastName
, DG.FirstName
From DGUsers DG
where DG.UserID in (select distinct UserId from Orders)
Order by DG.UserID
Huh? That's senseless and worse than the original post.
Firstly, you haven't fixed the 'distinct is not a function problem'.
Secondly IN ignores duplicates. There's no difference between IN (1,2,3) and IN (1,1,1,1,2,3,3,3,3,3,3), so why force SQL to expend the cost of the Distinct?
Third, once you remove the INNER JOIN, there's no reason for distinct in the outer query
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 25, 2010 at 10:30 am
GilaMonster (1/25/2010)
Distinct is not a function, it does not take parameters. What you have written is completely equivalent to this
Select Distinct
DG.UserID
, DG.UserName
, DG.LastName
, (DG.FirstName)
From DGUsers DG
Inner Join Orders O On O.UserID = DG.UserID
Order by DG.UserID
Brackets are allowed around any column name, which is why your original syntax does not give any errors. Distinct applies to the sll columns and ensures that there are no duplicate rows.
If all you're trying to do is find customers with orders, don't join the orders table in, just use EXISTS or IN.
SELECT DG.UserID, DG.UserName, DG.LastName, DG.FirstName
FROM DGUsers DG
WHERE DG.UserID IN (SELECT O.UserID FROM Orders O)
Or
SELECT DG.UserID, DG.UserName, DG.LastName, DG.FirstName
FROM DGUsers DG
WHERE EXISTS (SELECT 1 FROM Orders O WHERE O.UserID = DG.UserID)
Those two are completely equivalent.
http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/
http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/
I actually thought my original post was not so bad. Boy was I wrong.
I am not sure from the rest of the discussion which is the correct method to use. Since I am trying to find the DG Users who have placed orders is this the correct method to use?
SELECT DG.UserID, DG.UserName, DG.LastName, DG.FirstName
FROM DGUsers DG
WHERE DG.UserID IN (SELECT O.UserID FROM Orders O)
Can you explain to someone who is not a DBA why my original method is no good.
Thanks
January 25, 2010 at 12:02 pm
Either of the examples I gave you will do that, they are (as written) identical in operation and performance.
My comments to The_SQL_DBA was not related to his use of IN, but to the other bad advice and practices he was giving
Your original method creates 'duplicate' rows (by joining in the Orders table) and then uses distinct to get rid of the duplicates. If you wanted customers with their order details then what you originally had (excluding the Distinct) would be perfect. Since you only want customer data, but you want to limit it to only the ones that have orders, then you want to use one of the two where clause operators that checks that. Those would be EXISTS and IN
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 25, 2010 at 12:05 pm
The_SQL_DBA (1/25/2010)
Gail, I think I missed it completely. I removed the distinct from the second query now..
Nope. You've fixed one out of 3.
Why have you left the brackets around the first column? Sure, it's valid SQL, but it gives the incorrect impression that DISTINCT is a function. It is not.
Why is the distinct still there? The duplicate customers would have been caused by joining to the orders table (assuming that customers can make more than one order). With the join gone, the need for the distinct is gone
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 25, 2010 at 1:03 pm
GilaMonster (1/25/2010)
Either of the examples I gave you will do that, they are (as written) identical in operation and performance.My comments to The_SQL_DBA was not related to his use of IN, but to the other bad advice and practices he was giving
Your original method creates 'duplicate' rows (by joining in the Orders table) and then uses distinct to get rid of the duplicates. If you wanted customers with their order details then what you originally had (excluding the Distinct) would be perfect. Since you only want customer data, but you want to limit it to only the ones that have orders, then you want to use one of the two where clause operators that checks that. Those would be EXISTS and IN
That makes perfect sense. Thanks so much.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply