A dumb question about Distinct - I posted this in the wrong forum earlier by mistake

  • I have a table called DGUsers. This table has a primary key of UserID. I have another table called Orders that has a field called UserID. I need to get a list of the DG Users that have placed orders. It seemed pretty easy. I therefore wrote the following query which seems to work. Something was bothering me about the query so I started to poke around the internet. I read a little about multi column query problems using distinct. Now I am not sure if the query is correct. Can anyone tell me if my query is correct or not?

    Select Distinct(DG.UserID)

    , DG.UserName

    , DG.LastName

    , DG.FirstName

    From DGUsers DG

    Join Orders O

    On O.UserID = DG.UserID

    Order by DG.UserID

    Thanks

    ps. I originally posted this under the Sql server 2000 newbies forum by mistake. Sorry.

  • Try this...

    Select Distinct(DG.UserID)

    , DG.UserName

    , DG.LastName

    , DG.FirstName

    From DGUsers DG

    where DG.UserID in (select UserId from Orders)

    Order by DG.UserID

    Gail, I think I missed it completely. I removed the distinct from the second query now..

    The_SQL_DBA
    MCTS

    "Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives."

  • Distinct is not a function, it does not take parameters. What you have written is completely equivalent to this

    Select Distinct

    DG.UserID

    , DG.UserName

    , DG.LastName

    , (DG.FirstName)

    From DGUsers DG

    Inner Join Orders O On O.UserID = DG.UserID

    Order by DG.UserID

    Brackets are allowed around any column name, which is why your original syntax does not give any errors. Distinct applies to the sll columns and ensures that there are no duplicate rows.

    If all you're trying to do is find customers with orders, don't join the orders table in, just use EXISTS or IN.

    SELECT DG.UserID, DG.UserName, DG.LastName, DG.FirstName

    FROM DGUsers DG

    WHERE DG.UserID IN (SELECT O.UserID FROM Orders O)

    Or

    SELECT DG.UserID, DG.UserName, DG.LastName, DG.FirstName

    FROM DGUsers DG

    WHERE EXISTS (SELECT 1 FROM Orders O WHERE O.UserID = DG.UserID)

    Those two are completely equivalent.

    http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/

    http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The_SQL_DBA (1/25/2010)


    Select Distinct(DG.UserID)

    , DG.UserName

    , DG.LastName

    , DG.FirstName

    From DGUsers DG

    where DG.UserID in (select distinct UserId from Orders)

    Order by DG.UserID

    Huh? That's senseless and worse than the original post.

    Firstly, you haven't fixed the 'distinct is not a function problem'.

    Secondly IN ignores duplicates. There's no difference between IN (1,2,3) and IN (1,1,1,1,2,3,3,3,3,3,3), so why force SQL to expend the cost of the Distinct?

    Third, once you remove the INNER JOIN, there's no reason for distinct in the outer query

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (1/25/2010)


    Distinct is not a function, it does not take parameters. What you have written is completely equivalent to this

    Select Distinct

    DG.UserID

    , DG.UserName

    , DG.LastName

    , (DG.FirstName)

    From DGUsers DG

    Inner Join Orders O On O.UserID = DG.UserID

    Order by DG.UserID

    Brackets are allowed around any column name, which is why your original syntax does not give any errors. Distinct applies to the sll columns and ensures that there are no duplicate rows.

    If all you're trying to do is find customers with orders, don't join the orders table in, just use EXISTS or IN.

    SELECT DG.UserID, DG.UserName, DG.LastName, DG.FirstName

    FROM DGUsers DG

    WHERE DG.UserID IN (SELECT O.UserID FROM Orders O)

    Or

    SELECT DG.UserID, DG.UserName, DG.LastName, DG.FirstName

    FROM DGUsers DG

    WHERE EXISTS (SELECT 1 FROM Orders O WHERE O.UserID = DG.UserID)

    Those two are completely equivalent.

    http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/

    http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/

    I actually thought my original post was not so bad. Boy was I wrong.

    I am not sure from the rest of the discussion which is the correct method to use. Since I am trying to find the DG Users who have placed orders is this the correct method to use?

    SELECT DG.UserID, DG.UserName, DG.LastName, DG.FirstName

    FROM DGUsers DG

    WHERE DG.UserID IN (SELECT O.UserID FROM Orders O)

    Can you explain to someone who is not a DBA why my original method is no good.

    Thanks

  • Either of the examples I gave you will do that, they are (as written) identical in operation and performance.

    My comments to The_SQL_DBA was not related to his use of IN, but to the other bad advice and practices he was giving

    Your original method creates 'duplicate' rows (by joining in the Orders table) and then uses distinct to get rid of the duplicates. If you wanted customers with their order details then what you originally had (excluding the Distinct) would be perfect. Since you only want customer data, but you want to limit it to only the ones that have orders, then you want to use one of the two where clause operators that checks that. Those would be EXISTS and IN

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The_SQL_DBA (1/25/2010)


    Gail, I think I missed it completely. I removed the distinct from the second query now..

    Nope. You've fixed one out of 3.

    Why have you left the brackets around the first column? Sure, it's valid SQL, but it gives the incorrect impression that DISTINCT is a function. It is not.

    Why is the distinct still there? The duplicate customers would have been caused by joining to the orders table (assuming that customers can make more than one order). With the join gone, the need for the distinct is gone

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (1/25/2010)


    Either of the examples I gave you will do that, they are (as written) identical in operation and performance.

    My comments to The_SQL_DBA was not related to his use of IN, but to the other bad advice and practices he was giving

    Your original method creates 'duplicate' rows (by joining in the Orders table) and then uses distinct to get rid of the duplicates. If you wanted customers with their order details then what you originally had (excluding the Distinct) would be perfect. Since you only want customer data, but you want to limit it to only the ones that have orders, then you want to use one of the two where clause operators that checks that. Those would be EXISTS and IN

    That makes perfect sense. Thanks so much.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply