DISTINCT vs GROUP BY

  • There's a debate currently going on between myself and a colleague about whether DISTINCT or GROUP BY returns a recordset faster. Now I've read if the query does not have aggregation, the GROUP BY will be converted to DISTINCT by the optimizer, thereby making both queries identacle. But I've also read and observed a couple of years ago on 2008 that DISTINCT can indeed take longer than GROUP BY on very large queries without aggregation. A former colleague of mine explained this was because while a GROUP BY is done before a table join, DISTINCT is performed after all joins are complete, which is why on large sets with many joins DISTINCT will be slower. (I've heard this as well about JOIN criteria vs WHERE clause for filtering).

    What's the real deal behind this? I can't find any good explanations as to the inner workings of the optimizer on DISTINCT vs GROUP BY.

    -------------------------------------------------------------------------------------------------
    My SQL Server Blog

  • You can't tell for all cases.

    DISTINCT will always perform a SORT right before the SELECT operator in the execution plan.

    GROUP BY could end up doing the same thing, but it highly depends on the query syntax and/or semantics, indexes/tables involved.

    Often the answer is "it depends".

    -- Gianluca Sartori

  • It does very much depend! A distinct won't always do a sort, i believe it will only do this if there isn't the relevant indexing on the columns selected.

    As an example, if you run this on AdventureWorks2008R2:

    SELECT DISTINCT OrganizationNode, OrganizationLevel

    FROM HumanResources.Employee

    SELECT OrganizationNode, OrganizationLevel

    FROM HumanResources.Employee

    GROUP BY OrganizationNode, OrganizationLevel

    SELECT DISTINCT Gender

    FROM HumanResources.Employee

    SELECT Gender

    FROM HumanResources.Employee

    GROUP BY Gender

    You can see the differences between running this type of query on columns that are indexed and those that aren't.

    As far as i was aware (from what i've read), i thought they'd produce the same plan but i've not explored every possibility in this area. My suggestion is, if you have something specific in mind, test it both ways and see if there is actually any difference....

    _________________________________________________________________________________SQLGeordieWeb:- Jarrin ConsultancyBlog:- www.chrisjarrintaylor.co.ukTwitter:- @SQLGeordie

  • Uh-oh! There's a campaign conflict on this thread...

    Good luck for the award, Chris!

    -- Gianluca Sartori

  • Thanks guys. So DISTINCT in most cases will be faster if it can find an index, but GROUP BY can be faster where no index exists? Generally speaking?

    -------------------------------------------------------------------------------------------------
    My SQL Server Blog

  • It's not NOW 2 against 1 for Jeff. :w00t:

  • amenjonathan:

    Sorry but didn't attach the sql plan for the script above. If you look they both give the same plan. I'm not saying that this is a given in all examples but from what i've ready and from my experience i tend to get the same results from either. What i was pointing out was that you don't always have a sort from a distinct. As Gianluca stated earlier, it all very much depends.

    Gianluca Sartori:

    thanks for the well wish

    _________________________________________________________________________________SQLGeordieWeb:- Jarrin ConsultancyBlog:- www.chrisjarrintaylor.co.ukTwitter:- @SQLGeordie

  • You have 1 easy way of voting. Take out that advertisement of yours.

    If you keep posting here, just link to Jeff's vote instead. That's 1 massive vote you can do.

  • amenjonathan (8/5/2011)


    Thanks guys. So DISTINCT in most cases will be faster if it can find an index, but GROUP BY can be faster where no index exists? Generally speaking?

    Assuming the optimizer doesn't time out the same request (by that I mean 2 queries that can only return the same exact data because they ask for exactly the same thing) should return the same execution plan.

    That should cover you for +98% of the cases you can think of. I'm leaving the 2% out there because I'm sure there are rare edge cases.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply