DISTINCT vs GROUP BY

Question

Post reply

DISTINCT vs GROUP BY

amenjonathan

SSCrazy

Points: 2482
More actions
August 4, 2011 at 1:59 pm

#255308

There's a debate currently going on between myself and a colleague about whether DISTINCT or GROUP BY returns a recordset faster. Now I've read if the query does not have aggregation, the GROUP BY will be converted to DISTINCT by the optimizer, thereby making both queries identacle. But I've also read and observed a couple of years ago on 2008 that DISTINCT can indeed take longer than GROUP BY on very large queries without aggregation. A former colleague of mine explained this was because while a GROUP BY is done before a table join, DISTINCT is performed after all joins are complete, which is why on large sets with many joins DISTINCT will be slower. (I've heard this as well about JOIN criteria vs WHERE clause for filtering).
What's the real deal behind this? I can't find any good explanations as to the inner workings of the optimizer on DISTINCT vs GROUP BY.
-------------------------------------------------------------------------------------------------
My SQL Server Blog

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

spaghettidba SSC Guru Points: 105732 More actions · Answer 1

You can't tell for all cases.

DISTINCT will always perform a SORT right before the SELECT operator in the execution plan.

GROUP BY could end up doing the same thing, but it highly depends on the query syntax and/or semantics, indexes/tables involved.

Often the answer is "it depends".

-- Gianluca Sartori

ChrisTaylor SSCarpal Tunnel Points: 4527 More actions · Answer 2

It does very much depend! A distinct won't always do a sort, i believe it will only do this if there isn't the relevant indexing on the columns selected.

As an example, if you run this on AdventureWorks2008R2:

SELECT DISTINCT OrganizationNode, OrganizationLevel

FROM HumanResources.Employee

SELECT OrganizationNode, OrganizationLevel

FROM HumanResources.Employee

GROUP BY OrganizationNode, OrganizationLevel

SELECT DISTINCT Gender

FROM HumanResources.Employee

SELECT Gender

FROM HumanResources.Employee

GROUP BY Gender

You can see the differences between running this type of query on columns that are indexed and those that aren't.

As far as i was aware (from what i've read), i thought they'd produce the same plan but i've not explored every possibility in this area. My suggestion is, if you have something specific in mind, test it both ways and see if there is actually any difference....

_________________________________________________________________________________SQLGeordieWeb:- Jarrin ConsultancyBlog:- www.chrisjarrintaylor.co.ukTwitter:- @SQLGeordie

spaghettidba SSC Guru Points: 105732 More actions · Answer 3

Uh-oh! There's a campaign conflict on this thread...

Good luck for the award, Chris!

-- Gianluca Sartori

amenjonathan SSCrazy Points: 2482 More actions · Answer 4

Thanks guys. So DISTINCT in most cases will be faster if it can find an index, but GROUP BY can be faster where no index exists? Generally speaking?

-------------------------------------------------------------------------------------------------
My SQL Server Blog

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 5

Ninja's_RGR'us

SSC Guru

Points: 294069

August 5, 2011 at 9:11 am

#1364421

It's ~~not~~ NOW 2 against 1 for Jeff. :w00t:

ChrisTaylor SSCarpal Tunnel Points: 4527 More actions · Answer 6

amenjonathan:

Sorry but didn't attach the sql plan for the script above. If you look they both give the same plan. I'm not saying that this is a given in all examples but from what i've ready and from my experience i tend to get the same results from either. What i was pointing out was that you don't always have a sort from a distinct. As Gianluca stated earlier, it all very much depends.

Gianluca Sartori:

thanks for the well wish

_________________________________________________________________________________SQLGeordieWeb:- Jarrin ConsultancyBlog:- www.chrisjarrintaylor.co.ukTwitter:- @SQLGeordie

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 7

You have 1 easy way of voting. Take out that advertisement of yours.

If you keep posting here, just link to Jeff's vote instead. That's 1 massive vote you can do.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 8

amenjonathan (8/5/2011)
Thanks guys. So DISTINCT in most cases will be faster if it can find an index, but GROUP BY can be faster where no index exists? Generally speaking?

Assuming the optimizer doesn't time out the same request (by that I mean 2 queries that can only return the same exact data because they ask for exactly the same thing) should return the same execution plan.

That should cover you for +98% of the cases you can think of. I'm leaving the 2% out there because I'm sure there are rare edge cases.