January 30, 2009 at 12:03 pm
It is my understanding that when a UNION is performed between two SELECT statements, it's internally running a DISTINCT. Is this correct?
I'd also like to know how to eliminate the sorting that needs to be performed when a UNION or DISTINCT is run. Would a clustered index on a like column eliminate it?
What if the clustered index was on an identity in two tables and I set the clustered index to include a field that both tables shared?
Thanks!
John
January 30, 2009 at 1:23 pm
It is my understanding that when a UNION is performed between two SELECT statements, it's internally running a DISTINCT. Is this correct?
Yes Internally sql will only return Unique values.
From BOL
UNION
Specifies that multiple result sets are to be combined and returned as a single result set.
ALL
Incorporates all rows into the results. This includes duplicates. If not specified, duplicate rows are removed.
I'd also like to know how to eliminate the sorting that needs to be performed when a UNION or DISTINCT is run. Would a clustered index on a like column eliminate it?
Unfortunatelly since Sql is creating a distinct list this happens by it self. If you require a specific Order I'd suggest placing an order by on the query itself.
What if the clustered index was on an identity in two tables and I set the clustered index to include a field that both tables shared?
I'm not sure what the result would be but Like I said use a specific order by to ensure it is sorted the way you want.
January 30, 2009 at 1:31 pm
Thanks for the reply, but it's not quite what I'm asking. If you add an ORDER BY clause and look at the execution plan, you will see there is a sort step. Same with the UNION clause, it needs to sort the data in order to merge it and remove the dup's. I'm just wondering if there's anyway I can guarantee a pre-sort (or what condition would determine a pre-sort) so the execution plan doesn't have to perform the sort step.
If you notice when you join sorted records, it'll do a merge join which is really fast, otherwise it'll do a nested loop or a hash join which is slower.
So my goal is to remove the sort required in the execution plan on a UNION clause because the SELECT's I'm UNIONing result in 3 million + records, it takes a long time 🙂 The sort steps account for over 50% of the execution cost and by eliminating that I think I can save up to 15-20 minutes of processing on this server.
I just want to know if removing the sort generated from the UNION is possible and under which circumstances is it possible.
Thanks,
John
January 30, 2009 at 1:42 pm
I am not aware of UNION performing a sort however it performs an implict Distinct by removing duplicates while UNION ALL leave the duplicates.
I also don't think you should use UNION with that amount of data because UNION is the slowest operator compared to INNER JOIN, SubQuery and OUTER JOIN.
Kind regards,
Gift Peddie
January 30, 2009 at 1:47 pm
Well, the issue is that the client needs to extract data from two tables and merge the resultsets together after filtering out some records. I've managed to remove other redundancies. There's no commonality between the resulting record sets so I don't believe a join would work in this case. I think my only option is to union them...
January 30, 2009 at 2:26 pm
Then you would probably be safe using the UNION ALL Statement, and I think this would eliminate the sort operator.
January 30, 2009 at 2:28 pm
Here is one from Microsoft that uses code while the second uses what I call a physical solution using the IGNORE_DUP_KEY in the create index definition.
http://support.microsoft.com/kb/139444
http://www.sqlservercentral.com/articles/Basic+Querying/deletingduplicaterows/1175/
Kind regards,
Gift Peddie
January 31, 2009 at 4:24 pm
I would use UNION ALL - and I also ask the rest of my team to comment anywhere they use UNION - just to make sure they didn't mean to use UNION ALL
February 2, 2009 at 6:15 am
John (1/30/2009)
Thanks for the reply, but it's not quite what I'm asking. If you add an ORDER BY clause and look at the execution plan, you will see there is a sort step. Same with the UNION clause, it needs to sort the data in order to merge it and remove the dup's. I'm just wondering if there's anyway I can guarantee a pre-sort (or what condition would determine a pre-sort) so the execution plan doesn't have to perform the sort step.If you notice when you join sorted records, it'll do a merge join which is really fast, otherwise it'll do a nested loop or a hash join which is slower.
So my goal is to remove the sort required in the execution plan on a UNION clause because the SELECT's I'm UNIONing result in 3 million + records, it takes a long time 🙂 The sort steps account for over 50% of the execution cost and by eliminating that I think I can save up to 15-20 minutes of processing on this server.
I just want to know if removing the sort generated from the UNION is possible and under which circumstances is it possible.
Thanks,
John
There's no way to guarantee order except through an ORDER BY clause. In this case, you're not going to get to use indexes, becuase it's ordering the combined data set. It's probably running a table spool operation to get the data together for ordering it. If you returned it as two data sets and combined it on the client side, it might, might, run faster.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply