The union statement probably isn't the most used statement in TSQL, but it is
useful. What's more interesting is that it's often used in a way that might not
return the results you would expect. If you're a TSQL master none of this will
be new to you, but for the rest of us - it's worth a few minutes review!
We'll start with a somewhat contrived example, let's say we want to invite 10
employees with the lastname of Adams and 10 customers with the last name of Smith
to our SomethingOrOther.com launch.All things being equal, most people expect
this statement to return 20 rows if there are 10 rows that match each separate
query - would you agree?
select top 10 firstname, lastname from person.contact where lastname='adams' union select top 10 firstname, lastname from dbo.customers where lastname='smith'
When I run each statement separately I get 10 rows each, as follows:
If I run the entire query, I only get back 18 rows:
If you look carefully you'll see that our first query for last name 'Smith'
return a duplicate row for Samantha and another for Denise. Union by design
removes duplicates from the final result set, even if the duplicates were within
a single statement as in our example. It can be very useful behavior, or
problematic if you're not expecting it! In this case are the two rows we
eliminated truly duplicate people, or more likely they have different ID's and
addresses, meaning we managed to exclude two people by accident.
If we change to use UNION ALL we get the expected behavior of returning 20
rows, duplicates and all.
Depending on our needs either could be correct. But it's so important and so
often misunderstood that I make it a standard practice to follow up any time I
see UNION to see if they know the difference. If they indeed meant UNION I add a
comment to make life easier for the next DBA, like thisL
select top 10 firstname, lastname from person.contact where lastname='adams' union--8/10/08 verified UNION is correct select top 10 firstname, lastname from dbo.customers where lastname='smith'
There's also a difference from a performance perspective. UNION ALL requires
little additional work besides running the combined queries, but just UNION
requires an additional step to remove the duplicates - and that could be
expensive, it depends on how many rows you're checking. The top query plan is
UNION, the bottom is UNION ALL. We can see that the UNION example includes a
SORT operator that changes the plan, but always check Profiler to see the actual
difference in cost.
So, there's a little trivia you can test your developers with, and it might
save you from an embarrassing mistake someday too! If you're new to UNION you
might want to take a look at both EXCEPT and INTERSECT, both were added to SQL
2005.
Visit my blog at
http://blogs.sqlservercentral.com/andy_warren/default.aspx