March 7, 2012 at 10:02 pm
Comments posted to this topic are about the item Combining union and union all[/url]
Thank you for reading the discussion on my QotD. As you may have noticed, the explanation I gave was wrong, even though 'Option 3' was still the correct answer. Please read on to find the correct explanation.
(oops, I broke the link to the QotD by editing the opening post...)
March 7, 2012 at 10:02 pm
Nice 🙂
March 7, 2012 at 11:53 pm
Another great back to basics question. Thanks!
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
March 8, 2012 at 1:11 am
I get it right, but I disagree with explanation:
If at least one 'Union' is used, duplicates will be removed from the entire final result set, no matter where the 'Union' occurs
It depends on precedence or parathesis and this script demonstrates it:
-- return two row
select 1
UNION select 1
UNION ALL select 1
-- return one row
select 1
UNION ALL select 1
UNION select 1
-- return two row
select 1
UNION ALL (select 1
UNION select 1
)
March 8, 2012 at 1:23 am
You have left off half of the explanation in your quote. In the next sentence the use of parenthesis to override this behavior is explained:
To preserve duplicates in only a part of the final result set, parenthesis must be used to separate the 'union'-ed statement(s) from the 'union all'-ed statements.
That's exactly what this QotD was meant to test: If at least one union (not all) is used in the statement, all duplicates are removed. If you want to preserve (some) duplicates in the final resultset and at least one "union" (not all) is used in the statement, you will have to use parenthesis to achieve this. The query in the QotD deliberatly did not have such parenthesis to illustrate the behavior. If you follow the link to the article in the answer, you will find under "Example D" an example how to do this.
March 8, 2012 at 1:34 am
R.P.Rozema (3/8/2012)
You have left off half of the explanation in your quote. In the next sentence the use of parenthesis to override this behavior is explained:To preserve duplicates in only a part of the final result set, parenthesis must be used to separate the 'union'-ed statement(s) from the 'union all'-ed statements.
That's exactly what this QotD was meant to test: If at least one union (not all) is used in the statement, all duplicates are removed. If you want to preserve (some) duplicates in the final resultset and at least one "union" (not all) is used in the statement, you will have to use parenthesis to achieve this. The query in the QotD deliberatly did not have such parenthesis to illustrate the behavior. If you follow the link to the article in the answer, you will find under "Example D" an example how to do this.
No, it depends by the position. Run the code in the previos post.
March 8, 2012 at 1:38 am
I see your point. I have to get back on this or maybe someone else sees what's wrong?
March 8, 2012 at 1:44 am
Explanation is wrong: you can get multiple rows even if you use UNION in the query.
You can try it using the tables in the question:
select col from #t2
union
select col from #t3
union all
select col from #t1
Results:
col
2
3
1
1
March 8, 2012 at 1:47 am
It's because of the order of operations: when written this way the UNION happens first and removes duplicates in t2 and t3, then the result is combined with t1 using UNION ALL which allows duplicates.
select col from #t2
union
select col from #t3
union all
select col from #t1
If you change round the UNION and UNION ALL above you get the result set without duplicates, because the last operation performed is a UNION.
select col from #t2
union all
select col from #t3
union
select col from #t1
March 8, 2012 at 1:48 am
Got it right and thought that I understood - now looking at the other posts I am slightly confused.
March 8, 2012 at 2:16 am
UNION queries are evaluated from left to right. If the last query contains duplicates and is preceded with UNION ALL, you will have duplicates in your result set.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
March 8, 2012 at 2:23 am
Thanks for the question - and once again great discussion afterwards - learnt from that!
March 8, 2012 at 2:23 am
tim.kay (3/8/2012)
Got it right and thought that I understood - now looking at the other posts I am slightly confused.
So am I, as I was very sure I had tested both situations: union followed by union all and union all followed by union... I see what happens, but it contradicts with my previous results. So I need to find out what I did wrong previously.
March 8, 2012 at 2:39 am
R.P.Rozema (3/8/2012)
tim.kay (3/8/2012)
Got it right and thought that I understood - now looking at the other posts I am slightly confused.So am I, as I was very sure I had tested both situations: union followed by union all and union all followed by union... I see what happens, but it contradicts with my previous results. So I need to find out what I did wrong previously.
Your test is wrong because of values used:
1, 2 and 3 are different and they are not affected by the DISTINCT of UNION.
You should use same values!
😉
March 8, 2012 at 2:52 am
To finalize this: My explanation is wrong and my question + answer was correct only by luck.
The proper explanation has been given in this thread. To be sure I am getting it now I'll try to summarize it:
Union queries are interpreted left to right. If "union all" is followed by "union", the "union all" will return duplicates, but these will be filtered by the following "union". Other way around, if "union" is followed by "union all", any duplicates from the first 2 statements are filtered, but new duplicates may be introduced by the following "union all".
Parenthesis can be used to override the left-to-right evaluation.
An illustration can be given by putting more rows in the test tables:
create table #t1 (col int not null);
create table #t2 (col int not null);
create table #t3 (col int not null);
insert #t1 (col) values(1), (1);
insert #t2 (col) values(2), (2);
insert #t3 (col) values(3), (3);
select col from #t1
UNION
select col from #t2
UNION ALL
select col from #t3;
select col from #t1
UNION ALL
select col from #t2
UNION
select col from #t3;
And now the results are:
col
-----------
1
2
3
3
(4 row(s) affected)
col
-----------
1
2
3
(3 row(s) affected)
Seems like I was the first to learn something from my own question :).
Thanks for all the feedback!
Viewing 15 posts - 1 through 15 (of 49 total)
You must be logged in to reply to this topic. Login to reply