March 21, 2013 at 5:22 pm
Hi,
Here is the SQL I used as an example:
--CREATE AND POPULATE TABLE AAA ------------------------------------------
CREATE TABLE [dbo].[AAA](
[Col1] [nchar](10) NOT NULL,
[RowNumber] [int] NOT NULL
) ON [PRIMARY]
GO
INSERT INTO [dbo].[AAA]
([Col1]
,[RowNumber])
VALUES
('a', 1),
('s', 2),
('d', 3),
('e', 4),
('f', 5),
('g', 6)
GO
--QUERY No 1 ----------------------------------------------------------------
;with BBB as
(
select top 3 ABS(checksum(newid())) % 6 + 1 as RandomRow, RowNumber, Col1
from AAA
order by RowNumber
)
select A.Col1, A.RowNumber
from AAA as A
join BBB
On A.RowNumber = BBB.RandomRow;
--QUERY No 2 ----------------------------------------------------------------
select top 3 ABS(checksum(newid())) % 6 + 1 as RandomRow, RowNumber, AAA.Col1
into BBB
from AAA
order by RowNumber;
select A.Col1, A.RowNumber
from AAA as A
join BBB
On A.RowNumber = BBB.RandomRow;
--Cleaning -------------------------------------------------------------------
--drop table BBB;
--drop table AAA;
I have a table AAA with just two columns, one of which is a row number. I use CTE to create another table “BBB”, which based on the AAA. The BBB table contains the same columns as the AAA table, plus a new column RandomRow.
The RandomRow is populated by random numbers, generated from the range 1 to @N, where @N is the number of rows in the table AAA – in this case 6. The table BBB contains 3 rows only: the row numbers will be 1,2,3.
The Select statement should return rows from AAA, which are identified by the number in the [BBB].[RandomRows].
The Select statement contains the join, which is expected to return exactly 3 rows, because the BBB is subset of AAA. However, each time I run the query the Select returns different numbers of rows.
To see the effect, please try to re-run the query No 1 several times, and you should get different numbers of returned rows each time.
If I replace the CTE with real table (see the query 2/) I get expected results, exactly 3 rows each time I run the query.
My questions are:
A/ Why the join yields different results for CTE and for the real table?
B/ If I replace the join in the query No 1 with RIGHT OUTER JOIN, the query works fine and returns just 3 rows. However, I think it should work with “join” also, because the BBB is subset of AAA, and number in RandomRows column will always identify the valid row in AAA.
Thank you for help.
March 21, 2013 at 5:30 pm
itlk (3/21/2013)
A/ Why the join yields different results for CTE and for the real table?B/ If I replace the join in the query No 1 with RIGHT OUTER JOIN, the query works fine and returns just 3 rows. However, I think it should work with “join” also, because the BBB is subset of AAA, and number in RandomRows column will always identify the valid row in AAA.
Thank you for help.
A) You're connecting Rownumber to randomRow, which changes EACH time you run the cte, as you generate a newID each run. You'll be very variable there. You need to connect rownumber to rownumber for consistency.
B) Right outer join is going to return you the three rows from BBB, sure, with varying connectivity.
It all revolves around you using a randomized value in the cte for your join.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
March 21, 2013 at 5:55 pm
Thanks for answer.
I am not concerning about values per se. Sure, the numbers are random, so values will be different each time I run the query.
My concern is, that I should get each time exactly 3 rows (I am not voried about the content of rows). With CTE I get less or more rows, each time I run the query.
With the real table, the same query returns exactly 3 rows.
Why is the difference between CTE and real table?
Thanks.
March 21, 2013 at 6:45 pm
Interesting. Looks like you hit a similar problem to this:
http://www.glorf.it/blog/2008/05/16/sql-talk/sql-server-is-not-aware-of-nondeterministic-functions
In query #1, the non-deterministic function in the subquery is applied after the join (also note the "No Join Predicate" warning in the join operator.
The non-deterministic subquery has already been materialised in query #2, therefore is guaranteed to be correct.
March 21, 2013 at 6:47 pm
You're joining to an indeterminent value and at execution plan time, it can't determine what to join on and so it ignores the predicate. Look at the execution plan for the "Nested Loops (Inner Join)" symbol. Notice the little yellow alert symbol on it. If you right click on the symbol and bring up "Properties", you'll see a warning message that says "No Join Predicate".
{Edit} Heh... beat out by 2 minutes. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
March 21, 2013 at 6:54 pm
EDIT: Please ignore this entire post. One dog, wrong tree, right forest.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
March 21, 2013 at 6:56 pm
Jeff Moden (3/21/2013)
You're joining to an indeterminent value and at execution plan time, it can't determine what to join on and so it ignores the predicate. Look at the execution plan for the "Nested Loops (Inner Join)" symbol. Notice the little yellow alert symbol on it. If you right click on the symbol and bring up "Properties", you'll see a warning message that says "No Join Predicate".{Edit} Heh... beat out by 2 minutes. 🙂
Hey Jeff, I know that no join predicate basically turns it into a FULL OUTER JOIN, but why is the filter failing on its predicate upstream?
EDIT:
I want to clarify this question. To my understanding there's a few possibilities of how this plan actually results, but I'm not sure which is right and none of them show the logic that my conclusions reach.
First, we go out to a table of 3 and 6 rows, and turn it into a FULL OUTER JOIN query, returning that dataset back to the Compute Scalar request. The 18 rows in the actual confirm this as a reasonable pathing.
Now, the Compute Scalar occurs for each row (non-deterministic). So, one of two things happen. Either each row gets the exact same NEWID() (which isn't true) or they all get different NEWIDs. In either case, the result is the same... a value between 1 and 6.
The predicate filter is... Oh god bless rubber ducky coding... of course.
Because each of these rows are getting unique ABS(Checksum results, and they are then compared INTERNALLY to the row, instead of between the rowsets... *headsmack*... of course.
Answer: The Join is evaluated long after any join is involved, and the newID result is checked on each row uniquely. Whatever comes out of there is just luck that they matched up.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
March 21, 2013 at 7:48 pm
Thanks for help.
March 22, 2013 at 4:43 am
It's down to an optimisation introduced in SQL Server2005 often called "Deferred expression evaluation". Here's a great article by Paul White to add to others already posted.
The original "failing" query can be modified so that the calculation is written as a row-level CROSS APPLY:
WITH bbb AS (
SELECT TOP 3 x.RandomRow
FROM AAA
CROSS APPLY (SELECT RandomRow = ABS(checksum(newid())) % 6 + 1) x
ORDER BY RowNumber)
SELECT a.Col1, bbb.RandomRow
FROM AAA AS a
JOIN bbb
ON a.RowNumber = bbb.RandomRow;
The execution plan is identical to Query No 1:
Changing the CROSS APPLY to an OUTER APPLY
WITH bbb AS (
SELECT TOP 3 x.RandomRow
FROM AAA
OUTER APPLY (SELECT RandomRow = ABS(checksum(newid())) % 6 + 1) x
ORDER BY RowNumber)
SELECT a.Col1, bbb.RandomRow
FROM AAA AS a
JOIN bbb
ON a.RowNumber = bbb.RandomRow;
is sufficient to prevent the optimiser from implementing deferred expression evaluation and it generates a completely different plan where RandomRow is evaluated prior to the join between the CTE and the table:
With RandomRow evaluated prior to the join, it's now available to use as a join predicate and the query returns the expected results.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
March 23, 2013 at 7:29 pm
Evil Kraig F (3/21/2013)
Jeff Moden (3/21/2013)
You're joining to an indeterminent value and at execution plan time, it can't determine what to join on and so it ignores the predicate. Look at the execution plan for the "Nested Loops (Inner Join)" symbol. Notice the little yellow alert symbol on it. If you right click on the symbol and bring up "Properties", you'll see a warning message that says "No Join Predicate".{Edit} Heh... beat out by 2 minutes. 🙂
Hey Jeff, I know that no join predicate basically turns it into a FULL OUTER JOIN, but why is the filter failing on its predicate upstream?
EDIT:
I want to clarify this question. To my understanding there's a few possibilities of how this plan actually results, but I'm not sure which is right and none of them show the logic that my conclusions reach.
First, we go out to a table of 3 and 6 rows, and turn it into a FULL OUTER JOIN query, returning that dataset back to the Compute Scalar request. The 18 rows in the actual confirm this as a reasonable pathing.
Now, the Compute Scalar occurs for each row (non-deterministic). So, one of two things happen. Either each row gets the exact same NEWID() (which isn't true) or they all get different NEWIDs. In either case, the result is the same... a value between 1 and 6.
The predicate filter is... Oh god bless rubber ducky coding... of course.
Because each of these rows are getting unique ABS(Checksum results, and they are then compared INTERNALLY to the row, instead of between the rowsets... *headsmack*... of course.
Answer: The Join is evaluated long after any join is involved, and the newID result is checked on each row uniquely. Whatever comes out of there is just luck that they matched up.
Correct. The only difference is that a lack of a predicate doesn't make a FULL OUTER JOIN. It makes a CROSS JOIN.
Heh... that's one of my favoritie interview questions, BTW. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply