October 9, 2013 at 9:29 am
I guess I am asking “Why doesn’t a virtual table in a FROM clause behave like other subqueries in SELECT or WHERE that are able to make use of the aliases in the FROM clause?”
Is it because the aliases in the outer FROM haven’t been created yet because the virtual table has to be created first?
Forgive me for not posting sample data. I was hoping this example would speak for itself. It fails with "multi-part identifier "ce.id" could not be bound". Also my formatting is being mangled a bit by quote paste.
SELECT ce.id as ClientEnrollmentKey, max(cs.ID) as MaxClientStatusKey
FROM ClientEnrollment ce
INNER JOIN ClientPacket cp on ce.ID = cp.ClientEnrollmentKey
INNER JOIN ClientStatus cs on cp.ID = cs.PacketKey
INNER JOIN
(SELECT max(cp.Packet_Date) as MaxClientStatusDate
FROM ClientPacket cp
INNER JOIN ClientStatus cs on cp.ID = cs.PacketKey
where cp.ClientEnrollmentKey = ce.id) t1 -- ce from outer from is invisible
on cp.Packet_date = t1.MaxClientStatusDate
group by ce.id
October 9, 2013 at 9:48 am
That's because JOINs don't support correlated subqueries. Change the INNER JOIN for a CROSS APPLY. 😉
October 9, 2013 at 9:50 am
First, minor terminology sticking point: I think you mean DERIVED table.
And second, you need to think of a derived table as a query unto itself. Think of it like a view you are defining on the fly. It has no idea about anything outside its own scope.
Try this
SELECT ce.id as ClientEnrollmentKey, max(cs.ID) as MaxClientStatusKey
FROM ClientEnrollment ce
INNER JOIN ClientPacket cp on ce.ID = cp.ClientEnrollmentKey
INNER JOIN ClientStatus cs on cp.ID = cs.PacketKey
INNER JOIN
(SELECT cp.ClientEnrollmentKey,max(cp.Packet_Date) as MaxClientStatusDate
FROM ClientPacket cp
INNER JOIN ClientStatus cs on cp.ID = cs.PacketKey
) t1 -- ce from outer from is invisible
on cp.Packet_date = t1.MaxClientStatusDate
AND ce.id = t1.ClientEnrollmentKey
group by ce.id
October 9, 2013 at 9:54 am
Thanks for the responses and they both tell me “what”. I am after the “why”.
October 9, 2013 at 9:57 am
jshahan (10/9/2013)
Thanks for the responses and they both tell me “what”. I am after the “why”.
“Why doesn’t a virtual table in a FROM clause behave like other subqueries in SELECT or WHERE that are able to make use of the aliases in the FROM clause?”
Because it isn't a subquery
Think of a subquery as one that runs once for each row being returned (or evaluated in a where clause)
Think of a derived table as being evaluated once overall, and then having its results JOINED to the rest.
October 9, 2013 at 10:01 am
Thanks, Nevyn. I'm still inclined to think of any select within a select as a subquery but let me take that idea and run with it.
October 9, 2013 at 10:02 am
jshahan (10/9/2013)
Thanks for the responses and they both tell me “what”. I am after the “why”.
Let me quote myself.
because JOINs don't support correlated subqueries.
And to quote Nevyn
And second, you need to think of a derived table as a query unto itself. Think of it like a view you are defining on the fly. It has no idea about anything outside its own scope.
We tell you why and how to correct it.
October 10, 2013 at 4:41 am
This is to do with how a query is executed. I'll try and explain but I've never tried to put this into words so forgive me if it's odd.
Logically, the following is the processing order of a SELECT statement (see http://msdn.microsoft.com/en-us/library/ms189499.aspx): -
1.FROM
2.ON
3.JOIN
4.WHERE
5.GROUP BY
6.WITH CUBE or WITH ROLLUP
7.HAVING
8.SELECT
9.DISTINCT
10.ORDER BY
11.TOP
We could make it more complicated than that, but in this case it's unnecessary.
Let's create some sample data to play with and help with the explanation.
--Standard TestEnvironment of 1,000,000 rows of random-ish data
IF object_id('tempdb..#testEnvironment') IS NOT NULL
BEGIN;
DROP TABLE #testEnvironment;
END;
--1,000,000 Random rows of data
SELECT TOP 1000000 IDENTITY(INT,1,1) AS ID,
RAND(CHECKSUM(NEWID())) * 30000 /*(Number of days in range)*/ + CAST('1945' AS DATETIME) /*(Start date, e.g. '1945-01-01 00:00:00*/ AS randomDateTime,
DATEADD(DAY,((ABS(CHECKSUM(NEWID())) % 366 /*(Number of days in range)*/) + 1),CAST('1945' AS DATE) /*(Start date, e.g. '1945-01-01*/) AS randomDate,
ABS(CHECKSUM(NEWID())) AS randomBigInt,
(ABS(CHECKSUM(NEWID())) % 100) + 1 AS randomSmallInt,
RAND(CHECKSUM(NEWID())) * 100 AS randomSmallDec,
RAND(CHECKSUM(NEWID())) AS randomTinyDec,
RAND(CHECKSUM(NEWID())) * 100000 AS randomBigDec,
CONVERT(VARCHAR(6),CONVERT(MONEY,RAND(CHECKSUM(NEWID())) * 100),0) AS randomMoney
INTO #testEnvironment
FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3;
So, we have 1 million rows of pseudo-random data. Let's make a subquery.
SELECT *
FROM (SELECT ID, CASE WHEN randomSmallInt < 50 THEN 'A' ELSE 'B' END AS Cat,
randomSmallInt
FROM #testEnvironment) a;
This query executes in this order: -
1. FROM #testEnvironment
2. SELECT ID, CASE WHEN randomSmallInt < 50 THEN 'A' ELSE 'B' END AS Cat, randomSmallInt
3. FROM SUBQUERY (steps 1 and 2)
4. SELECT ID, Cat, randomSmallInt
If we want to add a WHERE clause outside of the subquery that references a column in the SELECT of the subquery, you can see from the logical query processing order that it'll work as WHERE is step 4.
SELECT *
FROM (SELECT ID, CASE WHEN randomSmallInt < 50 THEN 'A' ELSE 'B' END AS Cat,
randomSmallInt
FROM #testEnvironment) a
WHERE Cat = 'A';
The query now executes in this order: -
1. FROM #testEnvironment
2. SELECT ID, CASE WHEN randomSmallInt < 50 THEN 'A' ELSE 'B' END AS Cat, randomSmallInt
3. FROM SUBQUERY (steps 1 and 2)
4. WHERE Cat = A
5. SELECT ID, Cat, randomSmallInt
Now we'll create a second temporary table to play with.
IF object_id('tempdb..#testEnvironment2') IS NOT NULL
BEGIN;
DROP TABLE #testEnvironment2;
END;
--2 rows of data
SELECT Cat,Pos
INTO #testEnvironment2
FROM (VALUES('A',20),('B',70))a(Cat,Pos);
Let's try and do something similar to what you were doing.
SELECT a.ID, a.Cat, c.[State]
FROM (SELECT ID, CASE WHEN randomSmallInt < 50 THEN 'A' ELSE 'B' END AS Cat,
randomSmallInt
FROM #testEnvironment) a
INNER JOIN (SELECT CASE WHEN a.randomSmallInt > b.Pos THEN 'BAD' ELSE 'GOOD' END AS [State],
b.Pos, b.Cat
FROM #testEnvironment2 b
) c ON a.Cat = c.Cat;
When SQL Server comes to execute this, it will do so in the following order: -
1. FROM #testEnvironment
2. SELECT ID, CASE WHEN randomSmallInt < 50 THEN 'A' ELSE 'B' END AS Cat, randomSmallInt
3. FROM #testEnvironment2
4. SELECT CASE WHEN a.randomSmallInt > b.Pos THEN 'BAD' ELSE 'GOOD' END AS [State], b.Pos, b.Cat
5. FROM SUBQUERY (steps 1 and 2)
6. ON a.Cat = c.Cat
7. JOIN
8. SELECT a.ID, a.Cat, c.[State]
Note that at step 4 when it is trying to SELECT a.randomSmallInt, it has not yet found anything about the first query. This means that the alias "a" doesn't exist as far as this section of the query is concerned, so it will fail to bind.
October 10, 2013 at 7:21 am
Is it because the aliases in the outer FROM haven’t been created yet because the virtual table has to be created first?
Thanks, Cadavre, for the very comprehensive response. If I read it correctly, I believe it confirms my initial suspicion about the sequence of execution being responsible for the binding issue.
Next time I'll use your methodology.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply