"Order" between JOIN and APPLY

Question

"Order" between JOIN and APPLY

btio_3000

SSCrazy

Points: 2662
More actions
November 17, 2014 at 4:41 pm

#292192

I ran into a statement where it selects from something like that:
...
FROM
Table1 AS a
CROSS APPLY data.nodes('.../Node1') AS T1(F) -- part 1
LEFT JOIN Table1 AS b
CROSS APPLY data.nodes('.../Node2') AS T2(F) -- part 2
ON b.ID = a.ID
(both "a" and "b" refer to the same table Table1)
In a FROM like this, what is the "order" of joining?
Does it do [Table1 cross apply T1] first, [Table1 cross apply T2] second and finally left joins those two parts Part1 and Part2 together?
Or does it left joins Table1 to itself first and then goes with two cross applies?
Or... ?
Thanks!
Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

sgmunson SSC Guru Points: 110640 More actions · Answer 1

I don't know the direct answer, but why place the ON clause for that LEFT JOIN in a position AFTER the CROSS APPLY unless you intended that CROSS APPLY to take place BEFORE the LEFT JOIN ? However, I must also state that writing a query that way seems to me to be an EXTREMELY BAD WAY TO WRITE A QUERY.

How about writing it this way instead?

...

FROM (

SELECT *

FROM Table1 AS a

CROSS APPLY data.nodes('.../Node1') AS T1F -- part 1

) AS X

LEFT JOIN (

SELECT *

FROM Table1 AS b

CROSS APPLY data.nodes('.../Node2') AS T2F -- part 2

) AS Y

ON X.ID = Y.ID

At least this way, the intent is clear. Also, using left or right parentheses as part of a table alias is probably a bad idea, even if it actually works.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 2

Steve - (F) is a column name assignment:

SELECT * FROM (SELECT 1) d (ColumnName)

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

sgmunson SSC Guru Points: 110640 More actions · Answer 3

ChrisM@Work (11/19/2014)
Steve - (F) is a column name assignment:
SELECT * FROM (SELECT 1) d (ColumnName)

Ahhh, ... ok... I try really hard NOT to use that kind of construction in a query, and just use the AS keyword and provide a column alias.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 4

sgmunson (11/19/2014)
ChrisM@Work (11/19/2014)
Steve - (F) is a column name assignment:
SELECT * FROM (SELECT 1) d (ColumnName)
Ahhh, ... ok... I try really hard NOT to use that kind of construction in a query, and just use the AS keyword and provide a column alias.

I quite agree - it's not intuitive and so it's not often used. The other way around however is quite common - here are the first few bars of everyone's favourite TSQL song:

WITH E1(N) AS (

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

), --10E+1 or 10 rows

E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

btio_3000 SSCrazy Points: 2662 More actions · Answer 5

sgmunson (11/19/2014)
I don't know the direct answer, but why place the ON clause for that LEFT JOIN in a position AFTER the CROSS APPLY unless you intended that CROSS APPLY to take place BEFORE the LEFT JOIN ? However, I must also state that writing a query that way seems to me to be an EXTREMELY BAD WAY TO WRITE A QUERY.
How about writing it this way instead?
...
FROM (
SELECT *
FROM Table1 AS a
CROSS APPLY data.nodes('.../Node1') AS T1F -- part 1
) AS X
LEFT JOIN (
SELECT *
FROM Table1 AS b
CROSS APPLY data.nodes('.../Node2') AS T2F -- part 2
) AS Y
ON X.ID = Y.ID
At least this way, the intent is clear. Also, using left or right parentheses as part of a table alias is probably a bad idea, even if it actually works.

That's actually why I am asking:) It seems that it's what the existing code is trying to do, and if so then it could probably be replaced with something like:

SELECT...

FROM Table1 AS a

CROSS APPLY data.nodes('.../Node1') AS T1(F)

OUTER APPLY data.nodes('.../Node2') AS T2(F)

which is almost two times faster in my case.

But again, since it's written without using parentheses to separate parts "X" and "Y" as in your example... Seems like the answer to my question is "Yes", and order b/w JOIN and APPLY does matter here... Apply in this case always relates to the "immediate table before", right? Just thinking out loud here, never used APPLY/nodes before

Thanks!

sgmunson SSC Guru Points: 110640 More actions · Answer 6

btio_3000 (11/19/2014)
sgmunson (11/19/2014)
I don't know the direct answer, but why place the ON clause for that LEFT JOIN in a position AFTER the CROSS APPLY unless you intended that CROSS APPLY to take place BEFORE the LEFT JOIN ? However, I must also state that writing a query that way seems to me to be an EXTREMELY BAD WAY TO WRITE A QUERY.
How about writing it this way instead?
...
FROM (
SELECT *
FROM Table1 AS a
CROSS APPLY data.nodes('.../Node1') AS T1F -- part 1
) AS X
LEFT JOIN (
SELECT *
FROM Table1 AS b
CROSS APPLY data.nodes('.../Node2') AS T2F -- part 2
) AS Y
ON X.ID = Y.ID
At least this way, the intent is clear. Also, using left or right parentheses as part of a table alias is probably a bad idea, even if it actually works.
That's actually why I am asking:) It seems that it's what the existing code is trying to do, and if so then it could probably be replaced with something like:
SELECT...
FROM Table1 AS a
CROSS APPLY data.nodes('.../Node1') AS T1(F)
OUTER APPLY data.nodes('.../Node2') AS T2(F)
which is almost two times faster in my case.
But again, since it's written without using parentheses to separate parts "X" and "Y" as in your example... Seems like the answer to my question is "Yes", and order b/w JOIN and APPLY does matter here... Apply in this case always relates to the "immediate table before", right? Just thinking out loud here, never used APPLY/nodes before
Thanks!

I noticed that your newest example uses OUTER APPLY instead of CROSS APPLY for the 2nd of the APPLY joins. Without having some idea of what each of the data.nodes() tables provides, it would be impossible to judge the reason behind the speedup, save the obvious of having to do two cross applies before doing a LEFT JOIN between the results. Take a look at the execution plans on both and you may see why.

My curiosity is piqued by the presence of those data.nodes elements. Pray, do tell, as to the nature of the content and the reasoning behind the use of the CROSS APPLY might well show itself. Let's find out what the real purpose of this query is and maybe the true objective will show itself...

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

btio_3000 SSCrazy Points: 2662 More actions · Answer 7

This is how original code looks like:

SELECT

CASE WHEN F1.value('...') IS NOT NULL THEN F1.value('...')

ELSE F2.value('...') END

FROM

-- part 1

Table1 AS a

CROSS APPLY data.nodes('.../Node1') AS T1(F1)

-- part 2

LEFT JOIN Table1 AS b

CROSS APPLY data.nodes('.../Node2') AS T2(F2)

ON b.ID = a.ID

I believe what it's trying to do is to select rows with existing Node1's first, and then and for those only get their Node2's, if exist, or NULL otherwise

That's actually why I was asking that question about "order" 🙂 - just trying to see what it's doing.

If I am right, that exact same thing can be done by simply replacing "part1" with outer apply as below:

SELECT

CASE WHEN F1.value('...') IS NOT NULL THEN F1.value('...')

ELSE F2.value('...') END

FROM

Table1 AS a

CROSS APPLY data.nodes('.../Node1') AS T1(F1)

OUTER APPLY data.nodes('.../Node2') AS T2(F2)

Cannot find any other use for having Table1 twice...

So... Back to "ordering"... Here, it's:

1) "Take Table1, cross apply Node1" first, and then

2) "Take (1) and outer apply Node2"

is it so?

Originally though, the "ordering" seemed to be

1) "Take Table1, cross apply Node1"

2) "Take Table1, cross apply Node2" and

3) "Take (1) and left join it with (2)

again, is it so?

I apologize if I sound confusing and thanks as always

sgmunson SSC Guru Points: 110640 More actions · Answer 8

That post doesn't really tell me anything you hadn't either stated or implied before, but you also introduced new questions without answering any of the ones I asked. I want to know what those data.nodes tables are, or are they table-valued functions? How about some some sample data for them, or some actual data with the code for the function and the expected result? The new questions raised are that in the new code, you specify F1.value, when it appears that F1 is actually a field name... Code written that isn't going to actually work on someone elses instance because it has dependencies on your installation has limits, and in many cases it can be very hard to do anything but speculate. You've not said word one about what kind of data this is, nor the overall objective, and if you want help with what it's going to do, you're going to have to provide some level of detail.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

serg-52 SSCrazy Eights Points: 9913 More actions · Answer 9

Looks like JOIN .. ON introduces parenthesis

SELECT

CASE WHEN F1.value('...') IS NOT NULL THEN F1.value('...')

ELSE F2.value('...') END

FROM

Table1 AS a

CROSS APPLY data.nodes('.../Node1') AS T1(F1)

LEFT JOIN ( Table1 AS b

CROSS APPLY data.nodes('.../Node2') AS T2(F2) )

ON b.ID = a.ID

so the second CROSS APPLY must precede JOIN. As a consequense, a and T1 aren't defined within that parenthesis.

The rest order of the execution depends on optimizer.

btio_3000 SSCrazy Points: 2662 More actions · Answer 10

"data" here is an xml field of Table 1, sorry I did not mentioned it initially

Thanks!

sgmunson SSC Guru Points: 110640 More actions · Answer 11

btio_3000 (11/20/2014)
"data" here is an xml field of Table 1, sorry I did not mentioned it initially
Thanks!

Okay, but that hardly begins to answer the questions I posed. If you really want help, then you're going to have to disclose more than that... Make up the data if you still need to obfuscate, but at least state the objective in sufficient detail. We really don't have enough to go on here...

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

btio_3000 SSCrazy Points: 2662 More actions · Answer 12

Thanks again for following it, and sorry if I'm still not making sense:-)

Let me try to start from the beginning:

- There is a table, Table1 in my example, that has some fields including it's PK (ID in my example) and an xml field ("data" in my example)

- That xml field, data, has a bunch of nodes, including (.../Node1) and (.../Node2)

- The idea is, I believe (again, it's not my code), to select from Table1 only the records where Node1 exists in their "data",

and then for those return either their Node2 value, if exist, or NULL otherwise

I believe this is what this code does:

SELECT ... FROM

Table1 AS a

CROSS APPLY data.nodes('.../Node1')

LEFT JOIN Table1 AS b

CROSS APPLY data.nodes('.../Node2')

ON a.id = b.id

so what I am trying to make sure of is that in this code is the logical order always as if it was written like you pointed in one of your replies:

SELECT ... FROM

(SELECT * FROM Table1

CROSS APPLY data.nodes('.../Node1')

) AS X

LEFT JOIN

(SELECT * FROM Table1

CROSS APPLY data.nodes('.../Node2')

) AS Y

ON X.id = Y.id

?

In other words, what if I had two different tables, Table1 and Table2, and the code is like below:

SELECT ... FROM

Table1 AS a

CROSS APPLY data.nodes('.../Node1') -- 1st cross apply

LEFT JOIN Table2 AS b

CROSS APPLY data.nodes('.../Node2') -- 2nd cross apply

ON a.id = b.id

would it always "know" to first take Table1 and cross-apply it for Node1 (X),

then take Table2 and cross-apply it for Node2 (Y),

and finally do X LEFT JOIN Y

?

Thanks!

mister.magoo SSC-Forever Points: 47174 More actions · Answer 13

It is quite clear from the syntax definition that your intuition about bracketing is correct.

http://msdn.microsoft.com/en-us/library/ms177634.aspx
[ FROM { <table_source> } [ ,...n ] ]
< table_source > ::=
{
... various options omitted for clarity
| < joined_table >
... various options omitted for clarity
}
< joined_table > ::=
{
< table_source > < join_type > < table_source > ON < search_condition >
| left_table_source { CROSS | OUTER } APPLY right_table_source
... various options omitted for clarity
}

...which shows that you can join a "table_source" to a "table_source", but any "table_source" can itself be either a "table_source" or indeed a "joined_table", and a "joined_table" can be a "table_source" joined to a "table_source" or CROSS APPLY to a "table_source".

Clear as you like :w00t:

But, if you "get" these descriptions, I believe that it is defined therein that

TABLE1 JOIN TABLE2 CROSS APPLY X.NODES ON TABLE1.col1 = TABLE2.col1

is actually defined by the syntax rules as

"table_source1" JOIN "table_source2" CROSS APPLY "table_source3" ON "table_source1".col1 = "table_source2"col1

and that

"table_source2" CROSS APPLY "table_source3"

is defined as a "joined_table", which leaves you with

"table_source1" JOIN "joined_table1" ON "table_source1"col1 = "joined_table1"col1

No other syntax rule fits that combination of clauses that I can see.

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

sgmunson SSC Guru Points: 110640 More actions · Answer 14

I'm pretty sure Mr. Magoo has your answer, but the bigger and maybe more important question, is why leave code hanging around that is formatted that way, when there are so many better options? Surely, it wouldn't be difficult to determine if you have the equivalent query through testing. And what of this query's overall objective? If you know what it's supposed to do, then you should also know whether the equivalent will give you the right results. Ultimately, I don't know that having an answer to your original question actually does you any favors. Just my two cents...

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)