January 12, 2016 at 1:10 pm
Oops, cut and pasted the wrong text.
Gregory A. Larsen, MVP
February 19, 2016 at 5:44 am
Doesn't the SQL DB engine first create a cross join for any query involving more than one table? So performance isn't an issue for a cross join query.
February 19, 2016 at 8:16 am
So the only real use of the term "CROSS JOIN" vs the older seperation of two tables by a comma is simply to reassure the reader that the cross join was intentional and the author didn't simply forget a join?
That's the only reason I can see to have a specific syntax for cross joins. Mind you, it's probably a good idea to document the deliberate creation of a Cartsian product, especially given the disasterous consequences if the tables are too big! 🙂
February 19, 2016 at 9:03 am
Is it possible to disable cross joins for either a specific login, database, or server (or combination) for production servers? I find cross joins to be a security risk as it is a playground for DoS attacks. Not to mention, programmers that don't know what they are doing.
February 19, 2016 at 9:10 am
So the only real use of the term "CROSS JOIN" vs the older seperation of two tables by a comma is simply to reassure the reader that the cross join was intentional and the author didn't simply forget a join?
That's the only reason I can see to have a specific syntax for cross joins. Mind you, it's probably a good idea to document the deliberate creation of a Cartsian product, especially given the disasterous consequences if the tables are too big! Smile
In essence, you are correct. However, CROSS JOIN is also the ANSI SQL standard.
--Edit: got hit by the reply/quote bug
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
February 19, 2016 at 9:48 am
kkilsby (2/19/2016)
Doesn't the SQL DB engine first create a cross join for any query involving more than one table? So performance isn't an issue for a cross join query.
No. SQL Server does not generally implement joins by first taking the Cartesian product (CROSS JOIN) and then filtering.
For one thing, that would be grossly inefficient; for a second, it would require some tweaking for certain types of joins.
Think OUTER JOINS, particularly. Take the following simple example:
left_id
-----------
1
2
right_id
-----------
2
3
CROSS JOIN RESULTS
left_id right_id
----------- -----------
1 2
2 2
1 3
2 3
LEFT OUTER JOIN ON left_id=right_id RESULTS
left_id right_id
----------- -----------
1 NULL --This row appears nowhere in the CROSS JOIN results
2 2
That first row in the result set of the left outer join couldn't be achieved by taking the result of the cross join and filtering (at least not simply; you'd have to introduce other mechanisms into the join algorithm).
As for the inefficiency/performance problem, try the following example:
--Run the CREATE TABLE and INSERT statements as
--separately from the SELECT queries that follow.
--One of the SELECTs (hint: it's the CROSS JOIN)
--will not finish in a reasonable time.
CREATE TABLE #test_all_int (ID int);
CREATE TABLE #test_odd_int (ID int);
INSERT INTO #test_all_int
SELECT TOP 1000000 ROW_NUMBER() OVER(ORDER BY (SELECT 1))
FROM sys.all_columns ac1 CROSS JOIN sys.all_columns ac2;
INSERT INTO #test_odd_int
SELECT ID
FROM #test_all_int ai
WHERE ai.ID%2=1;
GO
--In the following queries I declare a variable to
--which I can redirect the output of the query.
--That way we don't have to deal with the overhead
--of displaying the results.
--Here's the CROSS JOIN:
--Don't hold your breath waiting for this to finish;
--run it and wait as long as your patience
--allows. :)
DECLARE @int_bucket int;
SELECT @int_bucket=ai.id, @int_bucket=oi.id
FROM #test_all_int ai
CROSS JOIN
#test_odd_int oi;
GO
--If this INNER JOIN were implemented by doing the cross join
--and then filtering, it should take at least as long
--as the above query. This one actually finishes fairly
--quickly, though :)
DECLARE @int_bucket int;
SELECT @int_bucket=ai.id, @int_bucket=oi.id
FROM #test_all_int ai
INNER JOIN
#test_odd_int oi ON ai.id=oi.id;
GO
--Clean up after ourselves
DROP TABLE #test_all_int,#test_odd_int;
GO
Craig Freedman has a nice set of articles looking at the physical join operators used in SQL Server; they should clear up how SQL Server generally goes about implementing JOINs:
http://blogs.msdn.com/b/craigfr/archive/2006/07/26/nested-loops-join.aspx
http://blogs.msdn.com/b/craigfr/archive/2006/08/03/merge-join.aspx
http://blogs.msdn.com/b/craigfr/archive/2006/08/10/687630.aspx
Cheers!
February 22, 2016 at 8:21 am
Jacob Wilkins (2/19/2016)
kkilsby (2/19/2016)
Doesn't the SQL DB engine first create a cross join for any query involving more than one table? So performance isn't an issue for a cross join query.No. SQL Server does not generally implement joins by first taking the Cartesian product (CROSS JOIN) and then filtering.
Actually, yes, that is exactly what happens; note the Logical Query Processing Poster below. In a T-SQL query the FROM operator is evaluated first. For a JOIN table operator:
1. a Cartesian Product is created
2. The ON Filter is evaluated and unmatched rows are filtered out
3. The OUTER rows are added when applicable
When evaluating the query, the SQL engine is smart enough to evaluate the filter before loading all of the rows into memory. Ben Gan discusses this in great detail in a couple of his books (which I can't access at the moment.)
Itzik Ben Gan Logical Query Processing Poster
-- Itzik Ben-Gan 2001
February 22, 2016 at 8:31 am
Alan.B (2/22/2016)
Jacob Wilkins (2/19/2016)
kkilsby (2/19/2016)
Doesn't the SQL DB engine first create a cross join for any query involving more than one table? So performance isn't an issue for a cross join query.No. SQL Server does not generally implement joins by first taking the Cartesian product (CROSS JOIN) and then filtering.
Actually, yes, that is exactly what happens; note the Logical Query Processing Poster below. In a T-SQL query the FROM operator is evaluated first. For a JOIN table operator:
1. a Cartesian Product is created
2. The ON Filter is evaluated and unmatched rows are filtered out
3. The OUTER rows are added when applicable
When evaluating the query, the SQL engine is smart enough to evaluate the filter before loading all of the rows into memory. Ben Gan discusses this in great detail in a couple of his books (which I can't access at the moment.)
Sure, that is a way to understand what happens from an abstract, logical perspective. However, since the original posted comment was about the lack of a performance penalty, I (in my opinion fairly, but I'm understandably biased :-)) was considering physical implementation, where that absolutely is not the way things generally happen (suggested by my use of the word "implement" in the fragment you quoted and my providing links to explanations of the the physical join operators explicitly to explain "implementing JOINs"; still, perhaps I should have been more clear).
Take the merge join operator. Unless all rows from one input match all rows from the other input on the join criteria (in which case you're just doing a thinly disguised CROSS JOIN anyway), it absolutely is not the case that a cartesian product is generated; not only is it not generated, it is not even the case that every row from one input is compared to every row in the other input.
I will take it as a fair point that I may not have been clear enough about "CROSS JOIN plus some other stuff later" being a fair understanding of what happens in an abstract sense; again, though, since the point was about the supposed lack of a performance penalty for CROSS JOINs relative to other joins, I addressed physical implementation, where what I said holds. 🙂
Cheers!
February 22, 2016 at 8:47 am
Jacob Wilkins (2/22/2016)
Alan.B (2/22/2016)
Jacob Wilkins (2/19/2016)
kkilsby (2/19/2016)
Doesn't the SQL DB engine first create a cross join for any query involving more than one table? So performance isn't an issue for a cross join query.No. SQL Server does not generally implement joins by first taking the Cartesian product (CROSS JOIN) and then filtering.
Actually, yes, that is exactly what happens; note the Logical Query Processing Poster below. In a T-SQL query the FROM operator is evaluated first. For a JOIN table operator:
1. a Cartesian Product is created
2. The ON Filter is evaluated and unmatched rows are filtered out
3. The OUTER rows are added when applicable
When evaluating the query, the SQL engine is smart enough to evaluate the filter before loading all of the rows into memory. Ben Gan discusses this in great detail in a couple of his books (which I can't access at the moment.)
Sure, that is a way to understand what happens from an abstract, logical perspective. However, since the original posted comment was about the lack of a performance penalty, I (in my opinion fairly, but I'm understandably biased :-)) was considering physical implementation, where that absolutely is not the way things generally happen (suggested by my use of the word "implement" in the fragment you quoted and my providing links to explanations of the the physical join operators explicitly to explain "implementing JOINs"; still, perhaps I should have been more clear).
Take the merge join operator. Unless all rows from one input match all rows from the other input on the join criteria (in which case you're just doing a thinly disguised CROSS JOIN anyway), it absolutely is not the case that a cartesian product is generated; not only is it not generated, it is not even the case that every row from one input is compared to every row in the other input.
I will take it as a fair point that I may not have been clear enough about "CROSS JOIN plus some other stuff later" being a fair understanding of what happens in an abstract sense; again, though, since the point was about the supposed lack of a performance penalty for CROSS JOINs relative to other joins, I addressed physical implementation, where what I said holds. 🙂
Cheers!
I agree with everything you are saying. I could have re-worded my response to distinguish between what is happening logically and what is actually happening.
Cheers.
-- Itzik Ben-Gan 2001
March 9, 2016 at 4:34 pm
Great to see this explained. Thanks.
December 22, 2016 at 2:26 pm
Trying to learn SQL by working throught this article, and Listing 7 ran for over 1:24:50 and returned error about widows handle …
Dropped WHILE @I < 1000 to WHILE @I < 10
And completed in 2:35 - with no records
Why so long? / issues
please be gentle ....
December 18, 2018 at 6:46 am
Instead of using CROSS JOIN you could instead use: INNER JOIN ON 1=1
or even: select columns from table1, table2
Is there any difference in the output between these three, either logically or in performance?
While it is true that you have to type a few more characters, it does mean you and the people who have to support your code don't have to learn about "CROSS JOIN" as a new concept.
http://90.212.51.111 domain
December 18, 2018 at 7:41 am
neil-560592 - Tuesday, December 18, 2018 6:46 AMInstead of using CROSS JOIN you could instead use: INNER JOIN ON 1=1
or even: select columns from table1, table2
Is there any difference in the output between these three, either logically or in performance?
While it is true that you have to type a few more characters, it does mean you and the people who have to support your code don't have to learn about "CROSS JOIN" as a new concept.
IMHO opinion, neither of those methods has any advantage over CROSS JOIN and vice versa except for one thing...
If people are doing the 1=1 thing or the non-ANSI joins that you suggest, then they already know about the CROSS JOIN thing even if they've not used the "CROSS JOIN". If people are going to use SQL , then they should learn the correct JOIN syntax and use it. The 1=1 and non-ANSI JOIN methods are considered to be "code smells" in SQL and should be avoided in most cases... especially if you're trying to impress someone on an iinterview or impress an "inspection team" from one of your customers. It also makes the use of cross join functionality very obvious when troubleshooting underfire during a crisis and helps make it totally obvious that it's not a mistake just because of some missing criteria.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 13, 2024 at 11:56 am
Good article Gregory
Listing 7: TSQL to create sample data for performance test
IF @I%7 > 0
This code adds no sales data for every 7th customer ?
Viewing 14 posts - 16 through 28 (of 28 total)
You must be logged in to reply to this topic. Login to reply