Stairway to Advanced T-SQL Level 1: Intro to Advanced T-SQL Using a CROSS JOIN

  • Oops, cut and pasted the wrong text.

    Gregory A. Larsen, MVP

  • Doesn't the SQL DB engine first create a cross join for any query involving more than one table? So performance isn't an issue for a cross join query.

  • So the only real use of the term "CROSS JOIN" vs the older seperation of two tables by a comma is simply to reassure the reader that the cross join was intentional and the author didn't simply forget a join?

    That's the only reason I can see to have a specific syntax for cross joins. Mind you, it's probably a good idea to document the deliberate creation of a Cartsian product, especially given the disasterous consequences if the tables are too big! 🙂

  • Is it possible to disable cross joins for either a specific login, database, or server (or combination) for production servers? I find cross joins to be a security risk as it is a playground for DoS attacks. Not to mention, programmers that don't know what they are doing.

  • So the only real use of the term "CROSS JOIN" vs the older seperation of two tables by a comma is simply to reassure the reader that the cross join was intentional and the author didn't simply forget a join?

    That's the only reason I can see to have a specific syntax for cross joins. Mind you, it's probably a good idea to document the deliberate creation of a Cartsian product, especially given the disasterous consequences if the tables are too big! Smile

    In essence, you are correct. However, CROSS JOIN is also the ANSI SQL standard.

    --Edit: got hit by the reply/quote bug

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • kkilsby (2/19/2016)


    Doesn't the SQL DB engine first create a cross join for any query involving more than one table? So performance isn't an issue for a cross join query.

    No. SQL Server does not generally implement joins by first taking the Cartesian product (CROSS JOIN) and then filtering.

    For one thing, that would be grossly inefficient; for a second, it would require some tweaking for certain types of joins.

    Think OUTER JOINS, particularly. Take the following simple example:

    left_id

    -----------

    1

    2

    right_id

    -----------

    2

    3

    CROSS JOIN RESULTS

    left_id right_id

    ----------- -----------

    1 2

    2 2

    1 3

    2 3

    LEFT OUTER JOIN ON left_id=right_id RESULTS

    left_id right_id

    ----------- -----------

    1 NULL --This row appears nowhere in the CROSS JOIN results

    2 2

    That first row in the result set of the left outer join couldn't be achieved by taking the result of the cross join and filtering (at least not simply; you'd have to introduce other mechanisms into the join algorithm).

    As for the inefficiency/performance problem, try the following example:

    --Run the CREATE TABLE and INSERT statements as

    --separately from the SELECT queries that follow.

    --One of the SELECTs (hint: it's the CROSS JOIN)

    --will not finish in a reasonable time.

    CREATE TABLE #test_all_int (ID int);

    CREATE TABLE #test_odd_int (ID int);

    INSERT INTO #test_all_int

    SELECT TOP 1000000 ROW_NUMBER() OVER(ORDER BY (SELECT 1))

    FROM sys.all_columns ac1 CROSS JOIN sys.all_columns ac2;

    INSERT INTO #test_odd_int

    SELECT ID

    FROM #test_all_int ai

    WHERE ai.ID%2=1;

    GO

    --In the following queries I declare a variable to

    --which I can redirect the output of the query.

    --That way we don't have to deal with the overhead

    --of displaying the results.

    --Here's the CROSS JOIN:

    --Don't hold your breath waiting for this to finish;

    --run it and wait as long as your patience

    --allows. :)

    DECLARE @int_bucket int;

    SELECT @int_bucket=ai.id, @int_bucket=oi.id

    FROM #test_all_int ai

    CROSS JOIN

    #test_odd_int oi;

    GO

    --If this INNER JOIN were implemented by doing the cross join

    --and then filtering, it should take at least as long

    --as the above query. This one actually finishes fairly

    --quickly, though :)

    DECLARE @int_bucket int;

    SELECT @int_bucket=ai.id, @int_bucket=oi.id

    FROM #test_all_int ai

    INNER JOIN

    #test_odd_int oi ON ai.id=oi.id;

    GO

    --Clean up after ourselves

    DROP TABLE #test_all_int,#test_odd_int;

    GO

    Craig Freedman has a nice set of articles looking at the physical join operators used in SQL Server; they should clear up how SQL Server generally goes about implementing JOINs:

    http://blogs.msdn.com/b/craigfr/archive/2006/07/26/nested-loops-join.aspx

    http://blogs.msdn.com/b/craigfr/archive/2006/08/03/merge-join.aspx

    http://blogs.msdn.com/b/craigfr/archive/2006/08/10/687630.aspx

    Cheers!

  • Jacob Wilkins (2/19/2016)


    kkilsby (2/19/2016)


    Doesn't the SQL DB engine first create a cross join for any query involving more than one table? So performance isn't an issue for a cross join query.

    No. SQL Server does not generally implement joins by first taking the Cartesian product (CROSS JOIN) and then filtering.

    Actually, yes, that is exactly what happens; note the Logical Query Processing Poster below. In a T-SQL query the FROM operator is evaluated first. For a JOIN table operator:

    1. a Cartesian Product is created

    2. The ON Filter is evaluated and unmatched rows are filtered out

    3. The OUTER rows are added when applicable

    When evaluating the query, the SQL engine is smart enough to evaluate the filter before loading all of the rows into memory. Ben Gan discusses this in great detail in a couple of his books (which I can't access at the moment.)

    Itzik Ben Gan Logical Query Processing Poster

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (2/22/2016)


    Jacob Wilkins (2/19/2016)


    kkilsby (2/19/2016)


    Doesn't the SQL DB engine first create a cross join for any query involving more than one table? So performance isn't an issue for a cross join query.

    No. SQL Server does not generally implement joins by first taking the Cartesian product (CROSS JOIN) and then filtering.

    Actually, yes, that is exactly what happens; note the Logical Query Processing Poster below. In a T-SQL query the FROM operator is evaluated first. For a JOIN table operator:

    1. a Cartesian Product is created

    2. The ON Filter is evaluated and unmatched rows are filtered out

    3. The OUTER rows are added when applicable

    When evaluating the query, the SQL engine is smart enough to evaluate the filter before loading all of the rows into memory. Ben Gan discusses this in great detail in a couple of his books (which I can't access at the moment.)

    Itzik Ben Gan Logical Query Processing Poster

    Sure, that is a way to understand what happens from an abstract, logical perspective. However, since the original posted comment was about the lack of a performance penalty, I (in my opinion fairly, but I'm understandably biased :-)) was considering physical implementation, where that absolutely is not the way things generally happen (suggested by my use of the word "implement" in the fragment you quoted and my providing links to explanations of the the physical join operators explicitly to explain "implementing JOINs"; still, perhaps I should have been more clear).

    Take the merge join operator. Unless all rows from one input match all rows from the other input on the join criteria (in which case you're just doing a thinly disguised CROSS JOIN anyway), it absolutely is not the case that a cartesian product is generated; not only is it not generated, it is not even the case that every row from one input is compared to every row in the other input.

    I will take it as a fair point that I may not have been clear enough about "CROSS JOIN plus some other stuff later" being a fair understanding of what happens in an abstract sense; again, though, since the point was about the supposed lack of a performance penalty for CROSS JOINs relative to other joins, I addressed physical implementation, where what I said holds. 🙂

    Cheers!

  • Jacob Wilkins (2/22/2016)


    Alan.B (2/22/2016)


    Jacob Wilkins (2/19/2016)


    kkilsby (2/19/2016)


    Doesn't the SQL DB engine first create a cross join for any query involving more than one table? So performance isn't an issue for a cross join query.

    No. SQL Server does not generally implement joins by first taking the Cartesian product (CROSS JOIN) and then filtering.

    Actually, yes, that is exactly what happens; note the Logical Query Processing Poster below. In a T-SQL query the FROM operator is evaluated first. For a JOIN table operator:

    1. a Cartesian Product is created

    2. The ON Filter is evaluated and unmatched rows are filtered out

    3. The OUTER rows are added when applicable

    When evaluating the query, the SQL engine is smart enough to evaluate the filter before loading all of the rows into memory. Ben Gan discusses this in great detail in a couple of his books (which I can't access at the moment.)

    Itzik Ben Gan Logical Query Processing Poster

    Sure, that is a way to understand what happens from an abstract, logical perspective. However, since the original posted comment was about the lack of a performance penalty, I (in my opinion fairly, but I'm understandably biased :-)) was considering physical implementation, where that absolutely is not the way things generally happen (suggested by my use of the word "implement" in the fragment you quoted and my providing links to explanations of the the physical join operators explicitly to explain "implementing JOINs"; still, perhaps I should have been more clear).

    Take the merge join operator. Unless all rows from one input match all rows from the other input on the join criteria (in which case you're just doing a thinly disguised CROSS JOIN anyway), it absolutely is not the case that a cartesian product is generated; not only is it not generated, it is not even the case that every row from one input is compared to every row in the other input.

    I will take it as a fair point that I may not have been clear enough about "CROSS JOIN plus some other stuff later" being a fair understanding of what happens in an abstract sense; again, though, since the point was about the supposed lack of a performance penalty for CROSS JOINs relative to other joins, I addressed physical implementation, where what I said holds. 🙂

    Cheers!

    I agree with everything you are saying. I could have re-worded my response to distinguish between what is happening logically and what is actually happening.

    Cheers.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Great to see this explained. Thanks.

  • Trying to learn SQL by working throught this article, and Listing 7 ran for over 1:24:50 and returned error about widows handle …

    Dropped WHILE @I < 1000 to WHILE @I < 10

    And completed in 2:35 - with no records

    Why so long? / issues

    please be gentle ....

  • Instead of using CROSS JOIN you could instead use: INNER JOIN ON 1=1
    or even: select columns from table1, table2
    Is there any difference in the output between these three, either logically or in performance?
    While it is true that you have to type a few more characters, it does mean you and the people who have to support your code don't have to learn about "CROSS JOIN" as a new concept.

    http://90.212.51.111 domain

  • neil-560592 - Tuesday, December 18, 2018 6:46 AM

    Instead of using CROSS JOIN you could instead use: INNER JOIN ON 1=1
    or even: select columns from table1, table2
    Is there any difference in the output between these three, either logically or in performance?
    While it is true that you have to type a few more characters, it does mean you and the people who have to support your code don't have to learn about "CROSS JOIN" as a new concept.

    IMHO opinion, neither of those methods has any advantage over CROSS JOIN and vice versa except for one thing...

    If people are doing the 1=1 thing or the non-ANSI joins that you suggest, then they already know about the CROSS JOIN thing even if they've not used the "CROSS JOIN".  If people are going to use SQL , then they should learn the correct JOIN syntax and use it.  The 1=1 and non-ANSI JOIN methods are considered to be "code smells" in SQL and should be avoided in most cases... especially if you're trying to impress someone on an iinterview or impress an "inspection team" from one of your customers.  It also makes the use of cross join functionality very obvious when troubleshooting underfire during a crisis and helps make it totally obvious that it's not a mistake just because of some missing criteria.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Good article Gregory

    Listing 7: TSQL to create sample data for performance test

    IF @I%7 > 0

    This code adds no sales data for every 7th customer ?

     

Viewing 14 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply