Difference between Full Outer and Cross Join

  • Hi Friends,

    I had an interview and i was asked the below scenario..

    I gave him the answer with Cross Join and he told that it could be done by Full Outer. So why are you going to Cross Join and what's the difference between Full Outer join and cross join and when should we go for it?

    I gave him a few differences but he was not satisfied. So here I am posting this question out of curiosity. Friends, can you please advise me ?

    declare @test-2 table (country varchar(10))

    insert into @test-2

    select 'india'

    union

    select 'australia'

    union

    select 'america'

    union

    select 'new zeala'

    select t1.country + ' vs ' + t2.country from @test-2 t1

    full outer join @test-2 t2 on t1.country <> t2.country --where t1.country = 'india' --and t2.country <> 'india'

    select t1.country + ' vs ' + t2.country from @test-2 t1

    cross join @test-2 t2 --on t1.country = t2.country

    where t1.country <> t2.country

    Thanks,
    Charmer

  • Hi Charmer,

    A CROSS JOIN, also known as a Cartesian product, is a join type that has no ON clause. It instructs SQL Server to combine every row from the first source with every row from the second source. In your case both sources have the same four rows (because it's the same table twice), so this results in all 16 possible combinations. The WHERE clause then (**) filters out the four combinations of a country in itself, leaving you with the full competition schema.

    (**) The "next" here refers to the logical processing order; the actual processing order chosen by the Query Optimizer can be different as long as the results are the same.

    A FULL OUTER JOIN (and its counterparts LEFT and RIGHT OUTER JOIN) are extensions of the INNER JOIN. An INNER JOIN will combine each row from the first source with every row from the second source for which the ON clause is true. Rows from the first source for which no such match is found are omitted, as are rows from the second source that do not match any row from the first. The last part is where FULL OUTER JOIN changes its behaviour: rows from the first or second source that have no match at all will still appear in the result set, with all columns that come from the other source represented as NULLs. (For LEFT and RIGHT OUTER JOIN, that changed behaviour applies to only the first or only the second source).

    In your case, the ON clause tests inequality. Every row from the first source will have three matches, and every row from the second source is matched thrice. So you could even change the FULL OUTER JOIN to an INNER JOIN and still get the same results. With the given data, the results you get are the same but I feel that the query does not represent the logic of the problem as clearly as CROSS JOIN.

    Also, in one specific case the query with FULL OUTER JOIN produces incorrect results. If there is just a single country in the table, then the CROSS JOIN will produce no results at all (correct), but the FULL OUTER JOIN will produce two rows. One for the row from the first table, matched to nothing from the second, and one for the same country on the second table, with no matches on the first. The string concatenation will make the result NULL in both cases, but if you change the SELECT to just list the two country names, you will see that the rows returned are (India, NULL) and (NULL, India).

    The INNER JOIN alternative does not have this problem. It is semantically completely equivalent to the CROSS JOIN, and will always return the same results. Here the difference is merely a matter of personal preference. For understandability and maintainability, I would favor the CROSS JOIN.

    Finally, when you look at performance you should also discard the FULL OUTER JOIN option, because the CROSS JOIN (or the equivalent INNER JOIN) are faster.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hi

    Just to add to Hugo's comment.

    You could also look at the statistics to try and help understand the difference between queries (given they actually do give the same results).

    I changed the table variable into a real table (renamed to country) and populated it as you had done and then used the following to get the statistics on just the queries.

    set statistics io on

    set statistics time on

    I ran each query separately.

    The first query gave the following results

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 3 ms.

    (12 row(s) affected)

    Table 'Worktable'. Scan count 1, logical reads 19, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'country'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    The second query gave these results

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 2 ms.

    (12 row(s) affected)

    Table 'country'. Scan count 2, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    The first query creates a worktable which the second query does not. This is additional work. It also looks like it scanned the worktable once and did 19 logical reads. It also scanned the actual table twice and made 2 logical reads. That's a total of 21 pages read from the data cache.

    The second query does not use a worktable. It scans the actual table twice (like the first query) and makes just 5 logical reads. That's just 5 pages read from the data cache which is 16 pages less than the first.

    I also had a look at the execution plan and the total subtree costs for query 2 was just 0.006895 compared with 0.0136612 for the first query.

    I'd say that, in this instance, the second wins on performance (just as Hugo said).

    I hope this helps.

    Rich

  • richieStick (3/28/2016)


    I also had a look at the execution plan and the total subtree costs for query 2 was just 0.006895 compared with 0.0136612 for the first query

    ... but beware that those costs are estimates, not actual costs. Estimates can be wrong.

    (Other than that, I fully agree with everything you posted)


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (3/28/2016)


    (**) The "next" here refers to the logical processing order; the actual processing order chosen by the Query Optimizer can be different as long as the results are the same.

    I actually prefer the terms "logical processing order" and "physical processing order", because it can reasonably be argued that the logical processing order is the "actual" processing order, because it determines the correctness of the results.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply