Really Simple - "Join" clarification

  • Hello,

    Please can you help?

    I've tied myself up in knots recently trying to understand joins (specifically to more than 2 tables) and I think I need some clarification before I can go any further.

    I've read in so many places over the web that a "join is a relationship between two and only two tables". Is this correct?

    In this example, for instance, it's clear that the "relationship" is between Table1 & Table2.

    Select * from Table1 inner join Table2 on .....

    But what about this?

    Select * from Table1 inner join Table2 on..... inner join Table3 on ......

    Does the bolded text effectively become one "table" so the relationship forms between this and Table3?

    Given all of that, does the ordering of the joins in the SQL statement make a difference? Or does the Query Optimiser some-how sort it all out to generate the same answer?

    I really appreciate any help given and look forward to any replies with anticipation.

    Regards,

    Pete

  • No, the relationship will be between table3 and whatever table(s) you put after the ON clause.

  • I've read in so many places over the web that a "join is a relationship between two and only two tables". Is this correct?

    Nope. Incorrect. A join can be a relationship between any number of tables the database engine can handle. I think the limit for SQL Server is something like 32,000 tables per query. Whatever it is, it's a HUGE number, and you'll probably never have to worry about it.

    The most complex join I ever wrote involved 12 tables, two derived tables ("tables" from other queries), and a Cross Apply to a string function. (Trained professional on a closed track. Don't try this at home. Playing on or around may result in terminal cross-eyes.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Actually, this is like Wave/Particle duality. In some situations, it helps to think of the joins working on multiple tables, but in other situations it helps to think of the joins as working on just two tables. Both are equally valid ways of looking at the same thing. In fact, according it Itzik Ben-Gan's book <http://www.sql.co.il/books/insidetsql2005>, each join does indeed act on only two tables. The first join is evaluated to create a virtual table, which is then used to join to the next table to create a second virtual table, and so on until all of the joins have been processed.

    The big issue to keep in mind is outer joins can create NULL values that can affect subsequent joins. Evaluating each join as operating on only two tables will help to diagnose these problems.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • My issue with defining it as "two and only two" tables, is that it moves SQL away from its declarative model and into a procedural model. Tell the computer what you want it to do, don't worry too much about the guy behind the curtain pulling the levers and switches.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • My issue with defining it as multiple tables is that you lose the fact that the order is important when you start introducing outer joins. Consider the following example:

    WITH Individuals AS (

    SELECT 1 AS Ind_ID

    , 'Rhys' AS First_Name

    )

    , Addresses AS (

    SELECT 1 AS Addr_ID

    , 0 AS Ind_ID

    , '1xx Main Street' AS Addr1

    , 'PA' AS State_Code

    )

    , States AS (

    SELECT 'PA' AS State_Code

    , 'Pennslyvania' AS State_Name

    )

    The following two queries produce completely different results, because the order is important.

    SELECT *

    FROM Individuals AS i

    LEFT OUTER JOIN Addresses AS a

    ON i.Ind_ID = a.Ind_ID

    INNER JOIN States AS s

    ON a.State_Code = s.State_Code

    SELECT *

    FROM Individuals AS i

    LEFT OUTER JOIN (

    Addresses AS a

    INNER JOIN States AS s

    ON a.State_Code = s.State_Code

    )

    ON i.Ind_ID = a.Ind_ID

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • "join is a relationship between two and only two tables"

    That's an interesting idea. I guess it depends on one's point of view. Let's consider a more specific example, namely the SQL below, which is valid:

    SELECT * FROM tableA a

    INNER JOIN tableB b ON a.keycol = b.keycol

    INNER JOIN tableC c ON c.col1 = a.col1 AND c.col2 = b.col2

    Is tableC joining two other tables, tableA and tableB?

    Or is tableC joining one other table, the "virtual" table created by the (previous) join of tableA and tableB?

    Scott Pletcher, SQL Server MVP 2008-2010

  • GSquared (10/26/2010)


    My issue with defining it as "two and only two" tables, is that it moves SQL away from its declarative model and into a procedural model. Tell the computer what you want it to do, don't worry too much about the guy behind the curtain pulling the levers and switches.

    When and until you end up with some screwy optimization pattern that defies known physics, and then you and the guy pulling the levers need to have a long discussion... Something I'm working on putting together a sample model for now for another thread's discussion.

    As mentioned, joins are between two tables at the machine leve, virtual or physical. However, the join pattern ends up as a flow, from one to the next to the next, so that you end up from your perspective as one solid result. I agree with GSquared that at the beginning, work with the declarative model and concepts before you start looking deeper into the mechanics.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Thank you all for your replies.

    This discussion has definately improved my understanding of the subject.

    I find it fascinating when the experts get involved and contribute to my threads - I cant say I understand it all but I can more-or-less follow the themes. This in itself is a good learning opportunity.

    Thank you again for you help.

    Pete

  • drew.allen (10/26/2010)


    My issue with defining it as multiple tables is that you lose the fact that the order is important when you start introducing outer joins. Consider the following example:

    WITH Individuals AS (

    SELECT 1 AS Ind_ID

    , 'Rhys' AS First_Name

    )

    , Addresses AS (

    SELECT 1 AS Addr_ID

    , 0 AS Ind_ID

    , '1xx Main Street' AS Addr1

    , 'PA' AS State_Code

    )

    , States AS (

    SELECT 'PA' AS State_Code

    , 'Pennslyvania' AS State_Name

    )

    The following two queries produce completely different results, because the order is important.

    SELECT *

    FROM Individuals AS i

    LEFT OUTER JOIN Addresses AS a

    ON i.Ind_ID = a.Ind_ID

    INNER JOIN States AS s

    ON a.State_Code = s.State_Code

    SELECT *

    FROM Individuals AS i

    LEFT OUTER JOIN (

    Addresses AS a

    INNER JOIN States AS s

    ON a.State_Code = s.State_Code

    )

    ON i.Ind_ID = a.Ind_ID

    Drew

    Yes, sequence and priority matter, but that doesn't change the fact that you could define a 5-table join in this, which is the question at hand.

    But even so, what you've done is change what you're asking for. The two queries are only superficially similar. It's a declarative change.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • For INNER joins only, I don't think the sequence really matters. SQL will re-arrange the joins as it needs to.

    For any type of OUTER join, as you've already seen, order is indeed critical.

    I think conceptually it's actually easier to think of a join as allowing mutliple tables rather than just 2, since sometimes your code will use values from two (or more) tables to join to one other table.

    It's generally not relevant to you to worry about how SQL will resolve the joins [except perhaps for performance], as long as you've coded the joins properly.

    Scott Pletcher, SQL Server MVP 2008-2010

  • drew.allen (10/26/2010)


    My issue with defining it as multiple tables is that you lose the fact that the order is important when you start introducing outer joins.

    You need to be more specific - it's the order of ON, not JOIN. Try the following:

    SELECT *

    FROM Individuals AS i

    LEFT OUTER JOIN Addresses AS a

    INNER JOIN States AS s

    ON a.State_Code = s.State_Code

    ON i.Ind_ID = a.Ind_ID

    Thanks to Jack Corbett for this.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply