Chain of tables.

  • Hello All,

    This is a generalised question.

    A is the parent of B.

    B is the parent of C.

    etc.

    Each parent parent can have multiple children.

    Each child has only one parent.:crying:

    Parent childs are linked by two fields, the X field which is 'general' for all tables and an Y field which is specific for each level.

    The two queries below are functionaly the same.

    ------------------------------------------------------

    -- A chain of tables. Links by linking each level

    ------------------------------------------------------

    select * from

    A

    JOIN

    B

    ON A.X = B.X AND A.Y = B.AY

    JOIN

    C

    ON B.X = C.X AND B.Y = C.[BY]

    JOIN

    D

    ON C.X = D.X AND C.Y = D.CY

    JOIN

    E

    ON D.X = E.X AND D.Y = E.DY

    ------------------------------------------------------

    -- A chain of tables. By directly linking with top parent.

    ------------------------------------------------------

    select * from

    A

    JOIN

    B

    ON A.X = B.X AND A.Y = B.AY

    JOIN

    C

    ON A.X = C.X AND B.Y = C.[BY]

    JOIN

    D

    ON A.X = D.X AND C.Y = D.CY

    JOIN

    E

    ON A.X = E.X AND D.Y = E.DY

    They should be equivalent.

    Question:

    Can the optimiser work this out that these queries are equivalent ?

    Or is one query preferred above the other for performance ? (Readability)

    Similar question:

    WHERE A.X = @AXPARAMETER

    is added to the first query.

    Should this be added to the second query or should the second query be changed to?

    ------------------------------------------------------

    -- A chain of tables. Parameter testing.

    ------------------------------------------------------

    select * from

    A

    JOIN

    B

    ON A.X = @AXPARAMETER AND B.X =@AXPARAMETER AND A.Y = B.AY

    JOIN

    C

    ON @AXPARAMETER = C.X AND B.Y = C.[BY]

    JOIN

    D

    ON @AXPARAMETER = D.X AND C.Y = D.CY

    JOIN

    E

    ON @AXPARAMETER = E.X AND D.Y = E.DY

    Ben

  • In general, I'd go with the first query. I think the optimizer will figure it out in most circumstances, but that assumes you have an enforced referential constraint in place (a foreign key using WITH CHECK). It'll mostly figure that stuff out on its own, but it won't or can't always. In general, I'd use the actual primary & foreign keys, despite the fact that you can jump tables.

    As to the parameter, no, probably not. If filtering on tableA is what you needed, just do that. The INNER JOIN will take care of the rest.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (6/16/2014)


    If filtering on tableA is what you needed, just do that. The INNER JOIN will take care of the rest.

    Thanks for you response, offcourse this was a 'simplyfied' example.

    So for the real query maybe SQL-server does not start at table A but at table D. So I was wondering if it would help 'jumping' tables and providing the information so that the query plan can use this 'redundant' information and or can the optimiser derive this extra information from the query?

    Reasons to start a D could be 'statistics', an added selection for example on D or just because D has a better structure for the query. I was wondering if in such a scenario jumping the table would help the optimizer to make a better plan.

    In the past (with 7 and/or 2000) jumping the tables with extra (redundant) information did help. So because it worked then we still sometimes use this type of code. Maybe this has become unnecceary since then.

    Thanks for your response,

    Ben

  • ben.brugman (6/17/2014)


    Grant Fritchey (6/16/2014)


    If filtering on tableA is what you needed, just do that. The INNER JOIN will take care of the rest.

    Thanks for you response, offcourse this was a 'simplyfied' example.

    So for the real query maybe SQL-server does not start at table A but at table D. So I was wondering if it would help 'jumping' tables and providing the information so that the query plan can use this 'redundant' information and or can the optimiser derive this extra information from the query?

    Reasons to start a D could be 'statistics', an added selection for example on D or just because D has a better structure for the query. I was wondering if in such a scenario jumping the table would help the optimizer to make a better plan.

    In the past (with 7 and/or 2000) jumping the tables with extra (redundant) information did help. So because it worked then we still sometimes use this type of code. Maybe this has become unnecceary since then.

    Thanks for your response,

    Ben

    Again, it really depends. If you don't need a table in a join, don't put it in a join. The optimizer can recognize this and take care of it through a process in the optimizer called simplification, but why bother. As for how adding the parameters to multiple tables helped performance, I'd need to see individual queries and query plans to even comment. Could it help? Sure. Will it in all circumstances? Not likely.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply