Correlated Subquey Join

  • Hello -

    Why must I use "non-Ansi" join syntax in the inner query of a correlated subquery.

    i.e.

    This syntax works:

    < Where Table1.Field1 = Table2.Field1 >

    And this syntax fails:

    < Table1 INNER JOIN Table2 on Table1...etc>

    Here are my real examples that hopefully you can try the same on some simple Parent-Child tables you may have.

    This WORKS(using WHERE in subquery):

    SELECT

    A.id_Account,

    (Select SUM(H.cTotalUSDValueAmt)

    FROM cdiHolding H

    WHERE H.id_Account = A.id_Account)

    FROM cdAccount A

    This FAILS (and takes 20 times as long):

    SELECT

    A.id_Account,

    (SELECT SUM(H.cTotalUSDValueAmt)

    FROM cdiHolding H

    INNER JOIN cdAccount

    ON H.id_Account = A.id_Account)

    FROM cdAccount A

    The only real difference is the join methodology.

    Any of you Guru's know why ?????

    Thanks - B

  • First off, why not do this?

    select

    A.id_account, sum(h.cTotalUSDValueAmt)

    from

    cdAccount A

    join

    cdiHolding H on a.id_Account = h.id_Account

    group by

    a.id_account

    If you really have your heart set on using a subquery, try this;

    select

    a.id_account, h.total

    from

    cdAccount A

    join

    (select id_account, sum(cTotalUSDValueAmt) total from cdiHolding) h on

    a.id_account = h.id_account

    My guess is, the subquery syntax is going to take much longer because, first it has to sum all the cTotalUSDValueAmt values for each id_account on the cdiHolding table.

    Good luck,

    John

    Edited by - deuce on 08/15/2002 4:34:28 PM

    Edited by - deuce on 08/15/2002 4:34:43 PM

  • Thanks John for the reply.

    However, the key point is that not only does it take longer to run - but it returns incorrect results. Please read on...

    In standalone queries, this:

    Select H.cTotalUSDValueAmt

    FROM cdiHolding H, cdAccount A

    WHERE H.id_Account = A.id_Account

    is equivalent to this:

    SELECT H.cTotalUSDValueAmt

    FROM cdiHolding H

    INNER JOIN cdAccount A

    ON H.id_Account = A.id_Account

    They both work correctly and return exactly the same result set.

    HOWEVER, when I use the "WHERE" syntax(the first example) in the inner loop of a subquery - the query returns WRONG RESULTS.

    If you look at my first post - the only difference between the 2 queries is the Inner Query syntax - yet the 2 queries are really exactly the same thing.

    I'm not really looking for a better way to run the query - I can get things to work fast and properly - I just want to know why these 2 exactly the same queries behave differently when used in subqueires.

    Try something similar on one of your DBs and you'll lsee what I'm talking about.

    Thanks for the time John.

  • I think your second example (the one that does not work), is wrong. You shouldn't perform the join in the SELECT clause of the outer query.

    What you are saying in the subquery is to join with the A table (meaning the table in the FROM clause of the outer query), and not with the table cdAccount in the inner query.

    So joining with the cdAccount table is in effect combining each row from the cdiHolding table with each row from the cdAccount table.

    Suppose you have following data :

    cdiHolding.id_account cdiHolding.Amount

    --------------------- ----------------

    1 1

    2 10

    3 100

    1 1000

    cdAccount.id_Account

    --------------------

    1

    2

    3

    If you use the following query, you will get an overview of what happens...

    SELECT

    A.id_Account,

    (SELECT A2.id_Account, Amount

    FROM cdiHolding H

    INNER JOIN cdAccount A2

    ON H.id_Account = A.id_Account)

    FROM cdAccount A

    The result set (without the SUM) for the query:

    A.id_account A2.id_Account Amount

    ------------ ------------- ------

    1 1 1

    1 2 1

    1 3 1

    1 1 1000

    1 2 1000

    1 3 1000

    And the same for a.id_Account equalling 2 and 3.

    Edited by - NPeeters on 08/16/2002 07:14:06 AM

    Edited by - NPeeters on 08/16/2002 07:14:46 AM

  • NPeeters - thanks for the reply. (I hate the howevers - but...)

    Your statement:

    "You shouldn't perform the join in the SELECT clause of the outer query."

    confuses me as I have to do such a thing to get a Correlated subquery.

    Look very carefully at my first post and you'll see that the ONLY difference between the 2 queries is the inner query. And the first example does run correctly. Ergo, IMHO, the Inner Queries must be different - but no - they are not different and are in fact functionally equivalent(see my last post). I think it's a parser problem w/ TSQL. Anyone else ????

  • Bilster, hate to be the 'know-it-all'

    I did not test the little query I wrote to proof my point (and it failed of course). I have attached complete DDL for the sample data I included, together with your query (See below).

    Anyway, if you would execute this, you will see that my prediction was correct. The query below works correctly (but is way too complicated for what you are trying to do...)

    SELECT

    A.id_Account,

    (SELECT sum(Amount)

    FROM cdiHolding H

    INNER JOIN cdAccount A2

    ON H.id_Account = A.id_Account and A2.id_Account = A.id_Account)

    FROM cdAccount A

    I've just added an additional join on the A2 table.

    create table cdiHolding

    ( id_account int,

    Amount int)

    create table cdAccount

    (id_account int)

    insert into cdiHolding

    values (1, 1)

    insert into cdiHolding

    values (1, 1000)

    insert into cdiHolding

    values (2, 10)

    insert into cdiHolding

    values (3, 100)

    insert into cdAccount

    values (1)

    insert into cdAccount

    values (2)

    insert into cdAccount

    values (3)

    -- The wrecked query

    SELECT

    A.id_Account,

    (SELECT sum(Amount)

    FROM cdiHolding H

    INNER JOIN cdAccount A2

    ON H.id_Account = A.id_Account)

    FROM cdAccount A

  • Heh-

    I need a know-it-all cause I still don't understand.

    Using your DDL to create the tables:

    Since This(Query A):

    Select H.Amount

    FROM cdiHolding H, cdAccount A

    WHERE H.id_Account = A.id_Account

    {returns 1, 1000, 10, 100}

    Is EXACTLY the same as this(Query B):

    SELECT H.Amount

    FROM cdiHolding H

    INNER JOIN cdAccount A

    ON H.id_Account = A.id_Account

    {returns 1, 1000, 10, 100}

    Then shouldn't this(Query C):

    SELECT

    A.id_Account,

    (Select SUM(H.Amount)

    FROM cdiHolding H

    WHERE H.id_Account = A.id_Account)

    FROM cdAccount A

    {

    Returns:

    1 1001

    2 10

    3 1000

    }

    Be EXACLTY the same as this(Query D):

    SELECT

    A.id_Account,

    (SELECT SUM(H.Amount)

    FROM cdiHolding H

    INNER JOIN cdAccount

    ON H.id_Account = A.id_Account)

    FROM cdAccount A

    {

    Returns:

    1 3003

    2 30

    3 300

    }

    It's obviously doing something for each row of one of the tables - but I do not understand why since Query A is the same as Query B ???

    Just interchange the "Exactly the same" inner queries.

    I must be missing something (nothing unusual there) - Bilster

  • Okay, I'll try to explain this. Just talking about the last (erroneous) query here. So no need to try to link this to any of your other queries.

    First, you should check your syntax. In your subquery, you are writing 'ON H.id_account = A.id_Account'.

    By using the A.id_Account syntax, you say that SQLServer should make sure that the id_Account from the H table should match the id_Account from the A table.

    No, if we look at the tablenames and aliases that you've used, you can see that the A table is in fact the cdAccount table in the last FROM clause (the outer query).

    Still with me so far, it gets even more complicated

    Now, since you have instructed SQL Server to join the cdiHolding H table with the cdAccount table (notice the lack of an alias), SQL Server will do that for you.

    The only problem is, that you don't indicate to SQL Server how he has to join those tables. For the database, the two 'cdAccount' occurences are distinct 'instances' of the same table. (This is quite useful, when you have to join a table with itself for some reason.)

    So you are telling SQL Server to 'join' the cdiHolding table with the cdAccount table twice (once with the ANSI syntax and once with the non-ANSI syntax).

    The join with the ANSI syntax does not specify which fields should be used for the join, since you are referencing the other 'instance' of cdAccount, so SQL Server is 'so stupid', that it just combines every record from cdAccount with every record from cdiHistory.

    The query is the same as (in NON-ANSI join syntax) :

    SELECT

    A.id_Account,

    (SELECT SUM(H.Amount)

    FROM cdiHolding H, cdAccount

    WHERE H.id_Account = A.id_Account)

    FROM cdAccount A

    I hope this makes everything a bit clearer to you.

  • And now for the short version

    The difference between Query C and Query D is the number of times you reference the cdAccount table.

    In Query C you only reference it once, whereas in Query D you reference it twice.

  • I think I got it now - thanks.

    I was trying to do something like this:

    SELECT

    A.id_Account,

    (SELECT SUM(H.Amount)

    FROM cdiHolding H

    INNER JOIN cdAccount A

    ON H.id_Account = A.id_Account)

    FROM cdAccount A

    Or like this:

    SELECT

    A.id_Account,

    (SELECT SUM(H.Amount)

    FROM cdiHolding H

    INNER JOIN A

    ON H.id_Account = A.id_Account)

    FROM cdAccount A

    But neither of those work syntactically.

    In a nutshell - I'm trying to use ANSI syntax wherever possible, but in this case - the "WHERE" syntax is MUCH simpler.

    It doesn't force me to "Re-Alias" the Account table - thus it's simplicity.

    Anyway, thanks much - I don't FULLY understand - but I understand enough to stick with the "WHERE" syntax for my Correlated queries.

    - Bilster

  • Glad to be of help .

    Remember that everytime you write a tablename in a FROM clause, it is in fact a new 'instance' of that table for which you should use some kind of 'join' mechanism.

    So if you want to have a subquery, you shouldn't add the tables from the complete query to the FROM clause in the subquery.

    I would also like to point out that the first solution from Deuce is probably the most elegant and usual way to solve this kind of problem. It might be worth taking a closer look to his solution, just for future reference.

    And a final remark :

    Personally, I 'despise' of using any kind of 'visual tool' to construct queries. But it is can be a great help in trying to understand what is happening, and to get the syntax right.

    You can always build 'select' queries using Access or by constructing a VIEW in Enterprise Manager.

    Edited by - NPeeters on 08/16/2002 09:11:17 AM

  • WAY WAY Back I was trying to run a query that actually required the use of a correlated subquery and I was having problems. So, I broke it down into pieces and this was one of the pieces. I would, in fact, use the "Deuce 1" method normally.

    And, if you notice I made this statement:

    "I'm not really looking for a better way to run the query - I can get things to work fast and properly - I just want to know why these 2 exactly the same queries behave differently when used in subqueires."

    So, anyway thanks again guys - take care.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply