The Subquery

  • paul.knibbs (5/9/2013)


    Have to admit, I'm puzzled by this behaviour. It doesn't make logical sense for SQL to make a substitution like this when the table you're selecting from is explicitly named in the subquery, surely? :ermm:

    Actually, it does make more sense than you may think. It is the result of a combination of features that, in and of themselves, are rather innocent.

    We all know that we can combine multiple tables in a query - not only with subqueries, also with joins. And though I personally wouldn't mind otherwise, SQL does not force you to table-prefix all column references. A query like "SELECT Col1 FROM Table1 JOIN Table2 ON Col2 = Col3" is legal, as long as each of the unqualified columns exists in exactly one of the tables. Prefixes are technically required only when column names are ambiguous.

    We also all know that subqueries can be correlated, meaning that a subquery can reference data from the outer query. E.g. "SELECT a.Col1 FROM Table1 AS a WHERE a.Col2 IN (SELECT b.Col3 FROM Table2 AS b WHERE b.Col4 = a.Col5)". The "a.Col5" is a reference to the outer query - and I am pretty sure almost everyone here knows this, has seen it, and has coded it.

    Combine the two, and you get the effect of the QotD. SQL Server first tries to resolve the unqualified reference to client_id within the subquery. When that fails, it moves on to the next layer, assuming we wanted a correlated subquery - and then it does find a match.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • nenad-zivkovic (5/9/2013)


    Since nothing from subquery's table is actually selected here - it can very well be omitted. Any of these would be exactly the same:

    select * from address_oltp where client_id in (select client_id from address_staging)

    select * from address_oltp where client_id in (select client_id)

    select * from address_oltp where client_id = client_id

    select * from address_oltp where 1=1

    select * from address_oltp

    Almost - but not completely. The first one does actaully select rows from address_staging in the subquery, and then references address_oltp.client_id for each of those rows. So if the address_staging has three rows, the first query is equivalent to "... where client_id in (client_id, client_id, client_id)". If address_staging has twenty rows, the list becomes longer. In all those cases, the result will still be the same.

    But if address_staging is empty, the subquery is empty as well. And since none of the rows in address_oltp have client_id in an empty list, the query will now return no rows anymore.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • nenad-zivkovic (5/9/2013)


    Toreador (5/9/2013)


    A good question, but I'm not so sure about the explanation.

    If a column is referenced in a subquery that does not exist in the table referenced by the subquery's FROM clause, but exists in a table referenced by the outer query's FROM clause, the query executes without error.

    Fair enough, but this isn;'t the case here, as the ciolumn referenced in the subquery does exist in the table referenced by the subquery's FROM clause.

    No it does not. You should look carefully - table in subquery (address_staging) have a column clientid and select is using client_id. Mind the underscore _.

    Funny fact - I've recently wrote something (article) about this - I've called it "Accidental correlated subqueries". (So I've spotted it on sight here 🙂 ) This is actually quite possible to happen in real life situations, and could be very dangerous when used with delete statement. We've once ruined a production table because of it (true story).

    Best practice to make sure you don't make mistake with incorrect column names should be to always use table names or aliases in front of column names:

    select *

    from address_oltp t1

    where t1.client_id in (select t2.client_id from address_staging t2)

    Have the code been written like this it would produce an error and you would spot something is not written correctly.

    Ye its true, its better if we use table aliases or aliases in front of column names

    _______________________________________________________________
    To get quick answer follow this link:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • Hugo Kornelis (5/9/2013)

    Actually, it does make more sense than you may think. It is the result of a combination of features that, in and of themselves, are rather innocent.

    Thanks for the explanation. I have occasionally written correlated subqueries, yes, but I always use the table prefix to refer to the outer table--it would never have occurred to me that not only would SQL allow you to do otherwise, but would potentially end up selecting from a completely different column than the one you wanted! Guess I need to mark this up as another reason for always specifying the table name in queries involving joins and subqueries.

  • paul.knibbs (5/9/2013)


    Hugo Kornelis (5/9/2013)

    Actually, it does make more sense than you may think. It is the result of a combination of features that, in and of themselves, are rather innocent.

    Thanks for the explanation. I have occasionally written correlated subqueries, yes, but I always use the table prefix to refer to the outer table--it would never have occurred to me that not only would SQL allow you to do otherwise, but would potentially end up selecting from a completely different column than the one you wanted! Guess I need to mark this up as another reason for always specifying the table name in queries involving joins and subqueries.

    Yup! I'm pretty sure I've said (written) this a hundred times already, but it never hurts to add one more: whenever a query references more than a single table, always give aliases to all of the tables (this is not actually required, but it helps with the rest), and always prefix ALL column references (which is where those aliases help).


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Stewart "Arturius" Campbell (5/9/2013)


    God question, thanks - one of the reasons why i prefer to use JOINs over subqueries.

    +1

    ~ Lokesh Vij


    Guidelines for quicker answers on T-SQL question[/url]
    Guidelines for answers on Performance questions

    Link to my Blog Post --> www.SQLPathy.com[/url]

    Follow me @Twitter

  • Great question..

    --
    Dineshbabu
    Desire to learn new things..

  • Nice question Reddy. Thanks.


    Sujeet Singh

  • I'm afraid I am not understanding.

    I agree that there is no syntax error, but I still don't get three rows returned.

    What am I missing?

    EDIT - pay no attention to the old blonde who is one day away from a much needed vacation. Perhaps it is time to get my eyes examined...:hehe:

  • Interesting, good question...

  • Thanks..

    Great question..

  • Stewart "Arturius" Campbell (5/9/2013)


    God question, thanks - one of the reasons why i prefer to use JOINs over subqueries.

    The downside to joins is that they can permit duplication if referential integrity and uniqueness were not enforced. I ran into this yesterday with a customer who had one address from 01/01/1900-12/31/2199 and another address from 1/1/2013-12/31/2199.


    Puto me cogitare, ergo puto me esse.
    I think that I think, therefore I think that I am.

  • I was also confused about the explanation. Even though I got it correct, I first thought it would fail.

  • That was a great question. Especially the similarity of the column names to demonstrate how easily a typo can be overlooked and still return what you expect (since in this case the primary keys matched). A better example may have been to have the primary keys not match and the three rows are still returned. I think that would have caused some head scratching from some people.

  • Nice question on this topic. It's good to see things like this helping to explain those little subtleties.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

Viewing 15 posts - 16 through 30 (of 42 total)

You must be logged in to reply to this topic. Login to reply