Puzzler

  • This is not a critical production question, only something I have found that I can't figure out.

    I have this query that returns 103,038 records:

    Select

       a.RecId

    From

       dbo.TableA a

       Left Outer Join dbo.TableB b

         on b.TableA_RecId = a.RecId

    Where

       b.RecId is NULL

    If I modify this query by adding a filter in the right join table, the query now returns 103,353 records:

    Select

       a.RecId

    From

       dbo.TableA a

       Left Outer Join dbo.TableB b

         on b.TableA_RecId = a.RecId

            and

            b.Flag = 'Y'

    Where

       b.RecId is NULL

    Since it's a Left Outer Join, and I'm only selecting those records where the right table returns a NULL record (b.RecId is a non-nullable identity column) shouldn't both queries return the same number of records?  But, where clause aside, shouldn't a query with a filter return the same or lesser amount of records?  Maybe having the filter in the join causes some interaction that I don't understand.

    Is there a guru around who can tell me what is happening, here?

     

  • Can you post the table DDL and a small sample set of data that causes the problem ?

    I can't explain it, but IMO, it helps maintenance-wise to use NOT EXISTS to make the purpose of the query clearer. Left-Joining and checking for NULL on the table Left Joined to is a common coding construct, but I find NOT EXISTS to be a much clearer in expressing the intention of the query:

    Select

       a.RecId

    From

       dbo.TableA a

    Where Not Exists (

      Select *

      From dbo.TableB b

      Where b.TableA_RecId = a.RecId

      And   b.Flag = 'Y'

    )

     

  • Another reason for the # of records to go up is that NOT all of the records from table B have a Y flag. It stands to reason that if you add the criteria of where the flag = Y and there are records where it is NOT Y then wouldn't that change the counts?



    Good Hunting!

    AJ Ahrens


    webmaster@kritter.net

  • I have an answer. You're going to kick yourself when you realize what happens here... so allow me to explain.

    Consider the simplified dataset below:

    TableA

    ID

    1

    3

    4

    6

    7

    8

    TableB

    ID Flag

    1 Y

    4 N

    7 N

    8 Y

    The first query (Left outer join on ID only) would return:

    2 (3 & 6 have no matching records)

    The second query (LOJ on ID and Flag) would return:

    4 (3 & 6 have no matching records: 4 & 7 do not match the 'Y' criteria for flag)

    Got it?

  • Your WHERE clause means that you only get the rows in tableA that don't match something in tableB. Since you're being MORE RESTRICTIVE in finding matching rows in your second queries, there are more records in tableA that don't match something in tableB, hence the number of records in your result goes up.

    RobF

    Rob Farley
    LobsterPot Solutions & Adelaide SQL Server User Group
    Company: http://www.lobsterpot.com.au
    Blog: http://blogs.lobsterpot.com.au

  • OK.  I can see now that I was so used to using queries to return matched records that I was thinking of this query in the same way.  But it isn't the same, its the opposite.  This query is returning unmatched records.  So if I insert a filter which decreases the number of matched records, the number of unmatched records must, obviously (since the total number of records in TableA remains static), increase.

     

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply