Any way to use "not in" with multiple columns?

  • I currently have:

    INSERT INTO table1

    SELECT colA, colB, colC, colD, colE

    FROM table2 t2

    LEFT OUTER JOIN table1 t1

        ON t1.colC = t2.colC

        AND t1.colD = t2.colD

    WHERE t1.colX IS NULL

     

    I'd like to get a performance comparison between that and something like:

    INSERT INTO table1

    SELECT colA, colB, colC, colD, colE

    FROM table2

    WHERE colC, colD NOT IN (SELECT colC, colD FROM table1)

     

    I'm dealing with record sets around 5 million for both t1 and t2.  The columns being compared are CHAR(3)'s.  ColX in the first query is an arbitrary column.  Anyone?  Anyone?  Other ideas entirely?  Thanks...

     

  • OUTER JOINs always perform better than NOT INs. 

    I don't believe you can do what you are trying to do.

  • For NOT IN with multiple join columns, you need a correlated sub-query, which means you need NOT EXISTS:

    INSERT INTO table1

    SELECT colA, colB, colC, colD, colE

    FROM table2 As t2

    WHERE NOT EXISTS (

      SELECT *

      FROM Table1 As t1

      WHERE t1.ColC = t2.ColC

      AND     t1.ColD = t2.ColD

     

  • The only way to do it as a single NOT IN subquery would be to concatenate the columns. But unless you use indexed calc columns, and probably even then, this would perform worse than an outer join.

    So you should use an outer join, specifying in the where clause that a join column on the outer table IS NULL:

    select t1.c1, t1.c2
    from t1 left join t2 on t1.c3 = t2.c3 and t1.c4 = t2.c4
    where t2.c3 is null

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • I agree with JeffB... outer join, search for the nulls... haven't tested for speed but seems that it would be faster than either the correlated subquery or the concatenate method... could be wrong, though.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • >>haven't tested for speed but seems that it would be faster than either the correlated subquery

    I generally get the same execution plan using NOT EXISTS or LEFT JOIN check for NULL. I tend to use NOT EXISTS so that I know the intention of the query a year from now when I have to maintain/modify it.

     

  • Bold assertion: left anti-join never performs worse than the equivalent NOT EXISTS...

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • thanks fellas.  I'm actually getting the quickest times using the concatenate and NOT IN clause.  Could be a fluke, but I made sure to run freeproccache and dropcleanbuffers first.

    I'll revisit the perf testing tomorrow and post the times and table defs.

     

     

  • is that using indexed calc columns?

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • This is working perfect for me.

    Thanks 🙂

  • There are some performance comparisons of various techniques at http://weblogs.sqlteam.com/peterl/archive/2009/06/12/Timings-of-different-techniques-for-finding-missing-records.aspx

    "NOT EXISTS" was the fastest for those specific test conditions.

    SQL = Scarcely Qualifies as a Language

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply