sql question using IN

  • I agree with all of you about IN is bad and NOT IN is worse. I've got alternative for IN i.e. using INNER JOIN or EXISTS but what is the alternative to NOT IN. I found EXCEPT clause in SQL 2005 but that requires equal number of fields and same order, moreover you cannot really stuck with same number/types of fields all the times. I'm sure I'm missing something but what is it, 'm stuck

  • You can use a left outer join to do the same.

    Show me everything in table 1 where not in table 2.

    So long as your columns are correctly indexed, it'll be a fast query and be SARGable.

    CREATE

    TABLE #abc (

    Col1

    int

    )

    CREATE

    TABLE #def (

    Col1

    int

    )

     

     

    INSERT

    INTO #abc VALUES (1)

    INSERT

    INTO #abc VALUES (2)

    INSERT

    INTO #abc VALUES (3)

    INSERT

    INTO #def VALUES (1)

    INSERT

    INTO #def VALUES (2)

    INSERT

    INTO #def VALUES (5)

     

    SELECT

    *

    FROM

    #abc A

    LEFT OUTer JOIN #def b ON a.col1 = b.col1

    WHERE

    b

    .col1 IS null

    Cheers,CrispinI can't die, there are too many people who still have to meet me!It's not a bug, SQL just misunderstood me!

  • I've seen the article before and I had the same question then.

    Take the following query:

      select c1 from t1 where c2 = @value

    There is an index defined on c2. So this query is perfectly sargable. Now place the query as a subquery in an IN statement.

      ...IN (select c1 from t1 where c2 = @value)

    While a join would be (in most cases) more efficient, isn't the query, as a subquery, just as sargable as when it was a query? And if we then add NOT:

      ... NOT IN (select c1 from t1 where c2 = @value)

    how does this effect the sargability of the subquery?

    According to the article you cite, just adding NOT contributes to poor performance "because the SQL Server optimizer has to use a nested table scan to perform this activity". Why?

    Is it because there is a problem in the SS optimizer where the addition of NOT throws it for a loop and it generates table scans no matter how the subquery is constructed? If so, then fine, that's good to know. But let's not talk about it as general relational behavior.

    Unless, of course, it is general relational behavior. I must admit that, except for simple queries where I'm just looking for a few known values (...IN ('this', 'that', 'other')...), it's probably been more than 10 years since I've used a subquery in an IN. I learned why not to do this so long ago the specifics have faded in my rapid aging memory and I have no recollection that NOT IN was any worse than just IN. But I am ready to be corrected.

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • I was about to add when asking my question that please don't advise using LEFT OUTER JOIN as I found this sometimes taking same time as NOT IN but EXCEPT. Is there any other suggested method as an alternative to NOT IN other than LEFT OUTER JOIN. Just bit curious!

  • Not exists.

    Right join .

    Change the db design

  • very funny!!!

  • Thanks.

     

    But that's actually pretty all you can do.  You can always do select * from both tables and manually do the joins... but there comes a point where you just got to use the tools available for hte job .

  • Jim,

    What is the work assignment that says you must use IN?  The reason I ask is that it's a strange assignment and, as others have mentioned, is generally a bad idea.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Actually, NOT IN and NOT EXISTS create identical execution plans and both are a touch more effecient than an outer join with null detection... as always, here's the proof...

    --===== Create a decent size test table

         -- Takes about 42 seconds to execute.

     SELECT TOP 1000000

            RowNum     = IDENTITY(INT,1,1),

            SomeInt    = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),

            SomeString = CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))

                       + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)),

            SomeCSV    = 'Part1,Part2,Part3,Part4,Part5,Part6,Part7,Part8,Part9,Part10',

            SomeNumber = CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS MONEY),

            SomeDate   = CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME)

       INTO dbo.JBMTest

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2

    --===== Add a primary key

      ALTER TABLE dbo.JBMTest

            ADD PRIMARY KEY CLUSTERED (RowNum)

    --===== Create another table (bit of a heap, really)

     SELECT TOP 1000

            RowNum     = ISNULL(CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),0),

            SomeDate   = CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME)

       INTO dbo.JBMTest2

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN

    --===== Add an index

     CREATE INDEX IX_JBMTest2_RowNum ON dbo.JBMTest2 (RowNum)

    --===== These 2 queries form identical execution plans

         -- and take the same time to execute

     SELECT t1.RowNum

       FROM dbo.JBMTest t1

      WHERE t1.RowNum NOT IN (SELECT RowNum FROM JBMTest2)

     SELECT t1.RowNum

       FROM dbo.JBMTest t1

      WHERE NOT EXISTS (SELECT 1 FROM JBMTest2 WHERE RowNum = t1.RowNum)

    --===== This query takes just slightly LONGER

     SELECT t1.RowNum

       FROM dbo.JBMTest t1

       LEFT OUTER JOIN

            dbo.JBMTest2 t2

         ON t1.RowNum = t2.RowNum

      WHERE t2.RowNum IS NULL

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Results please?

  • Heh... gettin' lazy in your old age, Remi   All times in milliseconds to return 999015 rows in the grid mode of Query Analyzer...

    5941 NOT IN

    5994 NOT EXISTS

    6175 OUTER JOIN WITH NULL DETECT

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • 9096 NOT IN

    9865 NOT EXISTS

    8954 OUTER JOIN WITH NULL DETECT

    9816 EXCEPT in SQL 2005

    Just in case,

    SELECT RowNum

    FROM dbo.JBMTest

    EXCEPT

    SELECT RowNUM

    from dbo.JBMTest2

    And I always thought EXCEPT is better than NOT IN or NOT EXISTS or LEFT OUTER JOIN (Profiler is your best friend) everyday is a new day at this forum.

  • I understand what you mean, but as you also know it's not possible to change DB design very often.

  • You know that results change from PC to PC in that kind of test... and since you already have them printed on your screen, I think it's a farely easy step to just copy/paste the results to us as well as the test .

     

    As for the laziness... let's just say I'm busy these days (which is a great thing btw) .

  • Heh... yeah, I know... thanks for the reminder to post the results (it's true, I forget a lot)... you're lazy and I'm forgetful... we make a hell of a team

    "Jack Sprat could eat no fat,

    his wife could eat no lean,

    so between the two, when they ate,

    they licked the platter clean." 

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 34 total)

You must be logged in to reply to this topic. Login to reply