''Sargable'' where clause

  • is the below where clause likely to damage performance and prevent proper index use?

    Anyone know a more elegant efficient way to implement this logic?

    table concerned has million plus rows

     

     

     

     

    where isNull(T1.Col1, @date) >= @date

    www.sql-library.com[/url]

  • WHERE (T1.Col1 >= @date OR T1.Col1 IS NULL)

    _____________
    Code for TallyGenerator

  • thats les efficient i replaced the or with the the isnull method just wondered if there was a better way.

    www.sql-library.com[/url]

  • Only if you don't use index.

    Or half of values in the column = NULL.

    In my experience it ALWAYS dramatically decreases execution time.

    _____________
    Code for TallyGenerator

  • Make sure you run the code several times where indicated... you'll both be surprised... and remember the situation may change if a join is involved...

    --Drop table MyHead

    GO

    --===== Create a test table and populate with some data on the fly

     SELECT TOP 1000000

            IDENTITY(INT,1,1) AS RowNum,

            CAST(NULL AS DATETIME) AS SomeDate

       INTO MyHead

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2

    --===== Create a bunch of dates except for when the RowNum

         -- is = 100 so we also have some nulls

     UPDATE MyHead

        SET SomeDate = RowNum

      WHERE RowNum%100 <> 0

    --===== Create the appropriate Primary Key

      ALTER TABLE MyHead

        ADD PRIMARY KEY CLUSTERED (RowNum)

    --===== Add an index on the date column

     CREATE INDEX IX_MyHead_RowNum ON MyHead (SomeDate)

    --===== Do the test (just run this part over and over)

    DECLARE @BB DATETIME --Bit bucket to take display times out of the picture

    DECLARE @StartTime DATETIME

    DBCC FREEPROCCACHE

    DBCC DROPCLEANBUFFERS

        SET @StartTime = GETDATE()

     SELECT @BB = SomeDate

       FROM MyHead

      WHERE ISNULL(SomeDate,GETDATE()) >= GETDATE()

      PRINT DATEDIFF(ms,@StartTime,GETDATE())

    DBCC FREEPROCCACHE

    DBCC DROPCLEANBUFFERS

        SET @StartTime = GETDATE()

     SELECT @BB = SomeDate

       FROM MyHead

      WHERE (SomeDate >= GETDATE() OR SomeDate IS NULL)

      PRINT DATEDIFF(ms,@StartTime,GETDATE())

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The Isnull method cannot be more efficient.  Anytime you have a function on a column (in a Where clause from Join clause), the query must scan the table and evaluate the function for every row.  Using the OR method may still have a table scan, but if there is an index or statistics build on that column, will allow an index scan instead of a table scan, which will likely be more efficient.



    Mark

  • Mark,

    I'm thinking that you did not try the code.  Use the table build code to build the table and the related indexes.  Then do an execution plan on the code that has the comment "--===== Do the test (just run this part over and over)"

    When you run that part of the code, be sure to run it at least 4 times...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, the same way I can easily proof that walking is faster than driving a car. I just need to build special scenario for this.

    Your point is not about query, it's just an exampe of the rare case when index is not effective. Query returns >90% of rows, that's why scanning just leaf level of the index is faster than seeking through all levels of the same index.

    To proof my point I suggest just to change ">=" to "<=" in your query and rerun it.

    So, as I said, replacement of "ISNULL(SomeDate,GETDATE()) >= GETDATE()" with "(SomeDate >= GETDATE() OR SomeDate IS NULL)" will let SQL Server to use index if there is one.

    And you example shows that index seek is not always the fastest way to retrieve data.

    BTW, I've mentioned such possibility recently, here: http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=333170

    _____________
    Code for TallyGenerator

  • Outstanding, Serqiy... at least SOMEBODY tried this and it figures that it would be you

    Actually, I whole heartedly agree with you and so does the code I just kinda dropped on folks to see what would happen... except in very rare cases, the OR will usually work better/faster than the ISNULL and I use the OR method everywhere I can (I still always do the test, though).  As you so very well pointed out, especially in the 333170 post you pointed out, the devil is in the data, in that and in this case.

    But, even you, my old friend, are still saying the OR "will let SQL Server to use index if there is one".  The real fact of the matter is that both methods will let SQL Server use an Index... neither one forces a table scan as most people think (look at Mark's post as an example).  I agree that the OR will allow an INDEX SEEK instead of the mere INDEX SCAN that the ISNULL forces, but they both allow an index to be used and (like you said in 333170) the INDEX SEEK doesn't always win the race (although, I agree it's most likely to and usually does).

    That's gotta be confusing to people who post these types of questions... everyone keeps saying the ISNULL method will force a table scan and it doesn't... they look at the execution plan for their code and wonder what the heck we're talking about.  What looks like "non-sargeable" code doesn't always force a table scan and we need to stop saying it does.

    Does that mean it's ok to use the ISNULL method instead of the OR method?  Maybe, maybe not (I agree, usually not).  But Jules obviously did some testing and, in his case, for some odd reason, the ISNULL method turns out to be more "efficient".  It means that no matter what the execution plan (or other people) says, you've gotta test the code for performance and scalability AND, more importantly, you have to understand what the underlying data is.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jules,

    The real fact of the matter is, both the ISNULL and the OR method really stink up the room in this situation (NULL being treated as "future dated")...

    To answer the immediate question, the only method that I know of that is "better" than either method, is to fill the NULL dates with 99991231 and eliminate the need for a NULL check altogether.  When I do that to the example code I posted, I get an immediate boost in performance of 9 to 23%... but that's literally only half the story... because of the amount of data being returned (>90%) compared to the size of the table, this method gives a performance improvement of a whopping 27 to 43% on the code I posted IF YOU REMOVE THE INDEXES!  That's because it's actually cheaper, in this case (most of the data in the table being returned), to do a full table scan than it is to find the data in the index and translate that info into a lookup on the table.

    I think that might be my next "try" if I were in your shoes with that tera-byte data base of yours...

    [EDIT] Almost forgot... obviously, you have to take into consideration whether or not there is other code that uses NULLs as future dating, specifically code that uses WHERE somedate IS NULL and WHERE somedate IS NOT NULL (some use WHERE somedate > 0 which is a bit faster than NOT NULL). The ISNULL(somedate,@Date) >= @Date code will NOT be affected although it will not enjoy the boost in performance from the data change.  If you make the change above, you could be breaking other code and not just in SQL... you could also be breaking GUI code.  Considering the possible performance improvement, though, it might be worth making the necessary changes to the other code (compare against '99991231' instead of NULL when required).  I'm thinking that you need to test a bit more and verify that the potential performance boost will be realized  before you change the data.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • > But, even you, my old friend, are still saying the OR "will let SQL Server to use index if there is one". The real fact of the matter is that both methods will let SQL Server use an Index... neither one forces a table scan as most people think (look at Mark's post as an example).

    It says "Index Scan" only because there is an index on this column.

    Remove that index and it will become "Clustered Index Scan".

    Remove clusrtered index and it will say "Table Scan".

    In fact all 3 expressions mean the same - scanning all values in the column.

    The difference is only where those values are sitting - either in leaf level of index on this column or in rows of the table (which are a part of clustered index if there is one).

    _____________
    Code for TallyGenerator

  • Nope, with the second index in the code, I'm getting an INDEX SEEK on the OR method...

    But I agree with the rest... why go through an index at all to do a table scan?  Just slows things down...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply