To OR or not to OR

  • Hi forum,

    quite often we use stored procedures to select either subset of all rows based on some value or all values in the table. For example, lets say, we have table address with column state (indexed, of course, often clustered). We want to be able to select addresses either for a particular state or all addresses. So we want stored procedure not unlike this one:

    create procedure getAddressByState
    @state char(3) = null
    as
    select * from address where
    state = @state or @state is null

    What I'm wondering about is how to optimise WHERE clause. For example, I can write:

    create procedure getAddressByState
    @state char(3) = null
    as
    select * from address where
    isnull(@state, state) = state

    or even I can write:

    create procedure getAddressByState
    @state char(3) = null
    as
    if @state is null
        select * from address
    else
        select * from address
        where state = @state

    I wonder which one will perform better.

    Any thoughts are appreciated

    George

    PS. For pedants:

    1. * is selected only for brevity.
    2. We have 3-character state abbreviations in Australia


    --
    George

  • I would go with your third option above.  The first option will needlessly scan through your address table looking for rows where STATE = null when all you really want is all the rows in the table. 

    The second option will be inefficient as well, because even if @state is not null, by wrapping your seach column (state) in a function (in this case the isnull() function), you disallow the optimizer from using any index that might exist on the state column, so you get a table scan. (You should always avoid using a function on the left side of your WHERE clauses if you want to make use of any available index on a column.  Also avoid doing mathematical operations on the left side for the same reason.)

    Option 3 looks like the most efficient way to go, IMHO.

  • Frank,

    I think SQL Server optimiser deserves a little more credit. In the sample below I use real table from my database. Table contains about 80k rows and has a clustered index on renled_parent_id. The output with SET SHOWPLAN_ALL ON

    declare @tenant int 
    set @tenant = null

    select * from rent_ledger
    where renled_parent_id = @tenant or @tenant is null

      |--Bookmark Lookup(BOOKMARK: ([Bmk1000]), OBJECT: ([primelink].[dbo].[rent_ledger]))

           |--Index Scan(OBJECT: ([primelink].[dbo].[rent_ledger].[constr_pk_318]),  WHERE: ([rent_ledger].[renled_parent_id]=[@tenant] OR [@tenant]=NULL))

     
    select * from rent_ledger
    where isnull(@tenant,renled_parent_id) = renled_parent_id

      |--Clustered Index Scan(OBJECT: ([primelink].[dbo].[rent_ledger].[index_319]), WHERE: (isnull([@tenant], [rent_ledger].[renled_parent_id])=[rent_ledger].[renled_parent_id]))

     
    if @tenant is null

       select * from rent_ledger

           |--Clustered Index Scan(OBJECT: ([primelink].[dbo].[rent_ledger].[index_319]))

    else   

       select * from rent_ledger where    renled_parent_id = @tenant

           |--Clustered Index Seek(OBJECT: ([primelink].[dbo].[rent_ledger].[index_319]), SEEK: ([rent_ledger].[renled_parent_id]=[@tenant]) ORDERED FORWARD)

    As you can see, there is no table scan in sight and optimiser can still use index because isnull is sargable predicate, AFAIK. Cost-wise the "else" part of the third query is the fastest with clustered index seek but I have to weight it against frequency of requests with parameter set to null.

    Cheers


    --
    George

  • ^^^ Interesting.  I've read several articles that stated that using functions around search arguments prevents the optimizer from using indexes.  (Including an article or two referenced on this site!)  Apparently that is not such a clear cut rule as some would make it out to be.  Thanks for clearing up the misconception.  

  • georged,

    Be careful! Clustered Index Scan meens in fact table scan!

    Try it out by removing the clustered index and creating a nonclustered on instead. You can see the difference.

    Also check why with your first query the optimiser is using the constr_pk_318 index while with the second one the index named index_319.

    An another proof is your third query where you have a select without where clause. Obviously there will be no index usage. Never the less you can see a clustered index scan, which in fact is a table scan. This is because with a clustered index the data itself is the index.



    Bye
    Gabor

  • To ensure you get all data in the case of a null value for the variable you need to use either the 2nd or 3rd. Now I have seen situations where ISNULL(@var,value) = value does and doesn't impact speed it can make a difference as to if the value column is the clustered inex or not. Personally I always opt for the following.

     

    create procedure getAddressByState;1
    @state char(3) = null
    as
     
    set nocount on
     
    if @state is null
        Exec getAddressByState;2
    else
        Exec getAddressByState;3 @state
     
    go
     

    create procedure getAddressByState;2
    as
     
    set nocount on
     
    select (columnlist) from address
     
    go
     

    create procedure getAddressByState;3
    @state char(3)
    as
     
    set nocount on
     

    select (columnlist) from address
        where state = @state
     
    go
     

    This way I get a seperate stored execution plan for each condition on first run that is optimized for it.

  • Gabor, good point. I have no clue as to why first query uses constr_pk_318. Just to clarify, index_319 is a clustered index on renled_parent_id column (int) and constr_pk_318 is a primary key on renlen_id column which is also an int. Wonder if substituting real column list instead of * will make any difference to the plan (though most of the table columns are selected)

    Antares686, thanks, really like your approach - I'll give it some thoughts. Forgot to mention that column is not null so first query will still work .


    --
    George

  • In this case get it pragmatically.

    • Check if your statistics are up to date (update statistics)
    • check the fragmentation status (dbcc showcontig, then dbcc dbreindex if neccessairy)
    • issue a single query like
      select * from rent_ledger
      where renled_parent_id = 1111

    • Check the plan

    • issue a second single query like this.
      declare @tenant int 
      set @tenant = 1111

      select * from rent_ledger

      where renled_parent_id = @tenant

    • Check the plan

    • issue a third single query like this.
      declare @tenant int 
      set @tenant = 1111

      select * from rent_ledger

    • where renled_parent_id = @tenant or @tenant is null
    • Check the plan

    • All three queries should produce the same plan
    As an info the following query should not use any index because @tenant is null:
    declare @tenant int 
    set @tenant = null

    select * from rent_ledger
    where renled_parent_id = @tenant or @tenant is null



    Bye
    Gabor

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply