Wrong index chosen when variable in WHERE clause

  • I'm stumped - why does the first query perform worse? The table has appx 20 million records. There is a clustered index on the table, on a field named prim_key. There is a non-clustered index on the post_dt field. The post_dt field also is part of a multi-field index. The first query does an index scan on the multi-field index, then does a hash-match. The second query does a clustered index seek on prim_key, than an index seek on post_dt. SS2005 has the first query as 100%, the second as 0%. Note that I played around with the variable, malking it a char, varchar, etc. The post_dt field in the table is datetime.

    --------------------------------

    declare @lcDataDt as datetime

    set @lcDataDt = '2007-02-14'

     select distinct acct_num from h_svc where post_dt >= @lcDataDt  

    select distinct acct_num from h_svc where post_dt >= '2007-02-14'

    --------------------------------

    Thanks in advance!

    Matt-

  • Can you post the full table DDL including the indexes and the execution plans for both queries?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Can you post the full table DDL including the indexes and the execution plans for both queries?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • I had a similar problem this week.  Try the following.

    select distinct acct_num from h_svc where post_dt >= (select @lcDataDt)

     

    ~Steve

  • Steve G - thanks for the reply, but same results.

    John R - thanks for the reply, I had to simplify cuz the table is pretty big:

    -- TABLE

    CREATE

    TABLE [dbo].[H_Svc]([prim_key] [int] NOT NULL DEFAULT (0),[Acct_Num] [int] NOT NULL DEFAULT (0),[Post_Dt] [datetime] NULL DEFAULT (null),[Col1] [char](1) NULL DEFAULT ('O'),[Col2] [char](10)  NOT NULL DEFAULT (''),[Col3] [money] NOT NULL DEFAULT (0),[Col4] [datetime] NULL DEFAULT (null),CONSTRAINT [h_svc_prim_key] PRIMARY KEY CLUSTERED ([h_svc_prim_key] ASC)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF, FILLFACTOR = 90) ON [PRIMARY]) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

    -- NON-CLUSTERED INDEX1

    CREATE

    NONCLUSTERED INDEX [Post_Dt] ON [dbo].[H_Svc] ([Post_Dt] ASC)WITH (PAD_INDEX = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, FILLFACTOR = 90, ONLINE = OFF) ON [PRIMARY]

    -- NON-CLUSTERED INDEX2

    CREATE

    NONCLUSTERED INDEX [Col1_ActHs_Num_Post_Dt_Col2_Col3_Col4] ON [dbo].[H_Svc] (

    [Col1]

    ASC,[Acct_Num] ASC,[Post_Dt] ASC,[Col2] ASC,[Col3] ASC,[Col4] ASC)WITH (PAD_INDEX = OFF

    QUERY 1

    |--Parallelism(Gather Streams)

       |--Hash Match(Aggregate, HASH: ([Test_DB].[dbo].[H_Svc].[Acct_Num]))

          |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS: ([Test_DB].[dbo].[H_Svc].     [Acct_Num]))

             |--Hash Match(Partial Aggregate, HASH: ([Test_DB].[dbo].[H_Svc].[Acct_Num]))

                |--Index Scan(OBJECT: ([Test_DB].[dbo].[H_Svc].[Col1_ActHs_Num_Post_Dt_Col2_Col3_Col4]), WHERE: ([Test_DB].[dbo].[H_Svc].[Post_Dt]>=[@lcDataDt]))

    QUERY 2

    |--Sort(DISTINCT ORDER BY: ([Test_DB].[dbo].[H_Svc].[Acct_Num] ASC))

       |--Nested Loops(Inner Join, OUTER REFERENCES: ([Test_DB].[dbo].[H_Svc].[prim_key]) OPTIMIZED)

          |--Index Seek(OBJECT: ([Test_DB].[dbo].[H_Svc].[Post_Dt]), SEEK: ([Test_DB].[dbo].[H_Svc].[Post_Dt] >= '2007-02-14 00:00:00.000') ORDERED FORWARD)

             |--Clustered Index Seek(OBJECT: ([Test_DB].[dbo].[H_Svc].[H_Svc_prim_key]), SEEK: ([Test_DB].[dbo].[H_Svc].[prim_key]=[Test_DB].[dbo].[H_Svc].[prim_key]) LOOKUP ORDERED FORWARD)

     

  • You must have clustered index on the column you use in range selections.

    In your case it's [Post_Dt].

    I don't think there is any use of clustered index on PK.

    So, make it nonclustered. Clustered index on [Post_Dt] will be more useful.

    BUT!

    Check the logic of your application. I suspect the typical query is not the one you posted, but somethind like this:

    WHERE [Acct_Num] = @SomeAccount AND [Post_Dt] > @SomeDate

    In this case you must have clustered index on ([Acct_Num],[Post_Dt])

    _____________
    Code for TallyGenerator

  • Sergiy, yeah the typical query has more than 1 clause in it. Perhaps the clustered index needs to be changed, but the other DBA and I do not have time/resources to muck with that at this point (like all of us over-worked DBAs ), as the table is too large and too central to the application.

     

    Thanks!

  • Big tables require proper indexing.

    That's a rule #1.

    The rule #2:

    Tables with many rows should not have many columns.

    Dividing fat table into 2 or more will let you create 2 or more clustered indexes on the same resultset. Optimiser will choose which one is better to use for each particular query.

    Your current indexing is probably the worst possible option. So, you not gonna lose performance by changing indexing.

    But anyway it's not the right way to play with indexes on production system.

    Backup database, restore it in DEV environment and do your exersises there. After you have concluded the best option apply it on production database.

    _____________
    Code for TallyGenerator

  • could this be realted to the parameter sniffing issue that sometimes occurs...that is exactly the symptom...works fine with static value, very slow with variable... search for parameter sniffing  and see the many questions on this issue.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell - you da man. You are right. Here is an article that helped me:

    http://www.sql-server-performance.com/forum/topic.asp?TOPIC_ID=11611

    Thanks everybody!

    Matt-

  • Just went through this myself.

    See: http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=347813

    In my case, query hints would help a lot, but constructing the TSQL query within the procedure and executing with sp_ExecuteSQL made it lightning fast.

    I did bookmark that thread you uncovered though. It may be useful in the future.

    Chris

  • BTW - the 2 ways to run it (first one is easier obviosly) that use the proper index:

     

    declare @lcDataDt as datetime

    set @lcDataDt = '2007-02-14'

    select distinct acct_num from h_svc where post_dt >= @lcDataDt  OPTION(RECOMPILE)

    --------

    declare

    @lcDataDt as nchar(10), @cSql nvarchar(1000), @cParms nvarchar(100)

    set

    @lcDataDt = '2007-02-14'

    set @cSql = 'select distinct acct_num from h_svc where post_dt >= @lcDataDt'

    set

    @cParms = N' @lcDataDt nchar(10)'

    exec

    sp_executesql @cSql, @cParms, @lcDataDt=@lcDataDt

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply