Why does SQL choose the "wrong" index?

  • I have a simple, single-table query that runs about 25,000 times a day in my environment, and I am trying to figure out why SQL chooses the execution plan it does. First, here is pseudocode for the table the query runs against:

    CREATE TABLE dbo.Persons (
        PersonID INT IDENTITY(1, 1),
        LastName NVARCHAR(255),
        FirstName NVARCHAR(255),
        MidName NVARCHAR(255),
        DOB DATE,
        Address NVARCHAR(255),
        City NVARCHAR(255),
        Country NVARCHAR(255),
        Email NVARCHAR(255),
        PostCode NVARCHAR(255),
        ....
        ....
        ....
        CONSTRAINT [Persons_PK] PRIMARY KEY CLUSTERED (PersonID ASC)
    )

    The "...." implies that there are many more attributes to this table - 80 or so to be exact. The table contains about 50 million records.

    The table has several indexes. Here are the definitions of the relevant ones:

    CREATE NONCLUSTERED INDEX [IXNC_Persons_PostCode] ON [dbo].[Persons]
        ([PostCode] ASC)

    CREATE NONCLUSTERED INDEX [IXNC_Persons_LastName_PostCode] ON [dbo].[Persons]
        ([LastName] ASC, [PostCode] ASC, [Country] ASC, [DOB] ASC, [PersonID] ASC)

    And, last but not least, the problematic query:

    DECLARE @LastName NVARCHAR(255), @PostCode NVARCHAR(255), @DOB DATE, @Country NVARCHAR(255)

    SELECT COUNT(1)
    FROM dbo.Persons
    WHERE LastName = @LastName
      AND PostalCode = @PostCode
      AND DOB = @DOB
      AND Country = @Country
      AND PersonID <> 0

    The execution plan for this query shows that SQL grabs the records out of the [IXNC_Persons_PostCode] index and does a bookmark (key) lookup to get the rest of the data from the clustered index. If I use an index hint to force use of the [IXNC_Persons_LastName_PostCode] index, the query runs much faster (instantaneous versus an average of three seconds), and if I run SET STATISTICS IO ON I see that SQL needs only 8 logical reads to fulfill the query with the index hint versus almost 6,000 with the execution plan SQL chooses on its own.
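
    For anyone reproducing the comparison, here is a minimal sketch of the two runs, assuming the PostCode spelling from the table definition and the variables declared above; the read counts in the comments are the figures from this post, not guarantees:

    SET STATISTICS IO ON;

    -- Optimizer's own choice: seek on IXNC_Persons_PostCode plus a key
    -- lookup per row (~6,000 logical reads in this environment).
    SELECT COUNT(1)
    FROM dbo.Persons
    WHERE LastName = @LastName AND PostCode = @PostCode
      AND DOB = @DOB AND Country = @Country AND PersonID <> 0;

    -- Hinted: a single seek on the wider index (~8 logical reads).
    SELECT COUNT(1)
    FROM dbo.Persons WITH (INDEX ([IXNC_Persons_LastName_PostCode]))
    WHERE LastName = @LastName AND PostCode = @PostCode
      AND DOB = @DOB AND Country = @Country AND PersonID <> 0;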

    All indexes are reorganized or rebuilt on the same schedule, and statistics are up to date as well. When I run the query in SSMS, it does choose the "correct" index. I have checked the connection settings for the app that runs the query, and there is one difference; however, even when I change the SSMS connection settings to match the application, SSMS still uses the "correct" plan.

    Thanks in advance!

  • Bad cardinality estimations perhaps. Can you post the problematic plan?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (5/27/2015)


    Bad cardinality estimations perhaps. Can you post the problematic plan?

    99.x% it is this, or plan caching got you a plan for inputs that were optimal for that set but now suboptimal for your current values.

    Just look at the query plan and compare actual and estimated row counts. If they are out of whack in either direction you lose.

    Hard-coded values will get you a good plan, as should OPTION (RECOMPILE) on the query, as sketched below. Data value skew could also be in play with your data, which can force you to deal with it or lose too.
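
    A minimal sketch of where the hint goes (same query shape as the original post):

    -- OPTION (RECOMPILE) makes the optimizer sniff the actual variable
    -- values on every execution instead of reusing one cached plan.
    SELECT COUNT(1)
    FROM dbo.Persons
    WHERE LastName = @LastName AND PostCode = @PostCode
      AND DOB = @DOB AND Country = @Country AND PersonID <> 0
    OPTION (RECOMPILE);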

    BTW, kudos for using EXACTLY the same data types for your variables!! Many out there do not do that, and it is likely the single worst thing I see in aggregate in my performance tuning consulting. However, all NVARCHAR(255) is likely suboptimal to a small degree. Size DOES matter. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Very surprising that SQL doesn't use the index on which it can do almost a full seek.

    Do you really need the "PersonID<>0", or could you remove that? I believe that condition could force SQL to do two separate seeks of the index, one for < 0 and one for > 0.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (5/27/2015)


    Very surprising that SQL doesn't use the index on which it can do almost a full seek.

    Do you really need the "PersonID<>0", or could you remove that? I believe that condition could force SQL to do two separate seeks of the index, one for < 0 and one for > 0.

    Another question to ask regarding PersonID is what values are actually in that column. If the minimum value is 0 (or possibly NULL), then instead of <> 0 just use > 0, as sketched below.
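
    A sketch of the suggested rewrite, assuming PersonID is the IDENTITY column starting at 1 so no negative values exist:

    -- "> 0" is a single open-ended range, so one index seek suffices;
    -- "<> 0" splits into two ranges (< 0 and > 0) and two seeks.
    SELECT COUNT(1)
    FROM dbo.Persons
    WHERE LastName = @LastName AND PostCode = @PostCode
      AND DOB = @DOB AND Country = @Country AND PersonID > 0;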

  • One other point to consider is whether you're using local variables in the actual code or parameters. If local variables, the optimizer estimates from the averages in the statistics, and that can skew the execution plan. If parameters, you might occasionally get bad parameter sniffing issues due to data skew or out-of-date statistics. The sketch below shows the difference.
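
    A sketch of the distinction (the literal N'Smith' here is just an illustrative value):

    -- Local variable: the value is invisible at compile time, so the
    -- optimizer estimates from the average density in the statistics.
    DECLARE @LastName NVARCHAR(255) = N'Smith';
    SELECT COUNT(1) FROM dbo.Persons WHERE LastName = @LastName;

    -- Parameter: the value is sniffed at compile time and the cached
    -- plan is shaped for it, for better or worse on later values.
    EXEC sp_executesql
        N'SELECT COUNT(1) FROM dbo.Persons WHERE LastName = @LastName',
        N'@LastName NVARCHAR(255)',
        @LastName = N'Smith';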

    Query tuning is fun!

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • tim_harkin (5/27/2015)


    DECLARE @LastName NVARCHAR(255), @PostCode NVARCHAR(255), @DOB DATE, @Country NVARCHAR(255)

    SELECT COUNT(1)
    FROM dbo.Persons
    WHERE LastName = @LastName
      AND PostalCode = @PostCode
      AND DOB = @DOB
      AND Country = @Country
      AND PersonID <> 0

    This might just be a typo in the OP, but are you querying on PostalCode, as in your query, or PostCode, as in your table? If one of your other columns is PostalCode and that's what you're using in the query, your IXNC_Persons_LastName_PostCode index wouldn't be covering.

  • Thank you all for the responses. It does indeed look like a parameter sniffing issue. My question was why it kept popping up: basically, if the last name was Smith or Patel, the "inefficient" plan was actually better.

    GilaMonster - regarding cardinality estimates, they are pretty accurate; estimates are at most 6 where the actual count is 1. As for posting the plan, that's hard to do while also obscuring the data.

    SQLGuru and ScottPletcher - as for your suggestions, the app team has pretty much refused to change the query, so OPTION (RECOMPILE) or changing the PersonID<>0 are not options for me. Political issue, not a technical one. Also, I was leery of adding OPTION (RECOMPILE) to something that runs that often.

    Ed Wagner - yup, typo by me....

    So, it DOES look like I have a fix. We created a new index with the same columns as the [IXNC_Persons_LastName_PostCode] index, but with PostCode as the leading column. SQL was favoring PostCode anyway; now it can get all the info it needs from one index seek (see the sketch below).
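
    A sketch of what that new index presumably looks like (the index name here is made up; the column list follows the description above):

    CREATE NONCLUSTERED INDEX [IXNC_Persons_PostCode_LastName] ON [dbo].[Persons]
        ([PostCode] ASC, [LastName] ASC, [Country] ASC, [DOB] ASC, [PersonID] ASC)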

    Looking at the stats, PostCode is almost as selective as LastName (within 2% on density). My theory, which I am looking to test, is that the similar selectivity, combined with the much smaller data in that column, influenced SQL to choose the original "bad" plan.

  • Also, I was leery of adding OPTION (RECOMPILE) for something that runs that often.

    This is SIGNIFICANTLY misguided, IMHO! I will jump through EXTRAORDINARY hoops these days to trade CPU ticks for query perf (especially IO), and that is exactly what you do with OPTION (RECOMPILE) or other means to avoid getting suboptimal plans when you have either data value skew or widely-varying filtering (think @startdate and @enddate on a report sproc for example). I can count on one hand the number of times I have seen compilation locking issues or CPU issues related to compilations in the wild in 20 years of SQL Server consulting but can't possibly count the times where OPTION (RECOMPILE) et al made performance WAY better AND consistent.

    BTW, one of those times was when an admin set Cost Threshold for Parallelism to 0. Chew on that one for a minute or two. :hehe:
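
    For reference, a sketch of inspecting and changing that setting via sp_configure (15 is the value mentioned later in this thread; pick what fits your workload):

    -- Cost Threshold for Parallelism is an advanced option.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    -- 0 means every plan qualifies for a parallel plan; raising it keeps
    -- cheap OLTP queries serial.
    EXEC sp_configure 'cost threshold for parallelism', 15;
    RECONFIGURE;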

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (5/28/2015)


    Also, I was leery of adding OPTION (RECOMPILE) for something that runs that often.

    This is SIGNIFICANTLY misguided, IMHO! I will jump through EXTRAORDINARY hoops these days to trade CPU ticks for query perf (especially IO), and that is exactly what you do with OPTION (RECOMPILE) or other means to avoid getting suboptimal plans when you have either data value skew or widely-varying filtering (think @startdate and @enddate on a report sproc for example). I can count on one hand the number of times I have seen compilation locking issues or CPU issues related to compilations in the wild in 20 years of SQL Server consulting but can't possibly count the times where OPTION (RECOMPILE) et al made performance WAY better AND consistent.

    BTW, one of those times was when an admin set Cost Threshold for Parallelism to 0. Chew on that one for a minute or two. :hehe:

    Maybe he was trying to ensure good performance on Columnstore indexes.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (5/28/2015)


    TheSQLGuru (5/28/2015)


    Also, I was leery of adding OPTION (RECOMPILE) for something that runs that often.

    This is SIGNIFICANTLY misguided, IMHO! I will jump through EXTRAORDINARY hoops these days to trade CPU ticks for query perf (especially IO), and that is exactly what you do with OPTION (RECOMPILE) or other means to avoid getting suboptimal plans when you have either data value skew or widely-varying filtering (think @startdate and @enddate on a report sproc for example). I can count on one hand the number of times I have seen compilation locking issues or CPU issues related to compilations in the wild in 20 years of SQL Server consulting but can't possibly count the times where OPTION (RECOMPILE) et al made performance WAY better AND consistent.

    BTW, one of those times was when an admin set Cost Threshold for Parallelism to 0. Chew on that one for a minute or two. :hehe:

    Maybe he was trying to ensure good performance on Columnstore indexes.

    Interesting thought, but this was about 7 years ago or so. It was a mixed-mode server (OLTP/OLAP on same databases). I changed CTFP to 15 and CPU made a step-change drop. It was pretty cool. 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (5/29/2015)


    Grant Fritchey (5/28/2015)


    TheSQLGuru (5/28/2015)


    Also, I was leery of adding OPTION (RECOMPILE) for something that runs that often.

    This is SIGNIFICANTLY misguided, IMHO! I will jump through EXTRAORDINARY hoops these days to trade CPU ticks for query perf (especially IO), and that is exactly what you do with OPTION (RECOMPILE) or other means to avoid getting suboptimal plans when you have either data value skew or widely-varying filtering (think @startdate and @enddate on a report sproc for example). I can count on one hand the number of times I have seen compilation locking issues or CPU issues related to compilations in the wild in 20 years of SQL Server consulting but can't possibly count the times where OPTION (RECOMPILE) et al made performance WAY better AND consistent.

    BTW, one of those times was when an admin set Cost Threshold for Parallelism to 0. Chew on that one for a minute or two. :hehe:

    Maybe he was trying to ensure good performance on Columnstore indexes.

    Interesting thought, but this was about 7 years ago or so. It was a mixed-mode server (OLTP/OLAP on same databases). I changed CTFP to 15 and CPU made a step-change drop. It was pretty cool. 😎

    HA! Not surprised. I was (mostly) kidding.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning
