Part 2 (Detailed): Query optimizer does not pick the correct index for a query that is executed from a job

  • This is the root cause of the topic "SQL Server Job performs very very slow..."

    SELECT top X itemID, name
    FROM dbo.Items -- with (index(IX_Items_catID))
    WHERE catID = 289

    The Items table has 8M rows, with the primary key on itemID. I also have an index on catID (IX_Items_catID) and a more complex index with catID as the first column (IX_Items_catID1).

    There are about 200K rows with catID = 289

    Depending on the value of X in the TOP clause, a different index is picked:

    -> if X <= 63, IX_Items_catID is picked
    -> if X > 63, PK_Items (having itemID column) is picked

    For X = 63, the duration of the query is about 170.

    For X = 64, the duration of the query is about 12000.

    If I force the use of the IX_Items_catID index with an index hint, the X = 64 version also runs in about 170.
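
    For reference, the hinted version is just the query from above with the hint uncommented (shown here with TOP 64 as an example value of X):

    SELECT TOP 64 itemID, name
    FROM dbo.Items WITH (INDEX(IX_Items_catID))
    WHERE catID = 289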

    All the statistics are up to date; I have just refreshed them. In any case, the data in this table does not change.

    All this happens in QA (Query Analyzer).

    ---------------

    If I put the same query in a job, things change:

    The threshold of 63 stays the same, but different indexes are picked:

    -> if X <= 63, IX_Items_catID1 is picked (the complex index containing catID as the first column)
    -> if X > 63, PK_Items (having itemID column) is picked

    Why do queries behave differently in QA than when they are run in a job by SQL Server Agent?

    Why does it use that threshold? Wouldn't it perform better using only IX_Items_catID?

    Thanks!

  • I'm wondering if it has something to do with caching.

    QA will be running this query from scratch. Perhaps SQLSERVERAGENT has an execution plan cached in some way.

    If you run DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS and then run the job, what happens?

    Try updating the statistics on dbo.Items, or even rebuilding the index, and try again.
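
    Something along these lines, using the table and index names from your first post (note that the two DBCC cache commands flush plans and data pages for the whole instance, so only run them on a test server):

    DBCC FREEPROCCACHE
    DBCC DROPCLEANBUFFERS
    GO

    -- refresh the statistics with a full scan
    UPDATE STATISTICS dbo.Items WITH FULLSCAN
    GO

    -- or rebuild the non-clustered index (SQL Server 2000 syntax)
    DBCC DBREINDEX ('dbo.Items', 'IX_Items_catID')
    GO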

    I have recomputed the statistics and rebuilt the indexes. No success.

    Neither DBCC FREEPROCCACHE nor DBCC DROPCLEANBUFFERS makes any difference.

    To be blunt, the optimiser is probably picking the correct index; you probably don't provide the most efficient index for your query. Have you changed any settings in QA away from the defaults? As mentioned in your first post, these can make a difference. You can check the contents of the procedure cache to see what is happening.

    Try putting the query in a proc and see what happens. I note you don't have an ORDER BY with your TOP, which is bad practice.

    You don't mention a clustered index. A covered index on (ItemID, name, CatID) should give the best performance. Making it clustered might make it faster, but not necessarily.

    I assume your tests run this query in isolation and the SARGs are not parameterised or replaced with variables?
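
    As a rough sketch of checking the procedure cache on SQL Server 2000, you can query master..syscacheobjects; the setopts column shows the SET options each plan was compiled with, which is one place QA and an Agent job can differ:

    SELECT cacheobjtype, objtype, usecounts, setopts, sql
    FROM master..syscacheobjects
    WHERE sql LIKE '%dbo.Items%'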

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Colin's right - a covered index would give the best performance here (although I think it should be on [CatID, ItemID, Name], since CatID is the column in the Search Argument). I wouldn't cluster it - there's no need, and a covered index is even faster as the leaf pages are smaller than the data pages of a clustered index would be, so you can retrieve more rows with a single IO request.

    I think part of the problem is that the likely number of matching rows (200k) is approx 2.5% of the total table, which is right on the cusp of the percentage at which the optimizer decides that a NC index becomes inefficient. There's no simple way of predicting this (at least, none that I've ever seen) and many people believe it's as high as 10% before a switch to a tablescan/clustered index scan occurs, but in reality you'll find it is much lower. It does depend on a number of different factors.

    You'll probably find that if you don't do a TOP at all, then the NC index will be used each time. Look for a value of CatID that returns 250k rows and I wouldn't be quite so sure. At 400k rows I'd put good money on a clustered index scan.

    By adding the TOP clause, you're adding a requirement to do yet more work, and this could just be the final little nudge that the optimizer needs to cause it to flip the plan to an index scan once the number of rows to be sorted and retrieved exceed 63.

    By putting a covering index on (and by including an ORDER BY), all the records will be in the correct sequence, efficiently packaged into index leaf pages, and there should be minimal overhead from using it.
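
    As a sketch, with an invented index name and ordering by ItemID so the rows come straight off the index leaf level, that might look like:

    CREATE NONCLUSTERED INDEX IX_Items_catID_covering
    ON dbo.Items (catID, itemID, name)
    GO

    SELECT TOP 100 itemID, name
    FROM dbo.Items
    WHERE catID = 289
    ORDER BY itemID
    GO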

    Actually, secondary indexes need to be even more selective, typically 95% or better. You can improve matters with a covered index (we'll agree to disagree about column order!!). It's all to do with the number of pages, I/Os and locks that are required, which is the point at which the optimiser makes its choice. I've found that an inefficient (in terms of selectivity) clustered index is more likely to be used than a non-clustered one. It's a case of test and test and test again.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

    Colin - I'm intrigued by your comment on the subject of column order. Since the only SARG column is CatID, if this wasn't the leading column in the index then the optimizer would have to scan all 8 million index rows, and then pick out the ones that matched, before it could access the Name and ItemID columns.

    By putting CatID first, it can do an index seek to the first occurrence of CatID, then scan the subsequent 200K rows, reading the Name and ItemID values as it does so, until CatID changes to a new value, at which point it knows to stop.

    Well, originally covered indexes could be fussy about the column order in the index matching the column order in the select. It shouldn't make any difference, and to date I've not found any difference between the search column being first or last. I still tend to make covered indexes match the select order. If you're in the UK, Kimberley Tripp is giving an advanced index course in Reading next month.

    I usually test the various index permutations; my final column choice and order depends upon how many other queries the index may have to satisfy.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Can't argue with the "test, test, test" doctrine. I've put a request in to my line manager to attend the Reading seminar, too!

    Cool! If I get to attend, which is unlikely as I think I'll be on a new contract, I'll do my best to find you! I attended the one-day Kimberley Tripp and Kalen Delaney events, which covered lots of index stuff.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

    At the risk of flogging a point to death (but I do think it's of interest), I've adopted the "test" approach and come up with some results. See you in Reading!

    Table (14,305,700 rows):

    ======

    CREATE TABLE [dbo].[live_traffic_incident](
        [row_id] [int] IDENTITY(1,1) NOT NULL,
        [incident_number] [bigint] NOT NULL,
        [version_number] [smallint] NOT NULL,
        [inserted_date_time] [datetime] NOT NULL,
        [incident_recorded] [datetime] NOT NULL,
        [incident_start] [datetime] NOT NULL,
        [incident_end] [datetime] NOT NULL,
        [priority] [tinyint] NOT NULL,
        [on_restricted_fvd] [bit] NOT NULL,
        [event_code] [smallint] NOT NULL,
        [ltn10_primary] [int] NOT NULL,
        [ltn10_secondary] [int] NOT NULL,
        [ltn10_direction] [smallint] NOT NULL,
        [ltn10_extent] [tinyint] NOT NULL,
        [multigroup_type] [float] NOT NULL,
        [multigroup_data] [smallint] NOT NULL,
        [source] [tinyint] NOT NULL,
        [server] [varchar](25) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
        [extent_miles] [float] NULL,
        [delay_type] [int] NULL,
        [delay] [int] NULL,
        [p_link_index] [int] NULL,
        [s_link_index] [int] NULL,
        CONSTRAINT [PK_live_traffic_incident] PRIMARY KEY CLUSTERED
        (
            [row_id] ASC
        ) WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
    ) ON [PRIMARY]
    GO

    Index:

    ======

    ix_event_code non-clustered on [event_code]
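
    Spelled out as DDL, that is roughly:

    CREATE NONCLUSTERED INDEX [ix_event_code]
    ON [dbo].[live_traffic_incident] ([event_code])
    GO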

    Data:

    =====

    EventCode    % of Total Rows
    =========    ===============
    212          0.100134911259148
    211          0.114968159544797
    1031         0.148094815353321
    646          0.24144921255164
    24           0.30578021348134
    343          0.308772027932922
    214          0.313867898809565
    505          0.37359234431031
    405          0.582012764142964
    519          0.590988207497711
    520          0.689620221310387
    240          0.847207756348868
    201          0.869247922156903
    1804         0.912083994491706
    515          1.36530194258233
    70           2.25808593777306
    74           2.2833695659772
    401          2.92720384182529
    514          2.94621025185765
    73           3.25405957066065
    517          3.41469484191616
    72           3.97124223211727
    71           5.60260595426998
    735          6.34579922688159
    518          7.37394884556505
    500          22.4500653585634
    708          28.3434854638361

    Query:

    ======

    select top 100 incident_number, version_number
    from live_traffic_incident
    where event_code = 405 /* 0.58% of total rows match */
    order by incident_number

    The above query scans the clustered index, even though only 0.58% of the total rows match - the overhead of the resulting bookmark lookups from the NC index outweighs the cost of simply scanning the entire clustered index.

    Modifying the SARG slightly to a more selective value, we can finally get the optimizer to switch:

    select top 100 incident_number, version_number
    from live_traffic_incident
    where event_code = 505 /* 0.37% of total rows match */
    order by incident_number

    This now uses an index SEEK on ix_event_code.

    If I create a COVERING index on (incident_number, version_number, event_code) then there is NO value of event_code that uses an index seek, even the most selective in the table. It uses a clustered index scan every time.

    Adding a further covering index on (event_code, incident_number, version_number) has the result that it now ALWAYS uses an index seek via that index, even with the LEAST selective value of event_code (28.3%)

    This proves (in this case at least) that the SARG column needs to be the leading column in the index before the optimizer can use it.
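
    For reference, the two covering indexes described above would be something like this (the index names here are invented):

    -- SARG column (event_code) last: never chosen, even for the most selective value
    CREATE NONCLUSTERED INDEX ix_lti_cover_trailing
    ON dbo.live_traffic_incident (incident_number, version_number, event_code)
    GO

    -- SARG column first: always chosen, even for the least selective value (28.3%)
    CREATE NONCLUSTERED INDEX ix_lti_cover_leading
    ON dbo.live_traffic_incident (event_code, incident_number, version_number)
    GO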

    Yeah, sorry, I forgot that the first column of my covered index was very selective. If the first column isn't selective enough then you're absolutely correct, the optimiser will ignore it. You can improve this by making the covered index clustered; the selectivity requirement is lower for a clustered index.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/
