April 22, 2008 at 1:23 pm
Hey all,
I'm taking a decent performance hit due to having an index scan instead of seek ... my index is setup properly, and I see no reason at this point why I am not getting a seek. The only thing that is sticking out is the fact that my datetime has an implicit convert.
Anyone have a better approach to finding all dates with in the specified month that would not cause this scan to happen?
Thanks
DECLARE
@periodDATETIME
SET @period = '02/01/2006'
SELECT
col1
,col2
,col3
FROM tbl1
WHERE mydate BETWEEN DATEADD(MONTH, DATEDIFF(MONTH, 0, @period), 0) AND DATEADD(MONTH, DATEDIFF(MONTH, 0, @period) + 1, 0) - 1
ORDER BY mydate
Same results this way ...
DECLARE
@periodDATETIME
,@DateStartDATETIME
,@DateEndDATETIME
SET @period = '02/01/2006'
SELECT @DateStart = CONVERT(VARCHAR(25),DATEADD(dd,-(DAY(@period)-1),@period),101)
SELECT @DateEnd = CONVERT(VARCHAR(25),DATEADD(dd,-(DAY(DATEADD(mm,1,@period))),DATEADD(mm,1,@period)),101)
SELECT
col1
,col2
,col3
FROM tbl1
WHERE mydate BETWEEN @DateStart AND @DateEnd
ORDER BY mydate
April 22, 2008 at 1:45 pm
Ok, the datatype of the column was smalldatetime ... I was able to remove the implicit conversion, but still scanning instead of seeking. What are other usual suspects of this?
Thanks
April 22, 2008 at 4:29 pm
If mydate is not the Clustered Index and the total numbers of rows it estimates to return exceeds some value (10% maybe?), then it will scan instead of seek.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
April 23, 2008 at 6:45 am
Once you've got the datatype right, it comes down to indexing. I modified your first example to run against AdventureWorks, just so I had something to test:
DECLARE @period DATETIME
SET @period = '02/01/2006'
SELECT p.PurchaseOrderId
,p.PurchaseOrderDetailId
,p.ProductId
FROM [Purchasing].[PurchaseOrderDetail] p
WHERE p.[DueDate] BETWEEN DATEADD(MONTH, DATEDIFF(MONTH, 0, @period), 0)
AND DATEADD(MONTH, DATEDIFF(MONTH, 0, @period) + 1, 0)
- 1
ORDER BY p.[DueDate]
I dropped all the indexes on the table and, of course, got a table scan when I ran the query. I then added an nonclustered index to the DueDate column and again got a table scan with 1 scan & 9 reads against the table. I changed that index to clustered and got a nice clean seek with a 1 scan & 2 reads. I then dropped that index & created a cluster on the PurchaseOrderId & PurchaseOrderDetailId, clustered index scan with 11 reads. I added another index to DueDate with the cluster still intact. Scan again with 11 reads. Then I tried to get a covering index and added ProductId & DueDate to the clustered index. Scan. Then I changed the column order, putting DueDate first. Seek, 2 reads. Finally, I recreated that same index but nonclustered. Again, Seek with 2 reads. I also tried a nonclustered index using an INCLUDE to add the other columns to the index, again making it covering. Seek with 2 reads.
That's against a pretty limited set of data, 1000 rows, so different statistics might provide different results, but there's nothing inherently wrong with the query. You just have to get the indexes right.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 23, 2008 at 10:24 pm
I would bet on one of two things:
a) the stats are old, or inaccurate for the date range
b) it's more efficient to just use a scan - the selectivity of the index isn't sufficient to warrant a seek.
What's the index definition?
April 24, 2008 at 5:50 am
rbarryyoung (4/22/2008)
If mydate is not the Clustered Index and the total numbers of rows it estimates to return exceeds some value (10% maybe?), then it will scan instead of seek.
Really? Is there something published to back this up?
April 24, 2008 at 5:54 am
matt stockham (4/23/2008)
I would bet on one of two things:a) the stats are old, or inaccurate for the date range
b) it's more efficient to just use a scan - the selectivity of the index isn't sufficient to warrant a seek.
What's the index definition?
- I updated the stats and free'd the cache prior to testing
- How can it be more efficient to scan instead of seek on a table with 7+ million records?
- The index matches the select statement
April 24, 2008 at 5:56 am
Grant, is your summary that the columns need to be included into the clustered index?
April 24, 2008 at 6:11 am
I thought I had seen that 10% number too. Here's where it's referenced.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 24, 2008 at 6:16 am
No, based on the data I tested with, 1000 rows not 7 million, the date column alone didn't supply a selective enough non-clustered index for the optimizer to choose it for the query. When I combined it with other columns, clustered or not, in the include list or as part of the index, the optimizer was able to identify it as a useful index. So, I'd look at your indexes and see if you could add a column or two to the include list, but keep the date as the leading edge.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 24, 2008 at 6:19 am
Well, I'm tweaking this proc in 2005 ... but I'm gonna have to push it into a 2000 environment 🙁 thus, includes are not a possibility ...
April 24, 2008 at 6:33 am
Bummer. Then you'll need to create a compound index, at least based on my tests. The query is basically OK (since you fixed the implicit data conversion, those things will get you), so you just need to ensure you've got the index it can use.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 24, 2008 at 9:20 am
That's the thing though ... it is using the index, but just scanning instead of seeking. The index matches the select columns of the query and even in the execution plan it shows the output list matching my index identically ... so why scan instead of seek?
April 24, 2008 at 9:38 am
Hmmm, it's a covering index? Change the order of the columns then. You mentioned earlier that the statistics were up to date and the index wasn't fragmented, right?
You're getting a scan because scanning an index is usually more effecient than scanning a table (fewer columns=fewer pages=fewer reads=more efficient), but the index isn't selective enough, at least apparently, to the optimizer for it to do a nice clean seek. Based on my tests, you should be able to get a seek though.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 24, 2008 at 9:38 am
Your index is on mydate, col1, col2, col3 - in that order?
Viewing 15 posts - 1 through 15 (of 24 total)
You must be logged in to reply to this topic. Login to reply