May 1, 2010 at 2:16 pm
I have started working on indexes lately and was studying indepth about them in Sql Server.
I created a a nonclustered index on a table Which had just two columns one "ObservationTime" (DateTime Type) and other PointValue(Integer Type).
I inserted some .5 million rows from 1st January 2009 to 1st January 2010(I inserted data for every 1 minute starting from 1st January 2009)
with some random Point Values
After that I Created index on ObsDateTime and ran a query like below
Select PointValue From IndexTestTable Where ObservationTime Between '5/1/2009' and '5/2/2009'.
This returns me around 1440 rows but I was suprised to see Table scan in Query plan. So Indexes were not reffered at all.
I was expecting Index scan here. Can anyone guide me what's actually happening here.
There is one thing to note here :when I replicate the same thing in oracle, indexe on the dateTime field is reffered.
Also , please post or refer some book or online material for studying in depth about indexes in sql server.
Thanks and regards,
Kunal Uppal
May 1, 2010 at 2:35 pm
the table contains 500,000 rows? If your distribution of data was even as your description suggests it would be then I would expect your query to return about 40,000 rows or 8% of the data. It only returns 1440 rows so the distribution is obviously not even. Update the stats and see what happens.
Also the column you are retrieving is not part of the index so bookmark lookups would be required if the index was used, SQL may have decided a table scan was the better option, especially as the size of the index is going to be comparable to the size of the table.
---------------------------------------------------------------------
May 1, 2010 at 2:44 pm
Ahh, language settings, is that query 1st and 2nd May or 5th Jan to 5th Feb?
Former I presume. Same arguments still apply.
---------------------------------------------------------------------
May 1, 2010 at 2:56 pm
It's from 1st May to 2nd May.
and even if I add some more columns and include it in my Select list, Sql Server still goes for Table Scan
May 1, 2010 at 3:01 pm
The search criteria on the where clause is used by SQL to determine optimum plan, not the select list.
Post your create table script, sample values and the execution plan.
---------------------------------------------------------------------
May 1, 2010 at 4:14 pm
george sibbald (5/1/2010)
The search criteria on the where clause is used by SQL to determine optimum plan, not the select list.Post your create table script, sample values and the execution plan.
Same here...
If there are really only 2 columns in the table then the index has to be used somehow and assuming the stats are not out of whak, the optimizer should figure out that a seek is the fastest here.
Just a quick suggestion. Change the index you just created to clustered index and then make sure you are using the datetime datatype in the where condition (either use declared variableor explicit convert). the latter suggestion is to rule out convert implicit which will always force a scan of the table or index.
May 1, 2010 at 7:23 pm
uppal.kunal (5/1/2010)
I have started working on indexes lately and was studying indepth about them in Sql Server.I created a a nonclustered index on a table Which had just two columns one "ObservationTime" (DateTime Type) and other PointValue(Integer Type).
I inserted some .5 million rows from 1st January 2009 to 1st January 2010(I inserted data for every 1 minute starting from 1st January 2009)
with some random Point Values
After that I Created index on ObsDateTime and ran a query like below
Select PointValue From IndexTestTable Where ObservationTime Between '5/1/2009' and '5/2/2009'.
This returns me around 1440 rows but I was suprised to see Table scan in Query plan. So Indexes were not reffered at all.
I was expecting Index scan here. Can anyone guide me what's actually happening here.
There is one thing to note here :when I replicate the same thing in oracle, indexe on the dateTime field is reffered.
Also , please post or refer some book or online material for studying in depth about indexes in sql server.
Thanks and regards,
Kunal Uppal
Please post the code you used to generate the .5 million rows so I can verify it, please.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 1, 2010 at 7:54 pm
I created my own test table instead of waiting... here's the code (which you should have posted to begin with, thanks 😉 )...
DROP TABLE #MyHead;
GO
WITH
E1(N) AS ( --=== Create Ten 1's
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 --10
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
E8(N) AS (SELECT 1 FROM E4 a, E4 b), --100,000,000
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM E8)
SELECT TOP (DATEDIFF(mi,'20090101','20100101'))
ISNULL(N,0) AS N,
ISNULL(DATEADD(mi, N-1,'20090101'),0) AS ObservationTime, --ISNULL makes a NOT NULL column
ABS(CHECKSUM(NEWID())) AS PointValue
INTO #MyHead
FROM cteTally
;
In order to get an INDEX SEEK you frequently need for the data in both the criteria and the SELECT clause to be "covered" by an index. You can accomplish this in one of three ways on the table above...
1. Use a CLUSTERED index on the ObservationTime column or...
2. Use a NON CLUSTERED index on the ObservationTime with an INCLUDE on the PointValue or...
3. Use a NON CLUSTERED index on the ObservationTime and the PointValue column.
The reason why, in this particular case that it doesn't choose to do an index seek is because the optimizer probably determined that a table scan is cheaper to do than an index seek and then 1400+ row lookups based on that seek. And, I believe it's correct in that assumption.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 3, 2010 at 1:27 am
Sorry for not tracking this on weekends.!
Thanks to all for putting views but I think I am still far from an answer.
In order to get an INDEX SEEK you frequently need for the data in both the criteria and the SELECT clause to be "covered" by an index. You can accomplish this in one of three ways on the table above...
1. Use a CLUSTERED index on the ObservationTime column or...
2. Use a NON CLUSTERED index on the ObservationTime with an INCLUDE on the PointValue or...
3. Use a NON CLUSTERED index on the ObservationTime and the PointValue column.
The reason why, in this particular case that it doesn't choose to do an index seek is because the optimizer probably determined that a table scan is cheaper to do than an index seek and then 1400+ row lookups based on that seek. And, I believe it's correct in that assumption.
Thanks Jeff for detailed description to how to make it work but frankly I am looking for "Why Index didn't work here?" rather than "how to make it work?"
Infact while working on this table , I did created another column and created composite index and then index were referred(similar to what you said in point 3). So before posting this questions I was quite aware to how to make it work but was looking for an answer for "Why it didn't work".
As you said "optimizer probably determined that a table scan is cheaper to do than an index seek"..
This was already clear when I saw plan but then on what grounds optimizer decided to go for Table scan is not clear to me.
Same query in oracle does refer to index.. so there is something which Sql server does differently
I hope I will get close to "Why" part this time
Regards
Kunal Uppal
May 3, 2010 at 2:03 am
uppal.kunal (5/3/2010)
Thanks Jeff for detailed description to how to make it work but frankly I am looking for "Why Index didn't work here?" rather than "how to make it work?"
Frankly, I think you missed Jeff's point. 😛
Because the index does not include the PointValue column, SQL Server would have to locate the records matching ObservationTime in the non-clustered index, then look up the value of PointValue in the table for each match that it found. This involves I/O which is likely to be random in nature (unless there happens to be a correlation between ObservationTime and PointValue). The cost of these extra I/Os quickly adds up.
Say the table is stored on 500 pages (of 8KB each). SQL Server can choose to scan the table (incurring 500 reads) or seek down the index and then look up Point Value in the table, using the row locator stored in the index.
If the table is clustered, with a clustered index depth of 2, SQL Server incurs three extra I/Os just to perform the look up - and remember this is per row, so it does not require many rows to match to exceed the 500-read cost of the table scan. Say 200 rows match the criteria - it costs 600 reads just to perform the look ups. Therefore, the scan is cheaper. Even if the table is a heap, less than 500 matching rows would be required to make the scan cheaper.
Another consideration is that without a presentation ORDER BY clause, SQL Server might perform the table scan in allocation order. This can result in large sequential I/O that can make effective use of read-ahead. The small, random I/Os performed by the index seek plus look up make for a sharp contrast.
All that said, the optimal index for this query is that referenced as #2 by Jeff.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
May 3, 2010 at 6:50 am
uppal.kunal (5/3/2010)
Same query in oracle does refer to index.. so there is something which Sql server does differently
Yes, it IS different. It's been a long time since I've worked with Oracle but I believe the reason for this difference is that Oracle defaults to a "Rule Based" execution plan and SQL Server uses only "Cost Based" execution plans.
Which one is better in this case? I don't know... I don't have Oracle available to me and I'd probably not admit it to anyone if I did have it available because that would mean I'd have to touch Oracle again. 😛
I hope that answers your question.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 3, 2010 at 7:04 am
One other option, since there's no sample code and no execution plan, this is speculation. You showed the values passed as being strings '9-1-2009', etc. It's possible that the implicit data conversion needed to change the string to a date prevented the index from being used. This is not always the case, but it is often enough.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply