December 5, 2008 at 9:17 am
Am I losing the plot or....
Is there a row count threshold that has to be passed before SQL will start using indexes, clustered or non-clustered, rather than a table scan?
e.g. if the table has, say, 100 rows, will it still use table scans despite there being indexes present?
thanks simon
December 5, 2008 at 1:03 pm
You have to ask the Query Optimizer 😛
For small tables (like 100 rows), it may decide it's quicker and cheaper to scan the whole table.
For big tables, depending on the query, it could use an index or a table scan.
The answer is: it depends.
December 5, 2008 at 1:36 pm
Depends on the table, depends on the query, depends on the index.
e.g. if the table has, say, 100 rows, will it still use table scans despite there being indexes present?
If all 100 rows are in one data page, which is faster, to scan the entire table (1 page) or to use the index to seek (minimum 2 pages)?
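A quick way to check how many pages a table actually occupies (just a sketch, with a made-up table name):
EXEC sp_spaceused 'dbo.MyUsers'
The data column in the output shows how much space the table's rows take up, which for a small table of narrow rows may well be a single 8 KB page.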
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 5, 2008 at 1:39 pm
Just to add to what Jerry already said...
If your SELECT statement uses columns that are not part of your index in one way or another, so the optimizer needs to do a bookmark lookup, then with small tables like that one a table scan would be faster.
I.e., say you have the following:
Create Table #myUsers (
username varchar(25) Primary Key Nonclustered, -- nonclustered PK so the lookup scenario below actually applies
FName Varchar(30),
LName Varchar(50)
)
And the only index you have is a PK on Username but your query is something like
Select FName, LName
FROM #MyUsers
WHERE username = 'bgates'
a table scan would be cheaper than seeking the index to find the row and then using the bookmark to get the rest of its data, depending on the number of rows and 20 other factors.
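For what it's worth, if you wanted that query answered from the index alone, one option is a covering index - just a sketch, assuming SQL Server 2005 or later and a made-up index name:
Create NonClustered Index ix_myUsers_cover ON #myUsers (username) INCLUDE (FName, LName)
With that in place the seek needs no bookmark lookup at all, although on a table this small the optimizer may still decide a scan is cheap enough.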
There is no magic number at which the optimizer will prefer one way vs. another; it takes into account all manner of things, including your stats and even the available memory and/or CPU cycles. If it can do the operation two ways, one of which takes more memory and one of which takes more CPU cycles, and there is less contention for CPU than memory, it will use the plan that takes more CPU and less memory. The next time it runs, the opposite may happen.
As Jerry said, It Depends and as always YMMV.
-Luke.
December 8, 2008 at 2:25 am
thank you all very much
food for thought ...
best wishes
~simon
December 9, 2008 at 6:38 am
I thought if a table was less than 8 pages, it would never use an index...
December 9, 2008 at 7:47 am
Look up the INDEX= hint for the FROM clause in BOL. You can force a query to use a particular index. Then you can compare the query cost (show actual execution plan) and the I/Os required (SET STATISTICS IO ON) for each flavor of the query:
SET STATISTICS IO ON

select *
from mytable WITH (INDEX = 0) -- force a table/clustered index scan (probably what you'd get anyway on a small table)

select *
from mytable WITH (INDEX = myNCindex) -- force use of the nonclustered index
Ballpark: the optimizer will switch to a scan when it estimates you are asking for somewhere between 8 and 15% of the total number of rows in the table.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
December 9, 2008 at 7:51 am
Mike Durnwald (12/9/2008)
I thought if a table was less than 8 pages, it would never use an index...
I had heard 1,000 rows, but it would appear from the knowledgeable answers in this thread that that's wrong! 😀
December 9, 2008 at 7:58 am
The optimizer is cost-based. An index seek/bookmark lookup of a given depth takes X cost per row. A table (or clustered-index range) scan takes Y cost per page. Based on estimated rows retrieved it is pretty simple math to determine in this case which cost is less. Note that X does NOT equal Y here. For those inquisitive readers, Joe Chang has some amazing details available online from previous work he has done.
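To make that concrete with purely illustrative numbers (not the optimizer's actual cost formulas): suppose the table occupies 1,000 pages and a seek plus bookmark lookup touches about 4 pages per row (a couple of index levels plus the data page). Then:
scan cost ~ 1,000 page reads, however many rows qualify
seek + lookup cost ~ 4 page reads per row
break-even ~ 1,000 / 4 = 250 rows
Below that estimate the seek wins; above it, the scan does.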
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
December 9, 2008 at 8:00 am
TheSQLGuru (12/9/2008)
The optimizer is cost-based. An index seek/bookmark lookup of a given depth takes X cost per row. A table (or clustered-index range) scan takes Y cost per page. Based on estimated rows retrieved it is pretty simple math to determine in this case which cost is less. Note that X does NOT equal Y here. For those inquisitive readers, Joe Chang has some amazing details available online from previous work he has done.
thank you very much 🙂
simon
December 9, 2008 at 10:48 am
TheSQLGuru (12/9/2008)
Ballpark: the optimizer will switch to a scan when it estimates you are asking for somewhere between 8 and 15% of the total number of rows in the table.
Depends on the table. I've had it switch to a cluster/table scan with under 1% of the rows queried. With a non-covering index, of course.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 9, 2008 at 12:24 pm
GilaMonster (12/9/2008)
TheSQLGuru (12/9/2008)
Ballpark: the optimizer will switch to a scan when it estimates you are asking for somewhere between 8 and 15% of the total number of rows in the table.
Depends on the table. I've had it switch to a cluster/table scan with under 1% of the rows queried. With a non-covering index, of course.
1) I wonder if that was actual rows returned or estimated?
2) I also wonder about the size of the table, like 100 rows or 100M rows and did it have bloated data size?
3) I should have specifically excluded covering-index situations. 🙂
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
December 9, 2008 at 1:19 pm
TheSQLGuru (12/9/2008)
1) I wonder if that was actual rows returned or estimated?
Estimated. The optimiser works with estimated row counts and that's what it uses for its costings and plan. It doesn't know, at optimisation time, what the actual row count is going to be and hence can't cost and optimise based on the actual row count.
That said, in the test cases that I've investigated with the stats up to date, the two are the same.
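(As an aside, an easy way to see both figures for a query is SET STATISTICS PROFILE ON, which adds Rows and EstimateRows columns to the output - for example, around any of the test queries below:
SET STATISTICS PROFILE ON
select * from [SeekOrScan] where somenumber between 0 and 100
SET STATISTICS PROFILE OFF
The Rows column is the actual count per operator; EstimateRows is what the optimiser expected.)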
2) I also wonder about the size of the table, like 100 rows or 100M rows and did it have bloated data size?
Small, large, doesn't seem to matter. Though the larger the table, the smaller the break point seems to be. Not a large shift in the % though.
Some setup first
CREATE TABLE [dbo].[SeekOrScan](
[ID] [int] IDENTITY(1,1) NOT NULL primary key,
SomeNumber int,
padding char(100)
) ON [PRIMARY]
CREATE NonCLUSTERED INDEX [idx_SeekOrScan] ON [dbo].[SeekOrScan] (SomeNumber)
First test - 1000 rows
insert into [SeekOrScan] (SomeNumber)
select top 1000 number
from master..spt_values
where name is null
dbcc freeproccache
select * from [SeekOrScan]
where somenumber between 0 and 100 -- 10% of table
-- Clustered index scan
dbcc freeproccache
select * from [SeekOrScan]
where somenumber between 0 and 50 -- 5% of table
-- Clustered index scan
dbcc freeproccache
select * from [SeekOrScan]
where somenumber between 0 and 5 -- 0.6% of table
-- Clustered index scan
dbcc freeproccache
select * from [SeekOrScan]
where somenumber between 0 and 4 -- 0.5% of table
-- nonclustered index seek
Break point was around 0.5% of the total rows.
Second test - 1 000 000 rows
truncate table SeekOrScan
insert into [SeekOrScan] (SomeNumber)
select top 1000000 0
from master..spt_values a cross join master..spt_values b
where a.name is null and b.name is null
update [SeekOrScan] set SomeNumber = ID -- just so we've got sequential numbers for the between
GO
dbcc freeproccache
select * from [SeekOrScan]
where somenumber between 1 and 100000 -- 10% of table
-- Clustered index scan
dbcc freeproccache
select * from [SeekOrScan]
where somenumber between 1 and 50000 -- 5% of table
-- Clustered index scan
dbcc freeproccache
select * from [SeekOrScan]
where somenumber between 1 and 10000 -- 1%
-- Clustered index scan
dbcc freeproccache
select * from [SeekOrScan]
where somenumber between 1 and 5000 -- 0.5% of table
-- clustered index scan
dbcc freeproccache
select * from [SeekOrScan]
where somenumber between 1 and 3000 -- 0.3% of table
-- nonclustered index seek
Breakpoint here was between 0.3 and 0.4% of the table.
For what it's worth, I've never seen the breakpoint at 15%. I don't think I've ever seen it as high as 10%.
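Since the scan side of the costing is driven by pages rather than rows, the page counts are worth a look as well. A quick way to get them for the test table (a sketch, assuming SQL 2005 or later):
select index_id, in_row_data_page_count, row_count
from sys.dm_db_partition_stats
where object_id = object_id('dbo.SeekOrScan')
That shows how many pages a scan would have to read in each test, which is the figure the per-row lookup cost is being weighed against.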
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 9, 2008 at 2:50 pm
Thanks for the repro, Gail! Clearly either my memory is faulty (wouldn't be the first time, and it seems to be happening more and more since the birth of my daughter 3+ years ago :w00t:) or I am remembering metrics from a no-longer-existent version of SQL Server.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service