December 30, 2008 at 7:13 am
I have a large table of 55 million rows. When I do a simple select based on the primary key, SQL uses a table scan for 1 value (result 422,000 rows, 47 seconds), and an index seek for another (result 95,000 rows, 1 second). Does SQL think that a table scan is better for the larger result set ? If I force the index on the larger result, it only takes 1 second.
Is there anything I can do to get SQL to use the index ?
Here're the specs:
CREATE TABLE [dbo].[CallMaster](
[Campaign_ID_FK] [uniqueidentifier] NOT NULL,
[Customer_ID] [char](20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[NoCall] [char](1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL CONSTRAINT [DF_NoCall] DEFAULT ('N'),
[CallAfter] [datetime] NOT NULL CONSTRAINT [DF_CallMaster_CallAfter] DEFAULT (getdate()),
[LastResult] [uniqueidentifier] NULL,
[LastCount] [int] NOT NULL CONSTRAINT [DF_CallMaster_LastCount] DEFAULT (0),
[LastCalled] [datetime] NOT NULL CONSTRAINT [DF_CallMaster_LastCalled] DEFAULT (getdate()),
[Attempts] [int] NOT NULL CONSTRAINT [DF_CallMaster_Attempts] DEFAULT (0),
[TimeZone] [char](10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[Name] [char](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Version] [timestamp] NOT NULL,
[LastVerified] [datetime] NULL,
[DialingLaws] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Callback] [char](1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[ReserveCSR] [char](1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[TargetCSR] [varchar](38) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[InCache] [datetime] NULL,
CONSTRAINT [PK_CallMaster] PRIMARY KEY NONCLUSTERED
(
[Campaign_ID_FK] ASC,
[Customer_ID] ASC
) ON [INDEXES]
) ON [PRIMARY]
GO
CREATE STATISTICS [Statistic_DialingLaws] ON [dbo].[CallMaster]([DialingLaws])
SELECT TOP 1000 * FROM ASDM..CallMaster WITH(NOLOCK)
WHERE Campaign_ID_FK = '347E4289-2166-4038-B249-3977BC7A5232' -- 47 seconds, 422,000 rows
SELECT TOP 1000 * FROM ASDM..CallMaster WITH(NOLOCK INDEX = PK_CallMaster)
WHERE Campaign_ID_FK = '347E4289-2166-4038-B249-3977BC7A5232'-- 1 sec, 422,000 rows
SELECT TOP 1000 * FROM ASDM..CallMaster WITH(NOLOCK)
WHERE Campaign_ID_FK = '02A47717-B3D9-43E6-9002-00EAC52A4F3B'-- 1 sec, 95,000 rows
December 30, 2008 at 7:25 am
homebrew01 (12/30/2008)
Does SQL think that a table scan is better for the larger result set ?
Yes. Because you're doing a select * , the index is not covering. That means, to use the NC index, SQL has to do 422000 seeks on the clustered index to fetch the extra rows. at 3 or 4 page reads per seek. That's a lot of IOs will result in the optimiser picking a scan over that seek.
If you can make the index covering, either by adding include columns or selecting fewer columns (or both), then you will get the index seek even at higher percentages of the table.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 30, 2008 at 8:14 am
GilaMonster (12/30/2008)
homebrew01 (12/30/2008)
Does SQL think that a table scan is better for the larger result set ?Yes. Because you're doing a select * , the index is not covering. That means, to use the NC index, SQL has to do 422000 seeks on the clustered index to fetch the extra rows. at 3 or 4 page reads per seek. That's a lot of IOs will result in the optimiser picking a scan over that seek.
If you can make the index covering, either by adding include columns or selecting fewer columns (or both), then you will get the index seek even at higher percentages of the table.
Thanks, but I don't quite get it yet. Sorry for the "newbie" questions, but why does select * affect the use of an index ? Without using the index, isn't SQL doing 55 million seeks, which is even worse ?
Also, there is no clustered index on most of the tables in this application.
If I specifiy certain columns, the execution plan still shows a table scan:
SELECT top 1000 Campaign_ID_FK,Customer_ID ,DoNotCall ,CallAfter, LastResult
,Callback,ReserveCSR,TargetCSR,InCache FROM ASDM..CallMaster WITH(NOLOCK)
WHERE Campaign_ID_FK = '347E4289-2166-4038-B249-3977BC7A5232' -- 47 seconds
December 30, 2008 at 8:22 am
If you do a select * then you are asking sql to return all the columns on the table. If the Index only has the one column in the index then it needs to be able to get the rest of the information from somewhere, It does this by doing either another index look-up or by doing a table scan.
December 30, 2008 at 10:18 am
homebrew01 (12/30/2008)
Thanks, but I don't quite get it yet. Sorry for the "newbie" questions, but why does select * affect the use of an index ?
* means all columns in the table. Not all of those columns are in the index, hence if the index is used, they have to be retrieved from somewhere.
Without using the index, isn't SQL doing 55 million seeks, which is even worse ?
No. 1 scan. Find the beginning of the table and read each of the pages in it once. There will be a lot less than 55 million pages in the table, unless those are huge rows.
With the seek on the NC, each row returned has to then be retrieved from the heap 1 at a time. So SQL will potentially read the same page multiple times to get different rows off it.
Also, there is no clustered index on most of the tables in this application.
Why not? Heaps are messy and have a number of downsides.
If I specifiy certain columns, the execution plan still shows a table scan:
SELECT top 1000 Campaign_ID_FK,Customer_ID ,DoNotCall ,CallAfter, LastResult
,Callback,ReserveCSR,TargetCSR,InCache FROM ASDM..CallMaster WITH(NOLOCK)
WHERE Campaign_ID_FK = '347E4289-2166-4038-B249-3977BC7A5232' -- 47 seconds
Is that all the columns in the table? If so, that's just an expanded version of select *. The columns are still not in the index and hence still need lookups
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 30, 2008 at 12:31 pm
"isn't SQL doing 55 million seeks"
I meant to say: Isn't SQL reading 55 million rows instead of reading 422,000 rows ?
"Is that all the columns in the table?"
No, I just picked about 1/2 at random, including the PK
"Why no clustered Index ? Heaps are messy and have a number of downsides"
I wondered that too. I just started this job a few months ago and am trying to learn what's going on. (lots of performance issues for 1 thing). The main application is purchased, and there's a bunch of homegrown code in other related databases, also without clustered indexes.
Thanks for all the help !
December 30, 2008 at 12:50 pm
homebrew01 (12/30/2008)
"isn't SQL doing 55 million seeks"I meant to say: Isn't SQL reading 55 million rows instead of reading 422,000 rows ?
If it's doing a table scan, then yes - that's exactly what it's doing. It may be quicker for it to serially read through the entire table, grabbing whatever qualifies, than it would be to (for each of the 422K rows) go to the index, figure out what page it's on, go to that page, pull the data down, follow the forwarding pointers to a new location if the row has been updated (heap behavior) to pull the rest, etc....
Remember - the query engine is COST based, not TIME based. It's often true that the query with the smallest cost will complete fastest, but that's not always true.
"Is that all the columns in the table?"
No, I just picked about 1/2 at random, including the PK
if just one of the columns you picked isn't in the index, the index no longer covers the query, so the bookmark lookups (what I was describing earlier 422K times) would have to happen.
If it isn't all of the columns, you might consider adding those columns to the index, to make the index cover the query. It would have to be <16 columns, and not a huge amount of the total row, but it would make the index a LOT more attractive to this query.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
December 30, 2008 at 1:23 pm
homebrew01 (12/30/2008)
"isn't SQL doing 55 million seeks"I meant to say: Isn't SQL reading 55 million rows instead of reading 422,000 rows ?
Take a mid-sized computer book with a comprehensive index at the back.
Which is faster, reading the entire book cover to cover, or taking a section of the book's index, looking up each page that those index entries refer to and reading that page?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 30, 2008 at 2:06 pm
GilaMonster (12/30/2008)
homebrew01 (12/30/2008)
"isn't SQL doing 55 million seeks"I meant to say: Isn't SQL reading 55 million rows instead of reading 422,000 rows ?
Take a mid-sized computer book with a comprehensive index at the back.
Which is faster, reading the entire book cover to cover, or taking a section of the book's index, looking up each page that those index entries refer to and reading that page?
That was part of my original poorly worded question.... Why would SQL read 55 million rcds when there is an index that allows it to read only 422,000.
December 30, 2008 at 2:07 pm
GilaMonster (12/30/2008)
homebrew01 (12/30/2008)
Also, there is no clustered index on most of the tables in this application.Why not? Heaps are messy and have a number of downsides.
Is there a downside to clustered indexes ?? Such as in an Insert heavy application ?
We're having some performance issues with some of our huge tables and I wonder if clustered indexes would help. I imagine it might take a while to build them.
December 30, 2008 at 2:16 pm
homebrew01 (12/30/2008)
That was part of my original poorly worded question.... Why would SQL read 55 million rcds when there is an index that allows it to read only 422,000.
Because the index does not have the entire of the row for those 422000 rows. It only has part. Just like the index at the back of a manual only has keywords, not the entire entry. If you want the entire entry, you have to go and look up the data page. That's time consuming. It may well involve reading the same page more than once if more than one qualifying row is on it.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 30, 2008 at 2:17 pm
homebrew01 (12/30/2008)
GilaMonster (12/30/2008)
homebrew01 (12/30/2008)
"isn't SQL doing 55 million seeks"I meant to say: Isn't SQL reading 55 million rows instead of reading 422,000 rows ?
Take a mid-sized computer book with a comprehensive index at the back.
Which is faster, reading the entire book cover to cover, or taking a section of the book's index, looking up each page that those index entries refer to and reading that page?
That was part of my original poorly worded question.... Why would SQL read 55 million rcds when there is an index that allows it to read only 422,000.
That's Gail's point, it is concluding that it will be faster to just read the table all of the way through, than to look each row in the index then for each row go pull down an 8K data page with the row in it (for the fields not included in the index).
Put it another way - if you had a cookbook with 100 recipes, and the index in the back only had the title of the recipes, and you wanted to know the salt content of each chicken recipe - is it easier to
a. scan through the entire cookbook for relevant recipes (1 scan), or
b. scan the index in the back, then jump to the front of the book to look up a recipe and find the salt content, then jump BACK to the back of the book for the next recipe, etc.... (index scan with lookup)
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
December 30, 2008 at 2:27 pm
So, thanks to you guys my brain is finally starting to catch on !
I could force the query to use the index, but then I run a future risk of causing a problem if the index is modified for some reason ?
Any thoughts about adding clustered indexes to huge tables in a busy mission critical application ?
December 30, 2008 at 2:35 pm
I consider clustered indexes essential on damn near every table. About the only time a heap is better is when the data in the table is "write once, don't update much, and doesn't get accessed a whole lot" (meaning some low-speed archival/audit tables that don't get used much).
Of course - you need to choose the right clustered index scheme, or it will also blow up in your face.
Here's some "light bedtime reading" to get you started on why:
http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/clusivsh.mspx
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
December 30, 2008 at 2:47 pm
Matt Miller (12/30/2008)
I consider clustered indexes essential on damn near every table. About the only time a heap is better is when the data in the table is "write once, don't update much, and doesn't get accessed a whole lot" (meaning some low-speed archival/audit tables that don't get used much).Of course - you need to choose the right clustered index scheme, or it will also blow up in your face.
Here's some "light bedtime reading" to get you started on why:
http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/clusivsh.mspx
Thanks 🙂
I'm on SQL 2000, but can I assume 2000 and 2005 are the same in this regard ?
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply