June 3, 2009 at 5:17 am
I have inherited a large database (1.5 TB, SQL Server 2000; an upgrade is planned) and most of the tables (320) have no PK or clustered index.
I am going to recommend adding a clustered index to every table with more than 100 data pages.
Can anyone think of a reason not to do this?
What about the missing PKs? Many of the tables have an identity column; are there any reasons why making that column the PK would be a mistake?
What are the performance considerations for each of these possibilities?
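For reference, here's a rough sketch of how I plan to find the candidate heaps (this assumes the SQL 2000 system tables; the 100-page threshold is just my starting point):
---------------------------
-- Sketch: list user tables that are heaps (indid = 0 in sysindexes)
-- with more than 100 data pages. dpages is the heap's data page count.
SELECT  o.name   AS table_name,
        i.dpages AS data_pages,
        i.rowcnt AS row_estimate
FROM    sysobjects o
        JOIN sysindexes i ON i.id = o.id
WHERE   o.xtype = 'U'   -- user tables only
  AND   i.indid = 0     -- indid 0 = heap (no clustered index)
  AND   i.dpages > 100
ORDER BY i.dpages DESC
---------------------------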
Thanks for all help,
Jules.
June 3, 2009 at 5:32 am
Hi Jules
The only reason I can think of is size. If the database you mention isn't actively used and is just sitting there as an archive, then arguably you don't need an index. Otherwise, I don't see any reason not to have an index on the tables 🙂
June 3, 2009 at 5:46 am
They are used mainly for inserts.
Are inserts into million-row tables slower on heaps than on tables with a clustered index?
I can think of reasons for both answers.
How much space would a clustered index add?
Can a zero-fragmented table with a clustered index take up less room than a 100% fragmented heap?
thanks,
Jules
June 3, 2009 at 6:37 am
June 3, 2009 at 7:26 am
In my opinion, yes. Every table should have a clustered index. There are a few disadvantages to having heaps. That said, a badly chosen clustered index is worse than no clustered index at all, so do some research and some careful investigation before picking a clustering key.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 3, 2009 at 7:49 am
June 3, 2009 at 8:07 am
Jules Bonnot (6/3/2009)
Hi, would you mind expanding your post a little? What are the disadvantages of having heaps?
Forwarding pointers and the resulting increase in IO. Only an issue when there are updates.
Sometimes space is not released after a delete.
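If you want to see whether forwarding pointers are actually a problem on a given heap, something like this should show it (a sketch; on SQL 2000, DBCC SHOWCONTIG with TABLERESULTS should report a ForwardedRecords column, and the table name here is just a placeholder):
---------------------------
-- Sketch: check a heap for forwarded records (SQL 2000).
-- TABLERESULTS returns the output as a result set, including ForwardedRecords.
DBCC SHOWCONTIG ('dbo.YourTable') WITH TABLERESULTS, ALL_INDEXES
---------------------------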
You mentioned fragmentation earlier. Logical fragmentation's a meaningless concept on a heap. Logical fragmentation comes about when the order of the pages on disk is not the same as the index order. Since heaps have no order they cannot have logical fragmentation.
And what size increase can I expect from creating the clustered index?
Depends on the table and the clustering key chosen.
Are inserts on heaps faster?
Depends. A good clustered index can make your inserts faster. A bad clustered index will make them slower.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 3, 2009 at 8:09 am
Inserts on heaps can be faster. Not often, but they can be.
The only real advantage to heaps is that they can avoid "hot spots" on the drives if you have a multi-drive array and a very high insert rate. They can avoid that because inserts don't have to go "at the end of the table"; they can go anywhere in it. Note that this is very, very rarely an advantage over a table with a well-designed clustered index.
A badly designed clustered index can make this a LOT worse, as it can result in many/all of the inserts resulting in page splits and index fragmentation.
Adding a clustered index to a table doesn't add much (if anything) to the size on the disk. Non-clustered indexes do, but the clustered index is the table data, so it really doesn't have much of an impact that way.
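If you want to see the size effect for yourself, something like this will show it (a sketch only; the table, index, and column names are placeholders):
---------------------------
-- Sketch: compare the table's footprint before and after adding a clustered index.
-- sp_spaceused reports reserved, data, and index_size.
EXEC sp_spaceused 'dbo.YourTable'

CREATE CLUSTERED INDEX CIX_YourTable ON dbo.YourTable (your_key_col)

EXEC sp_spaceused 'dbo.YourTable'   -- compare; the data pages ARE the clustered index's leaf level
---------------------------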
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 3, 2009 at 8:48 am
Thanks to all who are helping me with this.
Here's the table definition:
---------------------------
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Table_1](
[Col1] [int] IDENTITY(1,1) NOT NULL,
[col2] [int] NOT NULL,
[col3] [int] NOT NULL,
[col4] [int] NULL,
[col5] [datetime] NOT NULL,
[col6] [datetime] NOT NULL,
[col7] [int] NOT NULL,
[col8] [float] NULL,
[col9] [int] NOT NULL,
[col10] [float] NOT NULL,
[col11] [float] NULL,
[col12] [float] NOT NULL,
[col13] [float] NOT NULL,
[col14] [int] NOT NULL,
[col15] [int] NOT NULL CONSTRAINT [DF_accounting_p4p_state] DEFAULT (0),
[col16] [datetime] NOT NULL CONSTRAINT [DF_accounting_create_date] DEFAULT (getdate()),
[col17] [float] NULL,
[col18] [int] NOT NULL CONSTRAINT [DF_accounting_suba_id] DEFAULT (0),
[col19] [int] NOT NULL CONSTRAINT [DF_accounting_user_id] DEFAULT (0),
[col20] [smallmoney] NULL,
[col21] [tinyint] NULL CONSTRAINT [DF_accounting_suba_comm_state] DEFAULT (0),
[col22] [smalldatetime] NULL,
[col23] [int] NULL,
[col24] [tinyint] NULL,
CONSTRAINT [PK_accounting] PRIMARY KEY NONCLUSTERED
(
[acco_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[accounting_old] WITH NOCHECK ADD CONSTRAINT [FK_accounting_partner] FOREIGN KEY([partner_id])
REFERENCES [dbo].[partner] ([partner_id])
GO
ALTER TABLE [dbo].[accounting_old] CHECK CONSTRAINT [FK_accounting_partner]
-----------------------------------------------------------------------------------------------
It contains 77 million rows and is 10 GB in size.
'You mentioned fragmentation earlier. Logical fragmentation's a meaningless concept on a heap'
Gail - I mentioned the 'fragmentation' of the heap, not logical fragmentation. Part of the reason I want to add a clustered index is so I can defragment the table. As you mention, a heap can't be defragmented, so this is my only option to keep the data contiguous.
Does this make sense?
I plan to add the clustered index on the identity column.
What effect will this have on the inserts?
Will selecting the identity column for the clustered index be better for insert performance than a non-incremental column (or columns)?
Why?
GSquared - Are hotspots still an issue in SQL 2000? I plan to put the index on the identity column, so if hotspots are a problem this could get nasty.
The data is on a SAN rather than a local array. Does this make a difference?
What would an example of a badly designed clustered index be? Should I put the index on the identity column? (I don't know what range queries are being run.)
Given that the table is 10 GB, what size would you expect it to be if I put a clustered index on the identity column?
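For what it's worth, this is roughly the change I have in mind (a sketch only, using the names from the anonymised script above; if any other tables reference this PK, their foreign keys would have to be dropped and recreated around it):
---------------------------
-- Sketch: drop the existing nonclustered PK and recreate it as the clustered index
-- on the identity column. The rebuild rewrites the whole table, so this needs a
-- maintenance window on a 10 GB / 77-million-row table.
ALTER TABLE dbo.Table_1 DROP CONSTRAINT PK_accounting
GO
ALTER TABLE dbo.Table_1 ADD CONSTRAINT PK_accounting
    PRIMARY KEY CLUSTERED (Col1)
GO
---------------------------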
All help most appreciated.
Jules
June 3, 2009 at 9:20 am
Jules Bonnot (6/3/2009)
GSquared - Are hotspots still an issue in SQl 2000? I plan to put the index on the identity column so if hotspots are a problem this could get nasty.
Yes, but much less than in 6.5 (which is where most of the horror stories of hotspots come from). Someone did a test a while back and found that you need a few thousand inserts per second to show any degradation.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 3, 2009 at 9:35 am
Does the table really have columns with those names? I hope that's just you anonymizing it.
As mentioned, hot spots are only very, very rarely going to be an issue. And even more rarely going to be something you can alleviate by inserting into a heap instead of a clustered index.
An example of a bad clustered index is one on a GUID column that uses newid() instead of newsequentialid(), or one on a varchar column with people's names in it or something like that.
What you're talking about doing, making an identity column into the leading edge of the clustered index, is very normal, and almost always works quite well.
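To make the GUID example concrete, a minimal sketch (SQL 2005+ syntax, purely illustrative table and column names):
---------------------------
-- Bad: random GUIDs scatter inserts across the clustered index and cause page splits.
CREATE TABLE dbo.Orders_BadCluster (
    order_guid uniqueidentifier NOT NULL DEFAULT NEWID() PRIMARY KEY CLUSTERED,
    order_date datetime NOT NULL
)

-- Less harmful: sequential GUIDs (or an identity) keep new rows at the end of the index.
CREATE TABLE dbo.Orders_BetterCluster (
    order_guid uniqueidentifier NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY CLUSTERED,
    order_date datetime NOT NULL
)
---------------------------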
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 3, 2009 at 9:41 am
GSquared (6/3/2009)
An example of a bad clustered index is one on a GUID column that uses newid() instead of newsequentialid(), or one on a varchar column with people's names in it or something like that.
Or one on several columns with a total size of a couple hundred bytes.
Or a column that changes frequently.
Does the table really have columns with those names? I hope that's just you anonymizing it.
I suspect the latter. Look at the definition of the PK. It references a column with a 'real' name.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 4, 2009 at 2:27 am
Thanks for your replies.
I think I will write a little test and see how the heap with identity performs against the clustered index with identity under heavy inserts. I will post the script once I have done it.
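A very rough first cut of what I'm thinking of (not the final script; table names, the padding column, and the row count are just placeholders, and it's single-threaded so it won't show concurrency effects):
---------------------------
-- Sketch: time identical inserts into a heap and into a clustered-index table.
CREATE TABLE dbo.InsertTest_Heap (
    id  int IDENTITY(1,1) NOT NULL,
    pad char(100) NOT NULL DEFAULT ('x')
)
CREATE TABLE dbo.InsertTest_CI (
    id  int IDENTITY(1,1) NOT NULL,
    pad char(100) NOT NULL DEFAULT ('x')
)
CREATE CLUSTERED INDEX CIX_InsertTest ON dbo.InsertTest_CI (id)
GO

-- Time inserts into the heap.
DECLARE @i int, @start datetime
SELECT @i = 0, @start = GETDATE()
WHILE @i < 100000
BEGIN
    INSERT INTO dbo.InsertTest_Heap DEFAULT VALUES
    SET @i = @i + 1
END
SELECT DATEDIFF(ms, @start, GETDATE()) AS heap_ms
GO

-- Time the same inserts into the clustered-index table.
DECLARE @i int, @start datetime
SELECT @i = 0, @start = GETDATE()
WHILE @i < 100000
BEGIN
    INSERT INTO dbo.InsertTest_CI DEFAULT VALUES
    SET @i = @i + 1
END
SELECT DATEDIFF(ms, @start, GETDATE()) AS ci_ms
---------------------------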
Regarding the issue of primary keys: obviously a PK is a form of constraint and must therefore involve some overhead to check.
Has anyone had any bad experiences adding primary keys? What kind of impact can I expect?
What's the point of adding a PK to an identity column?
Thanks for your help.
Jules
June 4, 2009 at 7:13 am
I think the only time I've had a table be faster because of a lack of PK has been in bulk-import situations. It is faster to import multi-million-row text files into a heap table with no PK and then add an ID-PK to the table afterwards. That's the only case I can think of where a lack of PK has sped things up at all. I've also seen that approach cause rows to simply "disappear" from the import, so while it may be slightly faster, it can lose data, and I don't use that technique. (That was in SQL 2005 and in SQL 2000. Haven't tried it in 2008.)
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 25, 2009 at 12:44 pm
Did you ever complete your test? I'm curious to know the result. My guess is that the performance would be about the same. DRI (declarative referential integrity, i.e. primary and foreign key constraints) can definitely cause some performance hits, I would think. If the data is well controlled, it's probably a good idea to drop these constraints entirely, but if there is any risk of corrupt data being added somehow, then you shouldn't drop them. You might want to performance-test this: if dropping them gains you little or nothing, keep the DRI. I don't know what the point is in having a primary key constraint on an identity column, though (what would you gain from it?). Perhaps SQL Server is smart enough not to add any extra overhead when you do something redundant like that, so maybe no harm done.
Jules Bonnot (6/4/2009)
Thanks for your replies. I think I will write a little test and see how the heap with identity performs against the clustered index with identity under heavy inserts. I will post the script once I have done it.
Regarding the issue of primary keys: obviously a PK is a form of constraint and must therefore involve some overhead to check.
Has anyone had any bad experiences adding primary keys? What kind of impact can I expect?
What's the point of adding a PK to an identity column?
Thanks for your help.
Jules