How many rows in a table before using a clustered index?

Question

How many rows in a table before using a clustered index?

Viewing 13 posts - 31 through 43 (of 43 total)

You must be logged in to reply to this topic. Login to reply

ScottPletcher SSC Guru Points: 101238 More actions · Answer 1

Jeff Moden (11/3/2012)
ScottPletcher (11/2/2012)
Given that a clustered index doesn't cost you much space (just for the upper/non-leaf levels of the index), I tend to cluster log tables as well. However, unlike most people I suspect, instead of identity I use log_datetime. The keys are still inserted sequentially, and typically when you do search a log it's by date and time. (Yes, it's not inherently unique, so SQL has to add a "uniquifier".)
I take it one step further on my log tables (which are the largest tables in the system). I have two columns in the cluster index, the entry date column and the IDENTITY column. The reason why is that there's also a huge benefit from the clustered index being UNIQUE. Kimberly Tripp has an outstanding "movie" presentation on the subject. Like any good movie, you'll need to watch it more than once to pick up all the details but it's worth every minute.
http://technet.microsoft.com/en-us/sqlserver/gg508879.aspx

I'll try to watch it when I get the chance, but I can't see why a log table would gain from a unique key: it will be read extremely rarely, and only using the clus key to specify a range.

As I understand it, SQL only gens and populates the "uniquifier" if it is required for a given key value.

To me, datetime is not unique simply because of a physical limitation of SQL Server (max 3 ms distinction in time). For certain log tables, even that might be unique. Going forward, as we change to using datetime2, with 100 ns of accuracy -- literally billions of a sec -- perhaps it will be unique (at least 99% of the time) on its own.

If I can reasonably avoid the identity overhead, I'd like to. I'll have to view the video to see if I can reasonably do that.

Besides, think how happy it will make Celko to kill all those identity columns 🙂 :-).

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 2

ScottPletcher (11/5/2012)
I do see one potential valid use for a heap (an unclustered table): as a bulk staging table. That is, if you just need a table to quickly bulk-load data prior to processing, a heap, particularly one that is truncated before every use, could have some performance advantages vs a clus index.

Now THAT I agree with especially since it usually contains only the data you need and every move against the table would result in a clustered index scan anyway.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

scogeb SSCrazy Points: 2059 More actions · Answer 3

GSquared (11/3/2012)
scogeb (11/2/2012)
I like to think there is always an exception to every rule. Take for instance a table that ALWAYS has inserts and NEVER has any selects. Putting a clustered index on that table would be a waste.
And the table itself would be a complete waste of space. Data is only as useful as it gets used. That means "selects" somewhere along the way.

True, perhaps I shouldn't have said "NEVER". Like insurance, you never know when you'll need it. I don't know many people that say insurance is a complete waste.

So maybe after 1 billion inserts you finally need to do 1 select. Surely maintaining a clustered index for that ratio is a waste. There HAS to be a line where it's beneficial on one side and not the other.

GSquared SSC Guru Points: 260824 More actions · Answer 4

scogeb (11/5/2012)
GSquared (11/3/2012)
scogeb (11/2/2012)
I like to think there is always an exception to every rule. Take for instance a table that ALWAYS has inserts and NEVER has any selects. Putting a clustered index on that table would be a waste.
And the table itself would be a complete waste of space. Data is only as useful as it gets used. That means "selects" somewhere along the way.
True, perhaps I shouldn't have said "NEVER". Like insurance, you never know when you'll need it. I don't know many people that say insurance is a complete waste.
So maybe after 1 billion inserts you finally need to do 1 select. Surely maintaining a clustered index for that ratio is a waste. There HAS to be a line where it's beneficial on one side and not the other.

Not really. If the clustered index is designed to optimize inserts, then there is no real cost to maintaining it. It doesn't slow down inserts to force them to happen at the "end of the table", as happens with an ID column, a NewSequentialID column, or a time-date stamp column.

So you're trying to figure out where X times 0 becomes high enough to matter. "How big does X have to be before it's too big?" is what you've been asking.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Steve Jones - SSC Editor SSC Guru Points: 741471 More actions · Answer 5

GilaMonster (11/3/2012)
Stueyd (11/2/2012)
I do find it hard to believe there's not a table size where it's actually beneficial not to bother (1 row even?).
Nevertheless, it is the case. A clustered index may not be beneficial on a one row table, but it will definitely not be harmful (ie beneficial not to bother).

I agree. No harm in most cases.

There might be cases where a heap makes more sense, but those would be the edge cases, not based on "size" of table.

Bobby Glover SSCarpal Tunnel Points: 4143 More actions · Answer 6

Praise the lord for this post. I have the same issue 1000 tables 750 are heaps, the wierd thing is they have NON-CLUSTERED PKs in all the tables. Some quite experienced DBAs before me have allowed this. I am currently converting these to CLUSTERED PKs.It is a basic OLTP db is there a possibility I could convert a PK and end up in a world of pain. ie QA may change the way it accesses data and by-pass some indexes on a table. Hopefully if it did, it would be because it found a quicker root to the data.

I intend to slowly convert each NON-C PK to Clustered PK

Steve Jones - SSC Editor SSC Guru Points: 741471 More actions · Answer 7

Nothing wrong with a PK being non-clustered.

The reasons for clustered v non-clustered have nothing to do with PK/FK. They are separate concepts.

CapnHector SSCoach Points: 16143 More actions · Answer 8

Bobby Glover (11/9/2012)
Praise the lord for this post. I have the same issue 1000 tables 750 are heaps, the wierd thing is they have NON-CLUSTERED PKs in all the tables. Some quite experienced DBAs before me have allowed this. I am currently converting these to CLUSTERED PKs.It is a basic OLTP db is there a possibility I could convert a PK and end up in a world of pain. ie QA may change the way it accesses data and by-pass some indexes on a table. Hopefully if it did, it would be because it found a quicker root to the data.
I intend to slowly convert each NON-C PK to Clustered PK

As Steve said PK/FK and CLUSTERED INDEX vs NON CLUSTERED INDEX are slightly different. in my environment there are quite a few primary keys that are non clustered because a different column makes a better clustering key (now these tables are not heaps). If the table is a true heap (No clustered index on the table) then defaulting to a clustered primary key may make sense. its all in how the information lays out in the table.

For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]

ScottPletcher SSC Guru Points: 101238 More actions · Answer 9

Bobby Glover (11/9/2012)
Praise the lord for this post. I have the same issue 1000 tables 750 are heaps, the wierd thing is they have NON-CLUSTERED PKs in all the tables. Some quite experienced DBAs before me have allowed this. I am currently converting these to CLUSTERED PKs.It is a basic OLTP db is there a possibility I could convert a PK and end up in a world of pain. ie QA may change the way it accesses data and by-pass some indexes on a table. Hopefully if it did, it would be because it found a quicker root to the data.
I intend to slowly convert each NON-C PK to Clustered PK

I'd at least check SQL's missing stats info before you did that. It might be able to help identify a better clustering column(s) than the PK you already have.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 10

Bobby Glover (11/9/2012)
Praise the lord for this post. I have the same issue 1000 tables 750 are heaps, the wierd thing is they have NON-CLUSTERED PKs in all the tables. Some quite experienced DBAs before me have allowed this. I am currently converting these to CLUSTERED PKs.It is a basic OLTP db is there a possibility I could convert a PK and end up in a world of pain. ie QA may change the way it accesses data and by-pass some indexes on a table. Hopefully if it did, it would be because it found a quicker root to the data.
I intend to slowly convert each NON-C PK to Clustered PK

If the non-clustered PK's are on IDENTITY columns, there is virtually no risk that you'll open a world of pain.

If the PK's are on something other than an IDENTITY column (such as a GUID), then you could be opening up a real world of hurt. I'll also state that add a non-unique clustered index [font="Arial Black"]isn't[/font] the best performance decision.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Bobby Glover SSCarpal Tunnel Points: 4143 More actions · Answer 11

capnhector (11/9/2012)
Bobby Glover (11/9/2012)
Praise the lord for this post. I have the same issue 1000 tables 750 are heaps, the wierd thing is they have NON-CLUSTERED PKs in all the tables. Some quite experienced DBAs before me have allowed this. I am currently converting these to CLUSTERED PKs.It is a basic OLTP db is there a possibility I could convert a PK and end up in a world of pain. ie QA may change the way it accesses data and by-pass some indexes on a table. Hopefully if it did, it would be because it found a quicker root to the data.
I intend to slowly convert each NON-C PK to Clustered PK
As Steve said PK/FK and CLUSTERED INDEX vs NON CLUSTERED INDEX are slightly different. in my environment there are quite a few primary keys that are non clustered because a different column makes a better clustering key (now these tables are not heaps). If the table is a true heap (No clustered index on the table) then defaulting to a clustered primary key may make sense. its all in how the information lays out in the table.

I think I was mis-understood, there is no relationship to any other tables in terms of PK/FK. It simply a case of the table being in a heap. The PKs are non clustered, usually on a int column such as REFID. In my experience converting this to a Clustered PK shouldn't cause an issue?????

CapnHector SSCoach Points: 16143 More actions · Answer 12

Bobby Glover (11/10/2012)
capnhector (11/9/2012)
Bobby Glover (11/9/2012)
Praise the lord for this post. I have the same issue 1000 tables 750 are heaps, the wierd thing is they have NON-CLUSTERED PKs in all the tables. Some quite experienced DBAs before me have allowed this. I am currently converting these to CLUSTERED PKs.It is a basic OLTP db is there a possibility I could convert a PK and end up in a world of pain. ie QA may change the way it accesses data and by-pass some indexes on a table. Hopefully if it did, it would be because it found a quicker root to the data.
I intend to slowly convert each NON-C PK to Clustered PK
As Steve said PK/FK and CLUSTERED INDEX vs NON CLUSTERED INDEX are slightly different. in my environment there are quite a few primary keys that are non clustered because a different column makes a better clustering key (now these tables are not heaps). If the table is a true heap (No clustered index on the table) then defaulting to a clustered primary key may make sense. its all in how the information lays out in the table.
I think I was mis-understood, there is no relationship to any other tables in terms of PK/FK. It simply a case of the table being in a heap. The PKs are non clustered, usually on a int column such as REFID. In my experience converting this to a Clustered PK shouldn't cause an issue?????

as Jeff said above probably not. however as always YMMV and test thoroughly.

For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]

Bobby Glover SSCarpal Tunnel Points: 4143 More actions · Answer 13

In my case all PKs are unique but non clustered. It seems to me that a trick was missed when they were implemented. They should have been created as clustered indexes