Reasons Against Clustered Index

Question

Reasons Against Clustered Index

mssql_rules

SSCrazy

Points: 2783
More actions
March 29, 2005 at 9:21 am

#85653

Can anyone think of a valid reason against clustered index? I know there are several reasons why a table should have a clustered index. But there is one (and probably just one) good reason, for a table not to be defined with a clustered index. Can any one think of any?
thanks.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 1

The only valid reason I can think of is that the clustered index would need to be updated 10 000s times per minute (let's read extremely often here) and that the system cannot keep up with the updates. But even then I would simly suggest moving the clustered index on another unique column (either existing or identity) that would reduce the need to rearrange the indexes all the time.

What reason did you have in mind?

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 2

Just thaught of another reason... I have a dts package that pumps the data from an error log every 2 minutes. The data is then spread across 4 other tables for relational integrity. The temp table doesn't need a clustered index to do its job... but even then I create one on the fly (because it failed if that field existed when doing the data pump) so that I can identity any failed step and log it accordingly.

PW-201837 SSC-Insane Points: 20805 More actions · Answer 3

Reasons against a clustered index:

- Fragmentation. If the indexed column(s) are subject to updates this can cause page splits & fragmentation.

- Size of non-clustered indexes. Every key value in the clustered index gets replicated into all the non-clustered indexes on the table. If the clustered index is built on several wide columns, you bloat the other indexes.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 4

About the size... that would be a reason against a bad choice of Clustered index... not for a reason not to use it.

DCPeterson SSCoach Points: 15266 More actions · Answer 5

Many people seem to think that clustered indexes are "more expensive" to maintain than non-clustered indexes because they need to "rearrange" or reorganize the data frequently. This is not the case. A clustered index does not attempt to physically arrange the data on disk (something that would be outrageously expensive). It maintains the ordering of the data via a double-linked-list, rather than by actually storing the data in a predefined order on disk. It maintains a logical ordering rather than a physical one.

It is true that if you define a clustered index on an IDENTITY column, for example, it can lead to I/O "hot spots" particularly in a heavy INSERT environment, but as you point out in a later post, that isn't an argument against clustered indexes, it is an argument to choose them more wisely.

I'm not one of those people who insist that you should always have a clustered index on every table. But you should certainly consider using one anytime you have a column, or columns, that are often searched based on ranges i.e. if your queries against them use <, >, or BETWEEN.

/*****************

If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

*****************/

mssql_rules SSCrazy Points: 2783 More actions · Answer 6

Thanks for your participation and sharing your thoughts on this topic. Let me add some context to my original question with an example. Consider this scenario: A very large table (few hundread million rows) with columns Col1, col2, col3, col4 and col5 (all integers). Data in this table is loaded once a year (entire table), updated throughout the year in large volumes, and archived / deleted at the end of the year. Only col4 and col5 are updated during the update operations. Primary key is defined on Col1, col2, col3. Update operations use either col1, col2 or col3 in the where clause at about the same number of times (so about 33% of time updates qualify with col1, 33% on col2 and 33% on col3). Do you suggest a clustered index or not. Why?

Johan Bijnens SSC Guru Points: 135662 More actions · Answer 7

just my 2ct

we always have a clustered index (even if on a identity or current-datetime column if no other suggested by the development team ! ) because we have noticed (with sql7) that when having no clix and heavy insert/delete rate, pagemanagement gets stuck and pages are not freed/reused as they should be.

With a clustering index, we did not encounter this issue.

Regarding your table questions :

Keep this in mind :

Your table having 5 int-columns (all 5 not null ?) : rowsize 20 bytes

max rows / page : 403 (8060/20)

500.000.000 rows = +- 1.241.000 pages

1.241.000 pages = +- 9,7 Gb

this all without the indexes !

Your PK will take 12 bytes / row + some overhead.

I'd calculate cardinality for each of the key-columns, and define the most filtering as clustered index.

You may also have to play around with fillfactor and PAD_INDEX to avoid frequent reindex/indexdefrag.

Once in a while capture the actual sql-load and have the index-wizard analyze your load and evaluate your existing indexes.

Also have your data-upload and -maintenance applications take this clustering algorithm into consideration when adding new data ! This way you avoid unneeded page-splits.

Didn't I read someone's statement like : There is more than one way to skin a cat

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data and code to get the best help

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Who am I ? Sometimes this is me but most of the time this is me

Frank Kalis SSC Guru Points: 111183 More actions · Answer 8

http://www.sqlservercentral.com/columnists/chedgate/clusterthatindex.asp

http://www.sqlservercentral.com/columnists/chedgate/clusterthatindexparttwo.asp

http://www.sqlservercentral.com/columnists/chedgate/clusterthatindexpart2.asp

gives a reasonable discussion of that topic.

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Vladan SSC-Insane Points: 22072 More actions · Answer 9

On a table that you described I would use a clustered index, though not necessarily on the Primary Key... depending on how each column is most often referenced in the WHERE clauses, and how much variability there is in the values of each column, it may show to be more advantageous to define the clustered index on 1 or 2 columns only.

Anyway, if each of the 3 columns tends to be used "alone" in the WHERE clause (not specifying the values of other 2 columns), there should be a separate index on each of the columns, (except on the column that is first in the composite clustered index)... because AFAIK if you use only the second or third column of a composite index in the WHERE clause, index is not used and it results in table scan. I can't tell, without knowing your data and most frequent queries, whether a composite or one-column clustered index would increase performance more. You'll have to test it... which is anyway the only reliable way how to find out, which indexes should be used :-))

Hans Lindgren SSChampion Points: 10496 More actions · Answer 10

- Range Scans and Inserts

If your table contains a huge amout of columns your range scans can be very costly and in some cases Non Clustered indexes can be beneficial.

(Inserting into a table without clustered index can be faster in some instances)

Happy hunting! Hanslindgren

mssql_rules SSCrazy Points: 2783 More actions · Answer 11

In this case, I am NOT using a clustered index, here is how I justify this:

I add a non-clustered index on the primary key columns (col1, col2 and col3 for enforcing the PK constraint) and non clustered index on each of the columns used in the where clause (col4 and col5 mostly each of them is used by itself). Now when an update needs to happen, using one of the where clause columns the non-clustered index on that column would point to the actual rows in the heap (non - clustered table). So there is no intermediate step of looking up the clustered index value and then accessing the actual row, so this performs better. A clustered index would have been required if the inserts and updates resulted in page splits leading to fragmentation. Since, Without a clustered index, I would have to recreate all the non-clustered indexes every time I did reorg. But in this particular case, there are no inserts or deleted and even the updates are on integer fields, so chance of page splits or row migrations during updates are not possible (are next to nothing). I think this is a tailor made case against clustered index. I am a big proponent of clustered indexes, but my testing in this case confirmed that there is scenario where clustered index is non only more space consuming but will hurt the performance too. I appreciate your feedback.

thanks.

PW-201837 SSC-Insane Points: 20805 More actions · Answer 12

However ... you can't specify a FillFactor on a heap.

Since you only Update throughout the year, and since those updates are on int columns that don't cause page splitting, you could have this table at 100% FillFactor.

Only a clustered index (created after the initial data load) will allow you to have 100% Fillactor on the data pages, and the subsequent reduction in page I/O for all Update/Select operations throughout the year.

mssql_rules SSCrazy Points: 2783 More actions · Answer 13

PW:

That is a good point. I guess I can make my PK index as a clustered index at 100% fill factor and then drop the clustered index. I figured, since I am not adding / deleting data or modifying the PK columns, I do not need any explicit PK constraint / indexes.

colin.Leversuch-Roberts SSC Guru Points: 52643 More actions · Answer 14

You actually have a covered index which can often perform equally with a clustered index, you might want to examine your i/o count however, often using a nc index will use 1 extra i/o per operation - this is a generalisation of course and any number of scenarios can be generated to either prove or disprove this statement < grin > Anyway you might want to check - this is the only situation where I have changed a nc index to a clustered index - oh and where I might want to remove fragmentation 'cos you can't without a clustered index.

[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/