Clustered index on non-unique and non-increasing column

Question

Clustered index on non-unique and non-increasing column

James Lavery

SSC Eights!

Points: 808
More actions
August 30, 2010 at 3:58 am

#387688

I'm the dba on a SQL 2005 database for 3rd-party system which has a lot of tables with no clustered indexes.
Looking a fragmentation on the database, there are a lot of tables with high fragmentation on their HEAPs, which is understandable, I believe, if there is no clustered index.
Typical values for one of the HEAPs are:
Record count: 490,000
Fragment count: 4,317
Page Count: 35,239
Avg Frag %: 97.38
Avg Page Space Used %: 83.32
None of the tables has a primary key defined, but nearly all have a column called id_object which has the following attributes.
* It is a varchar(16).
* It is unique in some tables, but not in others.
* It is nearly unique in a lot of tables (i.e. a select count(distinct object_id) gives say 90% of select count(*)).
* It is definitely not an increasing value.
So... given the above, is it worth creating a clustered index on these tables? If I do so the candidate column for the index is going to be the id_object column.
As I write this, I think it's obvious that we should create a clustered index on the id_object column for these tables - especially as I know that most queries will have id_object in them.
Does this sound like an appropriate strategy, even though the id_object column is non-unique and not auto-increasing?
Thanks,
James

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

Lowell SSC Guru Points: 323518 More actions · Answer 1

remember the purpose of the clustered index is to make it faster to find the data; the data does nto necessarily have to be non-unique and not auto-increasing;

does your code look for stuff by the objectid? if it does, it makes sense to put the clustered index on that;

if it searches mostly by a DIFFERENT column and joins on objectid, i might cluster the other column, and INCLUDE the objectid column in the , or mauybe cluster the other column and that objectid together.

the idea is to help the server find the data the fastest, and our tools are the different idexes we can add.

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Johan Bijnens SSC Guru Points: 135376 More actions · Answer 2

as already stated: it depends.

If that column is used in where clauses, as a join argument, or order_by\group_by, it may be good to create an index for that column.

Should it be a clustering index ?

I don't know. Keep in mind the clustering index is the one that actually contains the data pages AND keeps them (chained) in order of the (uniquified) clustering key by splitting 50/50 if needed.

Books online has good info http://msdn.microsoft.com/en-us/library/ms188783%28v=SQL.90%29.aspx

In many cases you'll find the advise to create a clustering index based on the column(s) that is/are used in "range" selects to gain the advantage of optimal IO and to create non-clustered indexes for "pinpoint" query columns.

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

James Lavery SSC Eights! Points: 808 More actions · Answer 3

Thanks for the responses - they're very helpful.

James

Scott Coleman One Orange Chip Points: 27452 More actions · Answer 4

Another question would be whether the id_object values are immutable. It sounds like they should be, but if for some crazy reason your system updates them frequently the clustered index will force the records to move around.

If the id_object value of new records is fairly random, you could minimize fragmentation by rebuilding the clustered index with a lower fill factor. The best tradeoff between lower fill factor (requiring more pages) and more frequent index rebuilds is hard to predict. You will have to monitor fragmentation over time and make adjustments.