August 30, 2010 at 3:58 am
I'm the dba on a SQL 2005 database for 3rd-party system which has a lot of tables with no clustered indexes.
Looking a fragmentation on the database, there are a lot of tables with high fragmentation on their HEAPs, which is understandable, I believe, if there is no clustered index.
Typical values for one of the HEAPs are:
Record count: 490,000
Fragment count: 4,317
Page Count: 35,239
Avg Frag %: 97.38
Avg Page Space Used %: 83.32
None of the tables has a primary key defined, but nearly all have a column called id_object which has the following attributes.
* It is a varchar(16).
* It is unique in some tables, but not in others.
* It is nearly unique in a lot of tables (i.e. a select count(distinct object_id) gives say 90% of select count(*)).
* It is definitely not an increasing value.
So... given the above, is it worth creating a clustered index on these tables? If I do so the candidate column for the index is going to be the id_object column.
As I write this, I think it's obvious that we should create a clustered index on the id_object column for these tables - especially as I know that most queries will have id_object in them.
Does this sound like an appropriate strategy, even though the id_object column is non-unique and not auto-increasing?
Thanks,
James
August 30, 2010 at 6:57 am
remember the purpose of the clustered index is to make it faster to find the data; the data does nto necessarily have to be non-unique and not auto-increasing;
does your code look for stuff by the objectid? if it does, it makes sense to put the clustered index on that;
if it searches mostly by a DIFFERENT column and joins on objectid, i might cluster the other column, and INCLUDE the objectid column in the , or mauybe cluster the other column and that objectid together.
the idea is to help the server find the data the fastest, and our tools are the different idexes we can add.
Lowell
August 30, 2010 at 7:44 am
as already stated: it depends.
If that column is used in where clauses, as a join argument, or order_by\group_by, it may be good to create an index for that column.
Should it be a clustering index ?
I don't know. Keep in mind the clustering index is the one that actually contains the data pages AND keeps them (chained) in order of the (uniquified) clustering key by splitting 50/50 if needed.
Books online has good info http://msdn.microsoft.com/en-us/library/ms188783%28v=SQL.90%29.aspx
In many cases you'll find the advise to create a clustering index based on the column(s) that is/are used in "range" selects to gain the advantage of optimal IO and to create non-clustered indexes for "pinpoint" query columns.
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
August 30, 2010 at 10:05 am
Thanks for the responses - they're very helpful.
James
August 31, 2010 at 6:46 am
Another question would be whether the id_object values are immutable. It sounds like they should be, but if for some crazy reason your system updates them frequently the clustered index will force the records to move around.
If the id_object value of new records is fairly random, you could minimize fragmentation by rebuilding the clustered index with a lower fill factor. The best tradeoff between lower fill factor (requiring more pages) and more frequent index rebuilds is hard to predict. You will have to monitor fragmentation over time and make adjustments.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply