clustered index hell

  • I have to support a large (100 gig+) database with about 230 users. It is 3rd party. They have almost no clustered indexes on any of their 1100 tables. They do have unique indexes on all the tables which all have a guid as this index. I realize with any good db design each table should have a clustered index, but with them using the guid as the unique index aren't I asking for trouble making that a clustered index considering the nature of the guid data

  • Hell yeah!

    You need to consider what queries are running. If they are using the guid as the main filter : Select * from Whatever Where guid = 'dfkjsd'....... and nothing else, I don't see an easy way out. You'll need to figure out which is less of a pain, no indexes or very fragmented indexes that needs a lot of work to keep in "order"!.

    At 200GB, you can expect to need to be defraging almost 24/7 to keep fragmentation in check, unless you have extremely fast disks.

    I have a fast disk array here and defraging only a 6GB DB takes 15-25 minutes a day (without any guid in there). The problem is that I can only do it that fast while noone is using the db. If the db is live, I need to stretch it out to a couple hours, and even then I get a few complaints.

    Another option would be to use online defrag assuming you can move to sql 2005 entreprise.

  • most of the queries do links with these guids

  • I just edited my previous answer with more details.

    Good luck.

  • I need more than luck. It is a 24/7 db with sometime during week without peak usage. My best understanding of with the clustered index is they tell you not to use a guid with it, so I am in a lose lose situation

  • It sounds like you're aware of this already, but it's worth mentioning. Depending on the insert volume and size of these tables, adding the clustered index on a GUID column can reduce performance by causing additional page splits during inserts and SQL Server has to physically re-order the data to make room for the new rows.

    There are obvious benefits to having the clustered index on the table, even if it is not on the columns typically used to query/join the tables. You may consider looking to see if the vendor includes any sort of 'DateCreated' type column in their tables. I've found that using a create date column for this at least gets the table to where it is not a heap and keeps the logical fragmentation down, even if you can't take advantage of using the clustered index column in your queries.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Thanks, they do have a create_date on almost all their tables, never thought of that.

  • Definitely watch out for the GUID as John mentioned unless you're adding them sequentially.There's a way to do that in 2005. In 2000, use a date column, but try to pick one that won't cause too many splits on inserts.

    You really want a clustered index on tables. A lot of the optimizer decisions are built to take advantage of clustered indexes and heaps can be much less efficient, not to mention more fragmented.

  • Although there is much good to be said about having a clustered index, if the only PK that's available is a GUID, my recommendation would be to just forget about it...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If all the queries that filter on the quid are equalities (hence finding 1 row), I'd put a nonclustered index on the Guid and find somewhere else for the cluster.

    I've seen tables go from 0 to 99.95% fragmented in 6 hours because of the cluster on the Guid.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • John Rowan (1/9/2009)


    You may consider looking to see if the vendor includes any sort of 'DateCreated' type column in their tables. I've found that using a create date column for this at least gets the table to where it is not a heap and keeps the logical fragmentation down, even if you can't take advantage of using the clustered index column in your queries.

    I have a similar situation here. (Maybe it's the same vendor !?!?)

    How will the clustered index help, if it's on a column such as date that's never actually made use of ?

  • It will help with introducing the ability to defragment the table. With NO clustered index, the only way to defrag a heap is to copy all of the data into a new table and drop the old one. Needless to say, not a practical solution in a 24/7 shop.

    In this particular case - I would actually make the GUID the second column in the clustered index, keeping the create date as the leading column. Will introduce a little fragmentation in the last few pages, but that's fairly easy to maintain (it actually won't be any more fragmentation than you'd get if you didn't put it there, since you won't be able to tag it as a UNIQUE clustered index without the GUID, and the indexer will essentially throw something random onto the end to make it unique).

    Heavily updated heaps are a perf dive bomb in my experience.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I will approach vendor about this. There are 1200 + tables utilized, with lots of inserts and updates. Will probably try to capture trace

  • Heh... I realize it's probably a little late now, but I'd get another vendor or write the stuff myself. That code and database design sound just absolutely horrid and patching it in just about any way, shape, or fashion could just break the hell out of the code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Wish I had that option, 1200 tables no cluster index, kind of liking getting a bi-plane with jet engines to fly,

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply