Question on index column ordering

  • I thought about that as well, having the Clustered index be Date,PkID (but that still would be as wide as my other key which was date,int,int), my PK is bigint (since it's possible to have ~2B+ rows (though no one has got there yet!). It gives worse performance (in terms of logical reads).

    The real kicker is that as Gail said, if the server has enough memory then the number of physical reads may not be an issue, I don't really have a choice on this piece, we make 'recommendations' a customer as far as server specs go, and I always tell them heck, how big is your DB going to be? buy that amount of RAM! haha, but they balk at that, so it's not consistent across the customer base. Some customers will have relatively small DBs, which others very large (the largest i've dealt with is ~300GB with 1.5B rows), I try to convience customers that the more RAM the better (but at the same time I realize the application's code could be tweaked (as Gail pointed out Select * isn't really that great))and I don't want to throw hardware as a potential application problem.

    I appreciate all your help on answering my questions.

  • Jeff Moden (12/23/2011)


    Dev (12/23/2011)


    Gail I agree to you on SEEK operation but I care about following.

    Clustered indexes are not a good choice for the following attributes:

    Columns that undergo frequent changes

    This causes in the whole row to move, because the Database Engine must keep the data values of a row in physical order. This is an important consideration in high-volume transaction processing systems in which data is typically volatile.

    Wide keys

    Wide keys are a composite of several columns or several large-size columns. The key values from the clustered index are used by all nonclustered indexes as lookup keys. Any nonclustered indexes defined on the same table will be significantly larger because the nonclustered index entries contain the clustering key and also the key columns defined for that nonclustered index.

    With Datetime column I doubt on item#1 and partially on item #2.

    Like Gail said, one of the best uses for a Clustered Index is on a DateTime column especially when it's a fairly static column like "TransactionDate", "StartDate" or some such. Even if you have duplicate dates, it's still great because it will SEEK on the start of a range and scan "consecutive adjacent" rows from there. Of course, if the dates aren't unique, it's good to make the first column a date column and have a second column in the CI containing some unique identifier such as an IDENTITY column.

    I am not sure OP’s index has static datetime. If the datetime changes quite often it will fragment the index. That was my first assumption when I saw it in Clustered Index definition.

  • The date's aren't unique, if the datapoint is minute data, then it's possible to have 65 (or more) dates in an hour, a 1 hour (00:00), 4 15min (with one at 00:00) and 60 min values (one at 00:00). However that being said having aid,bid I can't do, because they are not unique at all (those are the least unique, but provide the "detail" about the rest of the record).

  • Dev (12/25/2011)


    Jeff Moden (12/23/2011)


    Dev (12/23/2011)


    Gail I agree to you on SEEK operation but I care about following.

    Clustered indexes are not a good choice for the following attributes:

    Columns that undergo frequent changes

    This causes in the whole row to move, because the Database Engine must keep the data values of a row in physical order. This is an important consideration in high-volume transaction processing systems in which data is typically volatile.

    Wide keys

    Wide keys are a composite of several columns or several large-size columns. The key values from the clustered index are used by all nonclustered indexes as lookup keys. Any nonclustered indexes defined on the same table will be significantly larger because the nonclustered index entries contain the clustering key and also the key columns defined for that nonclustered index.

    With Datetime column I doubt on item#1 and partially on item #2.

    Like Gail said, one of the best uses for a Clustered Index is on a DateTime column especially when it's a fairly static column like "TransactionDate", "StartDate" or some such. Even if you have duplicate dates, it's still great because it will SEEK on the start of a range and scan "consecutive adjacent" rows from there. Of course, if the dates aren't unique, it's good to make the first column a date column and have a second column in the CI containing some unique identifier such as an IDENTITY column.

    I am not sure OP’s index has static datetime. If the datetime changes quite often it will fragment the index. That was my first assumption when I saw it in Clustered Index definition.

    Other than a "LastModifiedOn" column, which type of date columns do you know of that aren't fairly static? Even EndDate columns are fairly static... they only change once in most cases.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • dfrome (12/25/2011)


    The date's aren't unique, if the datapoint is minute data, then it's possible to have 65 (or more) dates in an hour, a 1 hour (00:00), 4 15min (with one at 00:00) and 60 min values (one at 00:00). However that being said having aid,bid I can't do, because they are not unique at all (those are the least unique, but provide the "detail" about the rest of the record).

    Now everyone here are curious to know what value does the datetime column stores. Per Jeff's comment (& my suspicion), if it's "LastModifiedOn" column it's not a candidate for Clustered Index.

  • Dev (12/25/2011)


    dfrome (12/25/2011)


    The date's aren't unique, if the datapoint is minute data, then it's possible to have 65 (or more) dates in an hour, a 1 hour (00:00), 4 15min (with one at 00:00) and 60 min values (one at 00:00). However that being said having aid,bid I can't do, because they are not unique at all (those are the least unique, but provide the "detail" about the rest of the record).

    Now everyone here are curious to know what value does the datetime column stores. Per Jeff's comment (& my suspicion), if it's "LastModifiedOn" column it's not a candidate for Clustered Index.

    Oh, damn. My apologies. It must be my poor wording and the fact that the information I posted is spread out across a couple of posts.

    My intent on the post above was simply to identify that, usually, only such date/time columns as a "LastModifiedOn" column suffer frequent changes. In the scheme of things and depending on what the needs of the most common queries against a table are, putting a Clustered Index on a "LastModifiedOn" column isn't a taboo or even a "bad" idea especially since the date used in such a column is "ever increasing". Because of the "ever increasing" date, there will be logical gaps left in the Clustered Index but they (the modified rows) won't cause much in the line of page splits or real fragmentation.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Gail - does it still make a difference to sql server if your query predicate is in the same order as the columns in the index? I noticed index was date, aid, bid, and the query was aid, bid, date. I know the order used to matter and I can't remember if/when this was fixed.

  • Nope, it makes no difference and it hasn't likely since SQL 6.5 or 7, whichever one introduced the cost-based optimiser.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply