Composite clustering key

  • We have a table that stores ~3 billion rows with the following columns:
    [ID], [Year], [CustomerID], [Data1], [Data2], [Data3].
    The ID column is the Clustered PK.

    The table will have one daily INSERT, many SELECTs per day, and DELETEs once per year.

    The query that reads from this table looks like:
    Select Data1, Data2, Data3... from Table1 where Year = @Year and CustomerID = @CustomerID

    To optimize this select query, we created a non clustered index on Year and CustomerID, and included all the Data colummns.

    However, this seems like a waste of space since basically we have 2 copies of the table.

    I'm wondering whether it would be more beneficial if we just created the table using a composite cluserting key of Year, CustomerID, ID.
    In theory then we wouldn't need a non-clustered index anymore?

    Thanks in advance.

  • Simon E Doubt - Monday, February 27, 2017 2:54 PM

    We have a table that stores ~3 billion rows with the following columns:
    [ID], [Year], [CustomerID], [Data1], [Data2], [Data3].
    The ID column is the Clustered PK.

    The table will have one daily INSERT, many SELECTs per day, and DELETEs once per year.

    The query that reads from this table looks like:
    Select Data1, Data2, Data3... from Table1 where Year = @Year and CustomerID = @CustomerID

    To optimize this select query, we created a non clustered index on Year and CustomerID, and included all the Data colummns.

    However, this seems like a waste of space since basically we have 2 copies of the table.

    I'm wondering whether it would be more beneficial if we just created the table using a composite cluserting key of Year, CustomerID, ID.
    In theory then we wouldn't need a non-clustered index anymore?

    Thanks in advance.

    Can you post the full DDL please?
    😎

  • Simon E Doubt - Monday, February 27, 2017 2:54 PM

    We have a table that stores ~3 billion rows with the following columns:
    [ID], [Year], [CustomerID], [Data1], [Data2], [Data3].
    The ID column is the Clustered PK.

    The table will have one daily INSERT, many SELECTs per day, and DELETEs once per year.

    The query that reads from this table looks like:
    Select Data1, Data2, Data3... from Table1 where Year = @Year and CustomerID = @CustomerID

    To optimize this select query, we created a non clustered index on Year and CustomerID, and included all the Data colummns.

    However, this seems like a waste of space since basically we have 2 copies of the table.

    I'm wondering whether it would be more beneficial if we just created the table using a composite cluserting key of Year, CustomerID, ID.
    In theory then we wouldn't need a non-clustered index anymore?

    Thanks in advance.

    Posting full DDL will help getting an answer, but my first thought would be:
    1. Drop the NCI with the included columns.
    2. Partition on Year
    3. Clustered idx (unique) = CustomerID, ID, Year.
    4. PK can be same as clustered index, or a separate non-clustered index on just ID.

    Eddie Wuerch
    MCM: SQL

  • My, very general, rule is that the clustered key should be on the column or columns most frequently used to access the data. In many, probably even most, cases that's the primary key. However, it isn't always true. I'd say yeah, you're likely to see a lot of benefit from changing the clustered index. 

    Now, the hard part is determining two things, the leading edge, the first column, of that index, and which columns belong in it. First up, based on your example query, the best candidate is the CustomerID. Presumably this is a somewhat well distributed data set because it's going to drive the creation of the histogram for the statistics. We need that to be useful for the optimizer so that this index naturally gets picked for your query. Then you'd obviously add the year too.

    But, the next question is, are these two columns unique? Probably not. This means you'd have a non-unique clustered key, so a uniqueifier column would be added internally. So, possibly, adding the ID column from the primary key (which you'll still have, but it'll just be nonclustered) to make this a unique index might be a good idea. I'd want to test that both ways before I decided on a path.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply