Clustered index design

  • I have a table with 170 million rows, ~ 40GB of data space (~30GB of index space). Let's assume:

    CREATE TABLE [tblTest](

    [WorkID] [int] IDENTITY(1,1) NOT NULL,

    [WorkstationID] [int] NULL,

    [WorkDateTime] [datetime] NULL,

    CONSTRAINT [WorkID_PK] PRIMARY KEY NONCLUSTERED

    (

    [WorkID] ASC

    )

    )

    CREATE UNIQUE CLUSTERED INDEX [StationTime] ON [tblTest]

    (

    [WorkID] ASC,

    [WorkDateTime] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]

    (imagine a lot more data columns of course) WorkID being the primary key.

    Original database design determined that the WorkstationID and datetime were unique, so a unique clustered index was created spanning WorkstationID and WorkDateTime.

    Of course there are now indexes covering all kinds of other combinations of queries.

    New conditions are that the WorkstationID/ WorkDateTime are no longer going to be unique. The only guaranteed unique now is the auto-increment column WorkID (identity column).

    So here's my thinking:

    - Keeping the non-unique clustered index would be painful as SQL would have to increase the index size with its unique identifier.

    - Copying the data to a new table which had the clustered index on the primary key (took a long time) but showed that performance on most queries weren't affected (queries that were using the clustered index now have an extra lookup to do, but, the clustered index is half the size, which should help all the queries?)

    Other than the amount of time it will take (a long time) to re-create the clustered index on WorkID (using drop-existing) I don`t see any reason not to make this change?

    And then for making the change, if I want to:

    a) Change the clustered index StationTime to non-clustered

    b) Change existing index WorkID_PK to clustered

    Last question:

    Do I have to do them all in one step (i.e. Stack Overflow: 2297355 or can I do it in one shot with drop_existing

    I'll be able to test on a dev server, but really need to get the whole process to wrap up within 36 hours which is my largest available window.

    My other option is to batch-insert rows into a newly created table (~16 hours), but I have to be careful with some relationships between tables when I flip all that around.

    -Kevin

  • If WorkID is staying an identity column, then, you're guaranteed that the combination of WorkID and WorkDate are still identical because WorkID will always still be identical. But maybe I misunderstood something.

    Modifying the cluster is going to modify the non-clustered indexes. It might be quite a lot faster to migrate the data to a new table, without nonclustered indexes, then apply those indexes after the data move.

    I'd be a little nervous about additional key lookups. If you were getting a lot of covering caused by having WorkDate in place, but now you're doing more lookups, those can be quite expensive, an extra 3-4 page reads on average, for each page read against the non-clustered index.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • - Keeping the non-unique clustered index would be painful as SQL would have to increase the index size with its unique identifier.- Keeping the non-unique clustered index would be painful as SQL would have to increase the index size with its unique identifier.

    The 4 byte Uniquifier will only by used when it is necessary(when there is a duplicate). Keeping the Clustered index on WorkstationID/ WorkDateTime, might not be as expensive(depending on how many duplicates there might be in the future).

    Deciding on a clustered index will often be a trade off with on the size(the narrower the better), and how often are you are you going to access records using it(in this case the Workid).

  • Thanks Sharky, I think that actually clears up the main question I had about the use of uniquifier.

    A majority of my queries are on the current clustered index, and the amount of duplicates I expect to be < 0.01% of total rows, so the extra page space for the uniquifier will probably be minimal in the grand scheme of things.

    This post[/url] did some proofs about impact of uniquifier.

    Based on this I'm leaning towards the space penalty being negligible compared to benefit of keeping clustered index on how the data is mostly used, and just removing the uniqueness on the clustered index.

    Thanks!

    -Kevin

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply