Clustered Index Modification

  • I have a database with a lot of 'too-wide' clustered indexes, many of which include uniqueidentifiers. In almost every case, these are also the Primary Key.

    What I'd like to do is convert the primary keys to unique non-clustered indexes and move the clustered index onto a much smaller column, probably an identity column that I would generate solely for this purpose.

    Some of these columns are involved in foreign keys as well. The database will not be online or operational during this process.

    The only way I can think of to do this right now is to create a large script that will:

    1. Drop all foreign key constraints.

    2. Drop the primary key constraint.

    3. Drop the clustered index.

    4. Create an identity column.

    5. Create a new clustered index.

    6. Re-create the primary key constraint.

    7. Re-create the foreign key constraints.
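
    A rough sketch of what those steps might look like for a single table (every table, column, and constraint name below is invented purely for illustration):

    -- Hypothetical example: move dbo.Orders from a GUID clustered PK
    -- to a new identity clustering key, keeping the GUID as a nonclustered PK.

    -- 1. Drop foreign keys that reference the table
    ALTER TABLE dbo.OrderDetails DROP CONSTRAINT FK_OrderDetails_Orders;

    -- 2/3. Drop the primary key constraint (this also drops the clustered index backing it)
    ALTER TABLE dbo.Orders DROP CONSTRAINT PK_Orders;

    -- 4. Add a narrow identity column
    ALTER TABLE dbo.Orders ADD OrderSeq INT IDENTITY(1, 1) NOT NULL;

    -- 5. Cluster on the new identity column
    CREATE UNIQUE CLUSTERED INDEX CIX_Orders_OrderSeq ON dbo.Orders (OrderSeq);

    -- 6. Re-create the primary key as a nonclustered constraint on the original key
    ALTER TABLE dbo.Orders
        ADD CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderGuid);

    -- 7. Re-create the foreign keys
    ALTER TABLE dbo.OrderDetails
        ADD CONSTRAINT FK_OrderDetails_Orders
        FOREIGN KEY (OrderGuid) REFERENCES dbo.Orders (OrderGuid);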

    The constraints/fields etc. on all the 100+ tables I want to do this on are of course different, so there's probably a fair bit of work involved in creating said script. Before I go off down that path, I just wanted to kinda throw out a feeler and get ideas as to any better way to do this.

    Some additional notes:

    I am using SQL 2008.

    I have a copy of the RedGate SQL Toolbelt (not sure if there is anything in there that would help here; there are a lot of utilities I haven't had time to explore yet).

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉


  • As far as the process goes, I think you have it right. The only suggestion I can make is to script the process. I'd open up PowerShell and put it to work. I recently did a much smaller, much reduced version of what you're going to do. You might be able to use it as a starting point.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks for the feedback and the suggestions, Grant. I'm still pretty useless at PowerShell, but I think I've figured out a pretty decent way to do it. I'll write a blog post with the code and a full explanation of what I did and link it here, but the gist of it is:

    1. Create a backup of the database.

    2. Restore that backup to a new database.

    3. Drop all foreign key constraints from the copy.

    4. Use Redgate SQL Compare to generate all the code to create the foreign key constraints again.

    5. Drop all foreign key constraints from the real DB.

    6. Write a script that, for each table, drops the existing clustered index (dropping the constraint if it's the PK, or just the index if it's not), creates a new identity column, creates a new clustered index on the identity column, and then re-creates the primary key (as a nonclustered index) or a unique index on the fields that made up the old clustered index.

    7. Run the SQL Compare Script to re-create all the foreign keys.
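
    For steps 3 and 5, a metadata-driven script can generate the DROP statements instead of writing them by hand. A minimal sketch (assuming you review the generated statements before running them):

    -- Sketch: build a DROP statement for every foreign key in the database
    SELECT 'ALTER TABLE ' + QUOTENAME(SCHEMA_NAME(t.schema_id)) + '.' + QUOTENAME(t.name)
         + ' DROP CONSTRAINT ' + QUOTENAME(fk.name) + ';'
    FROM sys.foreign_keys AS fk
    JOIN sys.tables AS t
        ON t.object_id = fk.parent_object_id
    ORDER BY t.name, fk.name;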

    For anyone else reading this after the fact, remember of course that when I use the term 'real DB', I'm still talking about a dev copy, and the copy is a copy of that copy. I'm not doing this to a live database.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉


  • Nice high level of paranoia. Good job.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • So... apparently the identity column poses issues with the application, and I need something that can be inserted into directly. Going to try to do this a different way.

    All of our tables contain some basic fields, one of which is createdate. This field is set at the time the record is inserted and then never modified again (also note that an INSTEAD OF INSERT trigger actually does the writing). There is a possibility (a very large possibility on some of the tables) that this field will not be unique for inserts occurring as a batch. I know that there is an under-the-covers 'uniquification' process that happens when a clustered index key is not unique, but I'm not really sure how expensive it is. Anyone have some feedback as to the cost of this process? Would the cost be worth it to replace a 20+ byte clustered index key with an 8-byte key? How about a 50-byte one? Does this process automatically have to happen as soon as the key is not known by the database to be intrinsically unique?
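
    A quick way to gauge how many rows would actually pick up a uniquifier is to count the duplicated createdate values (a sketch only; dbo.SomeTable is a placeholder for one of the candidate tables):

    -- Sketch: which createdate values are shared by more than one row?
    -- Only rows after the first for each duplicated value get a uniquifier.
    SELECT createdate, COUNT(*) AS rows_sharing_value
    FROM dbo.SomeTable
    GROUP BY createdate
    HAVING COUNT(*) > 1
    ORDER BY COUNT(*) DESC;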

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉


  • The uniquifier (or uniqueifier) is a 4-byte integer value. So from that you can estimate the cost to the size of the index. So that not only increases the size of the cluster, but it becomes a part of each and every pointer in any nonclustered indexes, so the splash radius grows. Maintaining this value is pretty trivial and isn't affected by the size of the rest of the key for the index. Of course, the size of the rest of the key of the index certainly affects its behavior elsewhere.

    As a general rule, you want your cluster to be unique, as narrow as possible, and unchanging. Granted, you can't always satisfy all three of those requirements, but you should try, because anything else will lead to issues such as the uniquifier, fragmentation, excessive paging...
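
    If you want to measure it on real data rather than estimate it, something along these lines (a sketch; the object name is a placeholder) shows actual record sizes per index level, and rows carrying a uniquifier show up as larger records at the leaf:

    -- Sketch: inspect record sizes in the clustered index (index_id = 1).
    -- DETAILED mode reads the whole index, so run it against a test copy.
    SELECT index_id,
           index_level,
           min_record_size_in_bytes,
           avg_record_size_in_bytes,
           max_record_size_in_bytes,
           page_count
    FROM sys.dm_db_index_physical_stats
         (DB_ID(), OBJECT_ID(N'dbo.SomeTable'), 1, NULL, 'DETAILED');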

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Seeing as an INSTEAD OF trigger is already being used, another option is:

    USE tempdb;

    -- Base table: a narrow identity clustering key plus the app-visible columns,
    -- with a default on create_date so the trigger can omit it.
    CREATE TABLE dbo.Test
    (
        row_id      INTEGER IDENTITY NOT NULL,
        create_date DATETIME NOT NULL
            CONSTRAINT [DF dbo.Test create_date = now]
            DEFAULT (GETUTCDATE()),
        data        VARCHAR(50) NOT NULL,

        CONSTRAINT [PK dbo.Test row_id]
            PRIMARY KEY CLUSTERED (row_id)
            WITH (FILLFACTOR = 100)
    );
    GO

    -- The application sees only this view, which hides the identity column.
    CREATE VIEW dbo.AppVisibleTest
    WITH SCHEMABINDING
    AS
    SELECT T.create_date,
           T.data
    FROM dbo.Test T;
    GO

    -- INSTEAD OF trigger redirects inserts against the view into the base table,
    -- leaving create_date to the default when it isn't supplied.
    CREATE TRIGGER [trg dbo.AppVisibleTest IOI]
    ON dbo.AppVisibleTest
    INSTEAD OF INSERT
    AS
    BEGIN
        SET ROWCOUNT 0;
        SET NOCOUNT OFF;

        IF NOT EXISTS (SELECT * FROM inserted)
            RETURN;

        INSERT dbo.Test
            (create_date, data)
        SELECT create_date, data
        FROM inserted
        WHERE create_date IS NOT NULL;

        INSERT dbo.Test
            (data)
        SELECT data
        FROM inserted
        WHERE create_date IS NULL;
    END;
    GO

    -- Test: one insert supplying create_date explicitly, one omitting it.
    INSERT dbo.AppVisibleTest
        (create_date, data)
    VALUES ('1900-01-01', 'Row #1');

    INSERT dbo.AppVisibleTest
        (data)
    VALUES ('Row #2');
    GO

    SELECT *
    FROM dbo.AppVisibleTest;

    SELECT *
    FROM dbo.Test;
    GO

    DROP VIEW dbo.AppVisibleTest;

    DROP TABLE dbo.Test;

    Paul

  • Grant Fritchey (3/25/2010)


    The uniquifier (or uniqueifier) is a 4-byte integer value. So from that you can estimate the cost to the size of the index. So that not only increases the size of the cluster, but it becomes a part of each and every pointer in any nonclustered indexes...

    Two quick things to slightly expand on that:

    1. The uniquifier is only added to rows that need it. The 'original' row never gets a uniquifier - just the ones with duplicate keys.

    2. For a non-unique clustered index, the entire clustered key (including the uniquifier, if present) is added to the key of any non-unique non-clustered indexes. That means it is included at every level of the index tree, not just the leaf. The clustered key is not added to the key of a unique non-clustered index, but it is still INCLUDEd at the index leaf. The situation is reversed if the clustered key is unique.

    If you are anything like me, point two might seem counter-intuitive and/or confusing. Kalen had a good go at explaining it in the following blog entries:

    http://sqlblog.com/blogs/kalen_delaney/archive/2008/03/16/nonclustered-index-keys.aspx

    http://sqlblog.com/blogs/kalen_delaney/archive/2010/03/07/more-about-nonclustered-index-keys.aspx
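
    A rough repro if you want to see this for yourself (all names are invented; run it in a scratch database and compare the record sizes of the two nonclustered indexes, particularly at the non-leaf levels):

    -- Sketch: a non-unique clustered index plus one unique and one non-unique
    -- nonclustered index, then record sizes per index level for comparison.
    CREATE TABLE dbo.KeyDemo
    (
        cluster_key INT NOT NULL,          -- deliberately not unique
        unique_col  INT NOT NULL,
        other_col   INT NOT NULL,
        padding     CHAR(200) NOT NULL DEFAULT 'x'
    );

    CREATE CLUSTERED INDEX CIX_KeyDemo ON dbo.KeyDemo (cluster_key);
    CREATE UNIQUE NONCLUSTERED INDEX UIX_KeyDemo_unique_col ON dbo.KeyDemo (unique_col);
    CREATE NONCLUSTERED INDEX IX_KeyDemo_other_col ON dbo.KeyDemo (other_col);

    -- Load rows; duplicates in cluster_key force uniquifiers.
    INSERT dbo.KeyDemo (cluster_key, unique_col, other_col)
    SELECT n % 100, n, n % 50
    FROM (SELECT TOP (100000)
                 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
          FROM sys.all_columns AS a
          CROSS JOIN sys.all_columns AS b) AS nums;

    -- The non-unique NC index carries the clustered key (and any uniquifier) at
    -- every level; the unique NC index carries it only at the leaf.
    SELECT i.name, s.index_level,
           s.min_record_size_in_bytes, s.max_record_size_in_bytes
    FROM sys.dm_db_index_physical_stats
         (DB_ID(), OBJECT_ID(N'dbo.KeyDemo'), NULL, NULL, 'DETAILED') AS s
    JOIN sys.indexes AS i
        ON i.object_id = s.object_id
       AND i.index_id = s.index_id
    ORDER BY i.index_id, s.index_level;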

  • I've run both scenarios now (creating the clustered index on an identity column, which I had to roll back, and creating the clustered index on the existing createdate column). Due to time constraints with this upgrade process I can't go as in-depth as I'd really like in evaluating all these changes, but what I've found is that even though the change saves space and makes queries that target those indexes (and don't need to look up extra columns) run faster, the fields that make up the current clustered index are returned in almost every query against those databases. To really make the change beneficial and not lose performance, I'd need to re-optimize all the non-clustered indexes as well, which I don't have time to do for the 80+ tables before we push this live.

    With that in mind, I've pretty much talked myself out of my shotgun approach of doing all these tables at once and decided to focus on just the ones with really bad clustered indexes that either (a) we almost never query off of or (b) involve a uniqueidentifier.

    I'm also thinking that data compression will play a big part in downplaying a lot of the differences here. I'm using page level compression on a good % of these indexes and I'm wondering if that is helping these particular keys quite a bit. Here's an example.

    Pre-modification, I have a clustered index on a table that is 34 bytes wide. The key columns are, in order:

    col type       base size (bytes)
    ------------   -----------------
    nvarchar(10)   20
    int             4
    int             4
    nchar(1)        2
    int             4

    That first field, though, is a left-padded number that will be something like ' 200102', ' 200103', ' 200104', etc. I'm thinking that the leading 8-9 characters get compressed, which significantly shrinks the clustered key actually stored on the pages, whereas the date isn't necessarily compressed as much (if it even compresses dates; not sure there, need to check up on that).

    Now I have a nonclustered index on an nvarchar(8) column (a single-column index) on that table. Here are the clustered index key columns plus that nonclustered key column for the two databases:

    Database A                        Database B
    col type       base size          col type       base size
    nvarchar(10)   20                 datetime        8
    int             4                 uniqueifier     4
    int             4                 nvarchar(8)    16
    nchar(1)        2
    int             4
    nvarchar(8)    16

    Rows in both tables: 1,089,801

    Pages for the nonclustered index with page-level compression:

    Database A: 2032
    Database B: 1747

    That last nvarchar(8) column (the nonclustered index key) is either NULL or one of only 2 values, so that probably gets compressed very heavily as well.
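
    As an aside, sp_estimate_data_compression_savings can put rough numbers on that kind of question before committing to a rebuild (a sketch; the object name is a placeholder):

    -- Sketch: estimate PAGE compression savings for the indexes on one table.
    EXEC sys.sp_estimate_data_compression_savings
        @schema_name      = 'dbo',
        @object_name      = 'SomeTable',
        @index_id         = NULL,      -- NULL = all indexes on the table
        @partition_number = NULL,
        @data_compression = 'PAGE';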

    Also, Paul: Thanks very much for the view example, I'll have to look at that some more and see where I might be able to apply that.

    [Edit] Fixing Column Formatting. It's still messed up, but it's better than it was. Blech.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉


  • Paul White NZ (3/25/2010)


    Grant Fritchey (3/25/2010)


    The uniquifier (or uniqueifier) is a 4-byte integer value...


    Two quick things to slightly expand on that...

    Thanks for the links, I'll check those out. Here are a couple of others I read through yesterday while investigating the uniquification process, for anyone following along who wants to know more about it:

    Michelle Ufford: http://sqlfool.com/2009/05/overhead-i-non-unique-clustered-indexes/

    Andrew Calvett: http://sqlblogcasts.com/blogs/acalvett/archive/2008/10/20/the-overhead-of-a-non-unique-clustered-index.aspx

    Chad Boyd: http://blogs.msdn.com/chadboyd/archive/2007/04/08/non-unique-clustered-index-and-duplicate-value-limits.aspx

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉


  • Yes, datetime is compressed. One of the best sources I have found for information concerning the internals of ROW and PAGE compression is http://blogs.msdn.com/sqlserverstorageengine/archive/tags/Data+Compression/default.aspx. Loads of great stuff there.

    R2 will include extremely efficient Unicode compression too, which is exciting.
