Non-clustered Index Insert/Delete Performance

  • Hi there,

    I have have an index that becomes very expensive to maintain and is causing performance issues. Background below:

    Table t_Orders is partitioned on OrderGroup which can contain one of 20 values (1-20). It has a unique clustered index on OrderGroup, OrderCode which is alligned with the partitioning scheme. There are two non-clustered, non-unique indexes. OrderGroup, ProductID and OrderGroup, CustomerID which are also partition alligned.

    t_Orders is a subset of a larger table t_OrderHistory which contains an audit of the history of all orders. Every 5 seconds or so our application comes along and merges the changes from t_OrderHistory into t_Orders using a stored procedure. The size of the table t_Orders remains at ~150k rows but more than 1.5m rows are inserted and then subsequently deleted throughout the course of the day. I should also note that around 85% of these 1.5m changes go through 2 or the 20 partitions, so the table is also imbalanced in that way.

    The level of selectivity in the problematic index (OrderGroup, CustomerID) is low and unbalanced. There are around 300 unique CustomerIDs but the top 10 account for ~90% of the volume passiing through in the way of inserts and deletes.

    When examining the cached execution plan for the stored procedure that is performing badly when we experience the issue, it is apparent that maintaining this index is by far the most expensive operation taking place in the batch. When the issue is seen it is often some way into the day so one can imagine a large number of entries have been inserted and then deleted from this index by that time.

    When stress testing this index in a dev environment I can see that the index grows large (thousands of pages). When looking at the index operational stats dmv, the vast number of records that are removed are marked for deletion (ghost records) but not physically deleted from the index, so it appears that it bloats and becomes very fragmented because of the nature of the way we use this table.

    This index is not utilised very well, any calls to it have been proven to only use it in scan mode. In order to alleviate the performance problem we are considering just dropping it.

    I'm exploring the possibility of a better index and certainly CustomerID should appear as the first column so sql server can gain better stats and hopefully utilise it in seek mode. I'm struggling to understand how I can design my index to be more efficient for sql server to maintain in terms of inserting and deleting. It has been suggested we use CustomerID, OrderCode on the partition scheme making it more unique. It's my understanding that if I create an index on CustomerID alone then it will have the clustered index key (OrderGroup, OrderCode) stored at the leaf level anyway. So, my questions are:

    What is the difference between the following indexes in the scenario of my partitioned table indexing on the partitioning scheme:

    1. CustomerID

    2. CustomerID, OrderCode

    Are they equivalent? Which would be the cheapest to maintain bearing in mind the volume of change that gets cycled through this table each day.

    Can anybody point me in the direction of some further reading? Or suggest what a better choice for indexing this table may be? Would any index using CustomerID be expenisve to maintain?

  • Hi SomeDude,

    The fact that you're hitting this table so hard is really the cause of your issue. The problem is, the engine has to do a sort operation for each of the non clustered indexes it needs to maintain, regardless of partitioning etc.

    You haven't mentioned here if you are partitioning to different file groups here (and physical location of these files)?

    One really basic thing you can try is drop the non clustered indexes before the update and add them again afterwards, this surprisingly can make a massive difference in time.

    On the matter of a "better" index I have two points, firstly thats really subjective and without knowing what the table is used for its impossible to comment. Secondly IMHO you're thinking about it in reverse (i.e. how easy/difficult will this be to maintain) rather than "What will this index give me in terms of query performance and is that gain worth the cost of maintaining the index"

    It sounds like you should drop that index but then the question you need to ask yourself is do you actually need to add another one? If its was always doing a scan of that index then I'd guess you wont lose a lot by just letting the plans scan the clustered index instead.

    Another thing, depending on how available your data needs to be, you could consider "queuing" up those 5 second operations into batches, cutting down the number of operations.

    Obviously anything I've said here comes with the caveat that I don't know your system so I could be well off the mark.

    Let me know how it goes 🙂

    Cheers

    Dave

    somedude0930938408 (6/9/2011)


    Hi there,

    I have have an index that becomes very expensive to maintain and is causing performance issues. Background below:

    Table t_Orders is partitioned on OrderGroup which can contain one of 20 values (1-20). It has a unique clustered index on OrderGroup, OrderCode which is alligned with the partitioning scheme. There are two non-clustered, non-unique indexes. OrderGroup, ProductID and OrderGroup, CustomerID which are also partition alligned.

    t_Orders is a subset of a larger table t_OrderHistory which contains an audit of the history of all orders. Every 5 seconds or so our application comes along and merges the changes from t_OrderHistory into t_Orders using a stored procedure. The size of the table t_Orders remains at ~150k rows but more than 1.5m rows are inserted and then subsequently deleted throughout the course of the day. I should also note that around 85% of these 1.5m changes go through 2 or the 20 partitions, so the table is also imbalanced in that way.

    The level of selectivity in the problematic index (OrderGroup, CustomerID) is low and unbalanced. There are around 300 unique CustomerIDs but the top 10 account for ~90% of the volume passiing through in the way of inserts and deletes.

    When examining the cached execution plan for the stored procedure that is performing badly when we experience the issue, it is apparent that maintaining this index is by far the most expensive operation taking place in the batch. When the issue is seen it is often some way into the day so one can imagine a large number of entries have been inserted and then deleted from this index by that time.

    When stress testing this index in a dev environment I can see that the index grows large (thousands of pages). When looking at the index operational stats dmv, the vast number of records that are removed are marked for deletion (ghost records) but not physically deleted from the index, so it appears that it bloats and becomes very fragmented because of the nature of the way we use this table.

    This index is not utilised very well, any calls to it have been proven to only use it in scan mode. In order to alleviate the performance problem we are considering just dropping it.

    I'm exploring the possibility of a better index and certainly CustomerID should appear as the first column so sql server can gain better stats and hopefully utilise it in seek mode. I'm struggling to understand how I can design my index to be more efficient for sql server to maintain in terms of inserting and deleting. It has been suggested we use CustomerID, OrderCode on the partition scheme making it more unique. It's my understanding that if I create an index on CustomerID alone then it will have the clustered index key (OrderGroup, OrderCode) stored at the leaf level anyway. So, my questions are:

    What is the difference between the following indexes in the scenario of my partitioned table indexing on the partitioning scheme:

    1. CustomerID

    2. CustomerID, OrderCode

    Are they equivalent? Which would be the cheapest to maintain bearing in mind the volume of change that gets cycled through this table each day.

    Can anybody point me in the direction of some further reading? Or suggest what a better choice for indexing this table may be? Would any index using CustomerID be expenisve to maintain?

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply