Simple Index Question

  • I've a database with two tables.

    The two tables have a one to many relationship.

    Table 1 has a column [ID] which is of type uniqueidentifier as it's primary key.

    Table 2 has a column [ID] which is of type uniqueidentifier as it's primary key.

    Table 2 has a column [ParentID] which is of type uniqueidentifier and it acts a foreign key relating back to Table 1. That is, Table1.ID = Table2.ParentID.

    In both tables the [ID] column is a clustered index.

    In Table 2 the [ParentID] is a non-clustered index.

    Both tables receive frequent updates, deletes, and inserts.

    Given what I've read, this isn't the best indexing solution.

    From what I understand the following changes would be recommended:

    1: Table 1 has no reason for the [ID] column to have a clustered index because the value is never going to repeat so there is no reason to cluster them together. Some resources would be saved by dropping this back to a simple non-clustered index.

    2: Table 2 has no reason for the [ID] column to have a clustered index because the value is never going to repeat so there is no reason to cluster them together. Some resources would be saved by dropping this back to a simple non-clustered index.

    3: Table 3 should have a clustered index on [ParentID] since this foreign key may be repeated several times (one to many relationship).

    Is this analysis correct?

  • Generally speaking you are right in your analysis. But bear in mind that every table should have a clustered index. So if there isn't a need for another column in the first table to have a clustered index you might as well leave the clustered index on the ID column.

    The other thing you'd need to consider is that you'll probably get more page splits by putting the clustered index on the foreign key column so you'll need to account for that with a bigger fillfactor and probably re-indexing or defragging more regularly. Then again you'd get lots of page splits with the clustered index on the ID columns.

    I should point out that using GUIDs for primary key columns and placing indexes on GUIDs isn't ideal.

  • If you were to place a clustered index on ParentID you would get more page splits and have slower inserts/updates because the entry would have to be inserted into the correct spot in the index whihc necessitates moving data around. For example if your data is: A,A,A,C,C,C,C,D,E,H,H,H,I,I,J,K,L,M and you insert a B all the records following the B need to be shifted in order to maintain the clustered index. This is why a clustered index for insert/update purposes works best on an increasing value like dates.

  • I'm not sure I buy the analysis. Since the value of the clustered index is what is stored in the lower level of all of the NON-clustered indexes, something unique or highly selective is/can be desirable in the clustered index. Since the clustered index determines the physical storage order of the table, you also want a cluster key that is immutable, since any changes to those key values would cause portions of the table to be re-written, page splits, etc....

    Yes - clustered indexes are good for range searches, but you need to weigh all of the attributes and consequences. As of right now - you have yet to mention any actual range searches, so changing the clustered index to parentid isn't compelling (yet).

    Without any further info - just to save on all of the page splits - I would actually see keeping the ID's as the UCI, and adding a non-clustered index on ParentID. Starting from there, I would then start weighing the other factors (such as large amounts of queries based on RANGE queries against ParentID for example), to perhaps change my starting position.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thank you for the response.

    A couple of follow up questions, if you don't mind.

    1: Why should each table have a clustered index?

    2: I could move the clustered index off the unique identifier (each of my table has two keys) and onto an autonumber field. I'll do that. 🙂

    (SIDE NOTE: The reason I've two keys is because all of my applications use objects. Using a unique identifier (GUID) for each object allows me to build complex object relationships in memory which may then be written to the database after work is complete. This protects me from concurrency issues and allows stuff to be build/modified/discarded without ever contacting the database to retrieve the value from an autonumber field)

  • FYI: Range searches never happen on these IDs.

  • Yes - clustered indexes are good for range searches, but you need to weigh all of the attributes and consequences. As of right now - you have yet to mention any actual range searches, so changing the clustered index to parentid isn't compelling (yet).

    Ah... so if I had the ParentID of 'X' showing up in the table 8 times the clustered wouldn't help much unless I was looking for 'X' through 'Y'. Otherwise, a regular index would find all instances of 'X' just as quickly?

  • thelenj (4/16/2008)


    Yes - clustered indexes are good for range searches, but you need to weigh all of the attributes and consequences. As of right now - you have yet to mention any actual range searches, so changing the clustered index to parentid isn't compelling (yet).

    Ah... so if I had the ParentID of 'X' showing up in the table 8 times the clustered wouldn't help much unless I was looking for 'X' through 'Y'. Otherwise, a regular index would find all instances of 'X' just as quickly?

    It will help substantially if you are comparing it to not having an index. however, a non-clustered index (covering) could give it a run for its money, and beat it on specific seeks. That's because the NCI can have more values in its data pages, so you end up pulling less data to scan or seek some NCI's (assuming they're narrow of course).

    The reason it's so good at range searches is that all of the data you want (meaning - in the range) is bunched together. So you can seek for the bottom of the range, then keep reading contiguous data pages until you get to the top of the range.

    In other words - it's about the same (or just a little less) than a NCI on specific seeks, but it's substantially better at ranges than anything else.....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (4/16/2008)

    It will help substantially if you are comparing it to not having an index. however, a non-clustered index (covering) could give it a run for its money, and beat it on specific seeks. That's because the NCI can have more values in its data pages, so you end up pulling less data to scan or seek some NCI's (assuming they're narrow of course).

    The reason it's so good at range searches is that all of the data you want (meaning - in the range) is bunched together. So you can seek for the bottom of the range, then keep reading contiguous data pages until you get to the top of the range.

    In other words - it's about the same (or just a little less) than a NCI on specific seeks, but it's substantially better at ranges than anything else.....

    This is why I would have thought a clustered on the ParentId would be better. You're never going to do a range search on the ID column and any searches on the ID column will be for specific values. In this instance, as you mentioned, an NCI is just as good as a CI for specific seeks. Also, because the ParentID column is not unique, an NCI won't do much good - ideally for an NCI you want a column that has high uniqueness (above 90%). Whereas a CI is very good at finding data across a non-unique column.

    Obviously this doesn't beat actually analysing the problem with some practical data and seeing what works best for you.

    My biggest concern with this example is the inserts/updates/deletes and how that will affect the indexes.

  • SQLZ (4/16/2008)


    Matt Miller (4/16/2008)

    It will help substantially if you are comparing it to not having an index. however, a non-clustered index (covering) could give it a run for its money, and beat it on specific seeks. That's because the NCI can have more values in its data pages, so you end up pulling less data to scan or seek some NCI's (assuming they're narrow of course).

    The reason it's so good at range searches is that all of the data you want (meaning - in the range) is bunched together. So you can seek for the bottom of the range, then keep reading contiguous data pages until you get to the top of the range.

    In other words - it's about the same (or just a little less) than a NCI on specific seeks, but it's substantially better at ranges than anything else.....

    This is why I would have thought a clustered on the ParentId would be better. You're never going to do a range search on the ID column and any searches on the ID column will be for specific values. In this instance, as you mentioned, an NCI is just as good as a CI for specific seeks. Also, because the ParentID column is not unique, an NCI won't do much good - ideally for an NCI you want a column that has high uniqueness (above 90%). Whereas a CI is very good at finding data across a non-unique column.

    Obviously this doesn't beat actually analysing the problem with some practical data and seeing what works best for you.

    My biggest concern with this example is the inserts/updates/deletes and how that will affect the indexes.

    My biggest concern is that the page splits caused by using the ParentID would outweigh the benefits. Besides the fact that foreign keys can be changed, whereas ID's tend not to, so moving data in the table would tend not to happen.

    Besides - I was kind of suspecting that there were NO range searches, so using a unique value so that the clustered key is narrow (thus making the leaf level of the NCI's small) would be a good thing (remember that specifying a non-unique value for a cluster key means that SQL Server adds a unique id onto the end of it, making that key rather wide and all NCI's all that much bigger and less efficient). Continuous page splits and table fragmentation can quickly turn into an awful perf issue.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (4/16/2008)

    Obviously this doesn't beat actually analysing the problem with some practical data and seeing what works best for you.

    My biggest concern with this example is the inserts/updates/deletes and how that will affect the indexes.

    My biggest concern is that the page splits caused by using the ParentID would outweigh the benefits. Besides the fact that foreign keys can be changed, whereas ID's tend not to, so moving data in the table would tend not to happen.

    Besides - I was kind of suspecting that there were NO range searches, so using a unique value so that the clustered key is narrow (thus making the leaf level of the NCI's small) would be a good thing (remember that specifying a non-unique value for a cluster key means that SQL Server adds a unique id onto the end of it, making that key rather wide and all NCI's all that much bigger and less efficient). Continuous page splits and table fragmentation can quickly turn into an awful perf issue.

    Yes, the page splits would be a cause for concern. Problem with the uniqueidentifier datatype for the ID column is that page splits will still be a cause for concern there as well. Good point about making a narrow clustered key.

  • Okay, if I've followed this so far......

    1: I should not put a clustered index on my ID field (uniqueidentifier) because it is probably causing a ridiculous number of page splits. Replace with a non-clustered index for both Table1 and Table2.

    2: I should put the clustered index on the Number field (autonumber field that acts as a key) for each table since it won't cause page splits and it would support ranges if I ever chose to use 'em.

    3: The ParentID column on Table2 should have a regular unclustered index.

    That is what I've gathered from this so far.

    If that isn't correct, shout about it now.

    Also, thanks for the discussion thus far.

    I'm learning... 😛

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply