Table Indexing

  • I have a tuning question and I don't see a difference between two techniques.

    I have a table with the structure

    (ID Guid

    FK1 Guid

    FK2 Guid

    FK3 Guid

    Priority Bit)

    The ID in this table is simply for replication and is assigned it's value from the NEWID function.

    FK1,FK2,FK3 are the unique columns in this table.

    I currently have ID as a non clustered primary key for the table and the FK1,FK2,FK3 columns as the unique clustered index for the table.

    The queries running against the table don't always include all three columns in the clustered index which is causing an index scan that I'm trying to get rid of. Inserts and Deletes are only a few per day and the table has less than 20,000 rows.

    Does anyone know if the way the table is currently indexed is the best approach vs the following:

    ID becomes the CLUSTERED PRIMARY KEY

    FK1,FK2,FK3 become a unique non clustered index that include ID and Priority

  • In your current implementation neither of those are great alternatives. By using NEWID(), you are causing page splits (if that was the CI key column) due to the random nature of the NEWID() function. This was solved with the NEWSEQUENTIALID() function, that creates a UNIQUEIDENTIFIER greater than any on the particular machine (since last restart). This alleviates the page split/random insert impact.

    Also because of the same idea above having FK1, FK2, and FK3 as a composite for the clustered index is also going to cause sporadic inserts, as the clustered index is going to be sorted by those key columns.

    What I'd do is change NEWID() to NEWSEQUENTIALID(), and make ID the clustered index key column.



    Twitter: @SQLife
    Email: sqlsalt(at)outlook(dot)com

  • aballard (4/24/2012)


    I have a tuning question and I don't see a difference between two techniques.

    I have a table with the structure

    (ID Guid

    FK1 Guid

    FK2 Guid

    FK3 Guid

    Priority Bit)

    The ID in this table is simply for replication and is assigned it's value from the NEWID function.

    FK1,FK2,FK3 are the unique columns in this table.

    I currently have ID as a non clustered primary key for the table and the FK1,FK2,FK3 columns as the unique clustered index for the table.

    The queries running against the table don't always include all three columns in the clustered index which is causing an index scan that I'm trying to get rid of. Inserts and Deletes are only a few per day and the table has less than 20,000 rows.

    Does anyone know if the way the table is currently indexed is the best approach vs the following:

    ID becomes the CLUSTERED PRIMARY KEY

    FK1,FK2,FK3 become a unique non clustered index that include ID and Priority

    You're approaching your indexing from what sounds like the wrong perspective.

    A non-clustered primary key won't cause the page splits that Thomas is concerned about above, and sounds like the best usage here... maybe.

    To decide your clustering and non-clustered index approaches, it's not simply a matter of the table. It's how does the table fit into the whole? Is this the core table in a snowflake? Is it a mid-point hierarchy table? How do you approach data in this table with where clauses, and what uses this table as a join once it's restricted... and how does it join to this table?

    Indexing relies on knowing the data's use in the whole, not construction of it uniquely. With more usage information we can help you find your best pattern of consutruction.

    A note on NEWID vs NEWSEQUENTIALID. Look into something called 'hot spotting' in relation to clustered indexes. Under most circumstances you can avoid significant page splitting by using appropriate FILLFACTOR values, and maintenance will help to avoid the rest. This will of course depend on your system and this table's load for modification vs. reads and if it's better to truly have incredibly tight pages or if a little give on those pages, causing more reads, will give you much better throughput compared to allowing for internal data mechanics to have room to breathe.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Right, and I agree. But if the OP was to make the clustered index out of ID with NEWID() then it would be that issue. Likewise, I doubt the combination of FK1, FK2, and FK3 is sequential.



    Twitter: @SQLife
    Email: sqlsalt(at)outlook(dot)com

  • Thank you both for your responses. I misspoke on the ID using the NEWID it is actually using the NEWSEQUENTIALID as the default value. For the foreign key columns FK1 and FK3, they are sequential as their parent tables uses the newsequentialid as the default. The FK2 column is not sequential but it is only has 540 possible values and while that does grow it only grows by 64 new possible items every three months.

    FK1 has 30 distinct items in it (Is Sequential)

    FK2 has 87 distinct items in it (Is Not Sequential)

    FK3 has 7620 distinct items in it (Is Sequential and the one FK out of the three that is related to the content table so it will continue to grow in number of unique values )

    Craig,

    To answer your questions about the use of this table. It's primary purpose is to filter down items from the content table that it inner joins with on the FK3 column and the where clause is done on the FK1 and FK2 column as well as additional columns in the content table. There is a foreign key constraint that is defined and enabled between the two tables. Changes to this reference table are extremely low and (Inserted + Updated + Deleted) rows are at most 20 per day. The cardinality of this table in relation to the content table is 0 to Many as a row in the reference table is not required.

    Sample statement:

    SELECT m.* FROM dbo.t_main m INNER JOIN dbo.t_main_ref ref on (m.id = ref.mainid) WHERE (ref.FK1 = @Condition1) and (ref.FK2 = @Condition2) and (m.Column5 = @Condition3) and (m.Column6 < GETDATE())

  • Agree with what everyone said regarding the clustered index on the ID column. I don't think it's the way to go.

    As far as getting index scans goes, it really comes down to your query. If you have an index that is FK1, FK2, FK3, then any query that references FK1 or FK1 & FK2, or all three, should use that index. It's when you only reference FK2 or FK3 or a combination of these two that the index will have to be scanned. All that assumes that FK1 is reasonably selective since it's the leading edge of the index. Generally, but not always, you'd want more selective columns on that leading edge. Also, if the values being passed incompass more than about 1% of the data in the table you might be seeing scans too, regardless of what the index is on.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • aballard (4/24/2012)


    SELECT m.* FROM dbo.t_main m INNER JOIN dbo.t_main_ref ref on (m.id = ref.mainid) WHERE (ref.FK1 = @Condition1) and (ref.FK2 = @Condition2) and (m.Column5 = @Condition3) and (m.Column6 < GETDATE())

    The clarification helps. Can I assume the above definition is for t_main_ref then? It seems a bit off because of the swaps of mainid vs. id fields in the connection.

    In that case, I'd usually cluster the index on FK1 and 2, but don't expect much out of it with the selectivity you mention. It doesn't sound like it'll be high enough to really make a difference, as Grant explained above.

    However, depending on the selectivity of Col5/Col6 in t_main, you might be better off with a clustered on mainID and allowing the selectivity of the other table to seek this one more efficiently.

    Cardinality has huge implications when dealing with indexes. It seems like this is being used more as a subtable than a table, at least for this query (without names it's hard to really nail down). Because of that you have to examine where your highest restrictions come from and apply them from there out to connecting indexes to the other tables... usually.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig,

    You are correct in that the main_ref table is being used to getting a subset of records from table main. The main_ref table is one of several tables that we are using to relate the information in table main to other entities in the database.

    I've reindexed the ref table with a unique cluster on the id and three other indexes one for the unique composite key of all three FKs, one on the FK3 from table main that includes the other columns in the table and one on FK1 and FK2 as they are used together in where criteria. Selecting on FK1 and FK2 is done a majority of the time, selecting on FK3 only happens when the specific item from the table main returned to provide a list of all of the entities this item is related to.

    Doing this has changed the scan to a seek and the number of estimated rows are just about the same as the actual number of rows so like you said not much for selectivity for the select criteria on FK1 and FK2 but the improvement that I see is less scans and logical reads from the other tables in the select statement.

    Does anyone think that doing a non unique cluster on just FK3 will provide any improvement as it's not used in the where clause for most statements on the tables only as part of the join statement?

    Thanks for all of the help everyone.

  • Just so you are aware, since your clustered index is on FK1, FK2, FK3; when you get a clustered index scan, you are actually scanning the table.

  • If FK3 is used in queries where the other two columns are not referenced in either JOIN criteria or WHERE clauses, then yeah, it might benefit from a nonclustered index. Otherwise, probably not.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks everyone. I've tried the different ideas and run them against the queries to see what produces the best result.

    From all of the testing this has proven to be the best based on the current queries. The ID column is now set as just a nonclustered primary key. I've put a unique clustered index on FK1, FK2 and FK3 as most of the queries are using FK1 and FK2 in the where clause and FK3 makes the combination unique. I've put an additional index on FK3 and included the other columns in the index so it doesn't have to look up any of the other information to cover the remaining queries. This has removed the SCAN and given me a very low read count which is what I was looking for.

    Thanks again.

  • You are just verifying the read performance & ignoring write performance. If it’s going to be OLTP transaction table, your DMLs would suffer. Plus, clustered index on 3 GUID columns will result in frequent fragmentations.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply