SQL Server Black Book on Indexes & Statistics, Got One?

  • Right now at my workplace we are dealing with some issues that all seem to tie back to indexes and their fragmentation. I thought I had a proper handle on indexes, but recent events and a review of various postings and Books Online entries have me asking many questions.

    Is there a definitive Black Book-like resource for the final word on indexes in SQL Server 2005?

    Here's an example of the questions recently raised:

    The system uses statistics to determine fragmentation, and when you rebuild an index there is no need to rebuild the statistics because SQL Server handles that for you. However, do statistics have an effect on the rebuild process? If I run ALTER INDEX and tell SQL Server to rebuild the index, does it matter whether the statistics for that index are 99.99% accurate or even 100% inaccurate? Would the index rebuild work properly even if there were no statistics?

    We recently ran the ALTER INDEX REBUILD T-SQL command and found that some of the indexes became very fragmented within a short time after the rebuild. This leads me to ask whether the rebuild is not working as it should, and is perhaps being kept from working effectively by bad statistics.

    Any thoughts/comments or suggestions?

    Thanks in Advance

    Kindest Regards,

    Just say No to Facebook!
  • I haven't gotten to dig into this much but here's a link:

    http://sqlcat.com/

  • YSLGuru (9/22/2008)


    The system uses statistics to determine fragmentation, and when you rebuild an index there is no need to rebuild the statistics because SQL Server handles that for you. However, do statistics have an effect on the rebuild process? If I run ALTER INDEX and tell SQL Server to rebuild the index, does it matter whether the statistics for that index are 99.99% accurate or even 100% inaccurate? Would the index rebuild work properly even if there were no statistics?

    Statistics are not used to determine fragmentation. Stats are used by the query optimiser to estimate the number of rows that will be affected by a query.

    Index rebuilds completely recreate the index. There's no need to estimate the rows that will be affected. It's the entire table. Stats are irrelevant to a rebuild
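    For reference, the rebuild statement itself looks like this (the table and index names here are hypothetical):

    ```sql
    -- Rebuild one index; its statistics are recreated as a by-product of the rebuild
    ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REBUILD;

    -- Or rebuild every index on the table
    ALTER INDEX ALL ON dbo.Orders REBUILD;
    ```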

    We recently ran the ALTER INDEX REBUILD T-SQL command and found that some of the indexes became very fragmented within a short time after the rebuild. This leads me to ask whether the rebuild is not working as it should, and is perhaps being kept from working effectively by bad statistics.

    Uniqueidentifier with a default of NewID() as the index key by any chance?

    Some data types and patterns cause fragmentation faster than others. It's also affected by the fill factor on your indexes. If the column that is used as the index key is ever-increasing (like identity or date inserted) and there are no updates that change the row size, then the index will fragment very slowly, if at all.

    If the index key is on a column that has a random value and hence inserts are occurring all over the index or there are frequent updates that increase the size of the row, then there will be frequent page splits on that index and the fragmentation will increase rapidly.
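    Fragmentation itself is reported by the sys.dm_db_index_physical_stats DMV (SQL 2005 onwards) rather than by statistics. A quick check might look something like this (the 10% threshold is just an arbitrary starting point):

    ```sql
    -- Scan all indexes in the current database in LIMITED mode (the cheapest scan level)
    SELECT object_name(ips.object_id) AS table_name,
           i.name AS index_name,
           ips.avg_fragmentation_in_percent,
           ips.page_count
    FROM sys.dm_db_index_physical_stats(db_id(), NULL, NULL, NULL, 'LIMITED') ips
        INNER JOIN sys.indexes i
            ON i.object_id = ips.object_id AND i.index_id = ips.index_id
    WHERE ips.avg_fragmentation_in_percent > 10
    ORDER BY ips.avg_fragmentation_in_percent DESC;
    ```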

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Kalen Delaney wrote some of the best concise books on indexing prior to SQL 2005. From SQL 2005 onwards you seem to need more and more books just to pick out the bits you might need from all the repeated basics.

    Kimberly Tripp also does some excellent work with indexes - check out both their blogs for some great posts on indexes.

    Indexing is a massive area and I'm not really sure where to suggest you start. I've blogged a number of index posts and have a series of pages on my website ( http://www.grumpyolddba.co.uk ) but I'm not sure what I've written will help too much. I do have some more to post in the next couple of weeks.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • colin Leversuch-Roberts (9/22/2008)


    Kalen Delaney wrote some of the best concise books on indexing prior to SQL 2005. From SQL 2005 onwards you seem to need more and more books just to pick out the bits you might need from all the repeated basics.

    Her last book in the Inside SQL Server 2005 series - Query Tuning and Optimisation - is a good one to pick up for info on this and other factors around well-running queries.

    I do have some more to post in the next couple of weeks.

    Likewise. I must have at least 6 half-written posts on various indexing topics

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (9/22/2008)

    Statistics are not used to determine fragmentation. Stats are used by the query optimiser to estimate the number of rows that will be affected by a query.

    Index rebuilds completely recreate the index. There's no need to estimate the rows that will be affected. It's the entire table. Stats are irrelevant to a rebuild

    Perfect, exactly the answer I was looking for and hoping for as well!

    Uniqueidentifier with a default of NewID() as the index key by any chance?

    Not in the case of the 3 tables I am currently investigating. However, I can't guarantee that another table isn't set up with this data type. The database has 300+ tables and I have not yet had time to dig into all of them. Is there an easy way to ask SSMS to locate any column in any table of a database that is set up with the uniqueidentifier data type?

    Some data types and patterns cause fragmentation faster than others. It's also affected by the fill factor on your indexes. If the column that is used as the index key is ever-increasing (like identity or date inserted) and there are no updates that change the row size, then the index will fragment very slowly, if at all.

    If the index key is on a column that has a random value and hence inserts are occurring all over the index or there are frequent updates that increase the size of the row, then there will be frequent page splits on that index and the fragmentation will increase rapidly.

    It's very interesting, even ironic, that you'd comment on this, because the most volatile table in the database just happens to be set up with a PK (primary key) that works exactly like this. The PK is set up as a numeric value with 18 digits of precision, so it can store fairly large key values. The key is incremental, but within subsets.

    I tried several times to explain this but am unable to make the description brief, so long story short, this table works like this. The PK column is used to assign the transaction a control number. But since the table contains multiple types of transaction, and because each transaction type has its own contiguous set of control numbers that start at 1 and increment by 1, the link between the PK value and the control number the user sees is not 1 to 1. You couldn't have a Charge transaction with a control number of 1 and a Receipt transaction with a control number of 1 if the control number were a 1-to-1 reference to the PK column, because SQL Server will allow only one of them to use the number 1 for its PK value.

    To work around this and maintain a relationship between the PK value and the transaction's control number, the software vendor set up the PK value to start at a base number, which is different for each type of transaction, and increment from that base by 1 for each new transaction of that type inserted.

    As an example, the first Charge created would have the control number 1, but its PK value would be the base number for Charges plus the control number. If the base number for Charge transactions is 1000000000, then the PK value for that Charge would be 1000000000 + 1, or 1000000001.

    Make sense? I know this isn't proper normalization, or even close to it, but it's how it was designed and we have to deal with it.

    With this out-of-order insertion of PK values, and because the PK column is a clustered index, the thing quickly becomes fragmented depending on the time of the month. This table is the most important one in the whole database, and it is of course the largest both in rows of data and in overall size. It also produces the slowest queries.

    On top of this, many of the INSERTs are handled through stored procedures that use cursors like they're going out of style.

    Kindest Regards,

    Just say No to Facebook!
  • YSLGuru (9/23/2008)


    Is there an easy way to ask SSMS to locate any column in any table of a database that is set up with the uniqueidentifier data type?

    Why not ask SQL itself? 😉

    select object_name(c.object_id), c.name
    from sys.columns c
        inner join sys.types t on c.user_type_id = t.user_type_id
    where t.name = 'uniqueidentifier'

    As an example, the first Charge created would have the control number 1, but its PK value would be the base number for Charges plus the control number. If the base number for Charge transactions is 1000000000, then the PK value for that Charge would be 1000000000 + 1, or 1000000001.

    Make sense? I know this isn't proper normalization, or even close to it, but it's how it was designed and we have to deal with it.

    With this out-of-order insertion of PK values, and because the PK column is a clustered index, the thing quickly becomes fragmented depending on the time of the month. This table is the most important one in the whole database, and it is of course the largest both in rows of data and in overall size. It also produces the slowest queries.

    Hmmm... Have you considered table partitioning? Partition on the base numbers for the transactions and then, under the covers, it will be like each transaction type has its own table. Your fragmentation problem goes away, queries (if they query on only one transaction at a time) can be faster as they can ignore entire partitions and, best of all, it still looks like one table to all queries.

    It won't help with the cursors though.
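    As a sketch of what that might look like (the base numbers and object names below are made up, and note that in SQL 2005 partitioning requires Enterprise Edition):

    ```sql
    -- One partition per transaction type, split at each type's (hypothetical) base number.
    -- RANGE RIGHT puts each boundary value into the partition to its right, so
    -- values >= 1000000000 and < 2000000000 land together, and so on.
    CREATE PARTITION FUNCTION pf_TransType (numeric(18,0))
    AS RANGE RIGHT FOR VALUES (1000000000, 2000000000, 3000000000);

    CREATE PARTITION SCHEME ps_TransType
    AS PARTITION pf_TransType ALL TO ([PRIMARY]);

    -- The clustered PK is then created on the scheme instead of a filegroup:
    -- CREATE TABLE dbo.Transactions ( TransPK numeric(18,0) NOT NULL PRIMARY KEY CLUSTERED, ... )
    -- ON ps_TransType (TransPK);
    ```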

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hmmm... Have you considered table partitioning? Partition on the base numbers for the transactions and then, under the covers, it will be like each transaction type has its own table. Your fragmentation problem goes away, queries (if they query on only one transaction at a time) can be faster as they can ignore entire partitions and, best of all, it still looks like one table to all queries.

    It won't help with the cursors though.

    Talk about table partitioning has been tossed around, but unfortunately we have to be very, very careful about any kind of schema changes to anything created by the vendor. Adding a custom table to do something specific to our needs is OK, but modifying anything that is part of the vendor's core DB is another story. Another user of this same software has at least attempted partitioning, but I never heard what the outcome was of their attempts.

    I know a former employee of this software vendor, and he told me that for a long time he emphasized the importance of doing something about this table, stressing how it would one day come to a point where something had to be done or else it would collapse under its own weight. This vendor has only recently, in the past year, started dealing with users whose DBs measure in the 100 GB range. Up till now the really large DBs were in the 20-30 GB range, and some things that work for those are not working, or at least not working so well, in the VLDBs (Very Large DBs).

    As for the cursors, they can really make one want to curse 🙂 This is what happens when you let procedural coders with no formal DB training or experience make DB design decisions.

    Thanks for the info & help. Great tip on searching for a column with that data type. It did return some rows of data, but not for any tables in the database in question.

    Thanks again

    Kindest Regards,

    Just say No to Facebook!
  • Hmm, I did part of my SQLBits presentation on the choice of clustered index - I will be posting on this somewhat complex area shortly. It's too lengthy to put in a forum post - I have been working on this exact problem of fragmenting tables.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • I'd second the vote for Inside SQL Server - Query Tuning.

    Colin,

    If you want to post it here, we'd love to publish it on the site.
