Index Fragmentation/Identity Property

  • Index fragmentation can be high because the fillfactor is set to 100% correct?

    I had dealt with system where they have very wide cluster keys (I am not allowed to change them) :(. Anyhow all indexes were set to 100% fill factor I changed the fill factor to 75% and I have not have had major fragmentation. I only have to reorganize once a week and since implementing fill factor have not had to rebuild yet. Fragmentation has never gone above 5% in one week to far, before that it was going up to 40%+. :ermm:

    Thanks...

    [font="Arial"]---

    Mohit K. Gupta, MCITP: Database Administrator (2005), My Blog, Twitter: @SQLCAN[/url].
    Microsoft FTE - SQL Server PFE

    * Some time its the search that counts, not the finding...
    * I didn't think so, but if I was wrong, I was wrong. I'd rather do something, and make a mistake than be frightened and be doing nothing. :smooooth:[/font]

    How to ask for help .. Read Best Practices here[/url].

  • Mohit (3/11/2009)


    Index fragmentation can be high because the fillfactor is set to 100% correct?

    I had dealt with system where they have very wide cluster keys (I am not allowed to change them) :(. Anyhow all indexes were set to 100% fill factor I changed the fill factor to 75% and I have not have had major fragmentation. I only have to reorganize once a week and since implementing fill factor have not had to rebuild yet. Fragmentation has never gone above 5% in one week to far, before that it was going up to 40%+. :ermm:

    Thanks...

    We have a table with an index on Customer Surname. It contains hundereds of thousands or millions of rows. A small number of thousands of new customers are typically added each day.

    In production running, with a fill factor or 100% it took 5 hours to reach 40% avg_fragmentation_in_percent.

    With a fill factor of 95% the index was last rebuilt a month ago and has only reached 6% avg_fragmentation_in_percent in that time.

    The same database is implemented for a number of clients and the results are consistent.

    Although we know the theoretical effect of a fill factor that is too high I was surprised by level of impact this actually has in the real world.

    .

  • Dave Mason (3/10/2009)


    GilaMonster (3/10/2009)


    Dave Mason (3/10/2009)


    Oh, as far as the fragmentation threshold goes, it was 3% (or more) this morning. Prior to that, it was 5%.

    I wouldn't worry until it reaches 25-30. Fragmentation's mainly an issue with range scans, not seeks. If many of your queries do full table scans or large range scans, then maybe worry.

    What's the schema of this table, and what does a typical row look like before and after one of those 200 updates?

    Are you sure there's no shrink job anywhere? If you use maintenance plans, check that they don't have the 'release unused space' task.

    Normally, I wouldn't worry about the fragmentation until it reaches 15% (seems like I read about that threshold in some MS documentation somewhere). My performance barometer is a stored proc used for searches run from our main application. Most of the time, it runs in 5 ms or less. Sometimes, the main application has timeout exceptions. When I run a trace, I can see the Duration, Reads, and CPU spike. At that point, I'll rebuild the indexes manually. Afterwards, the application time outs cease and things return to normal. I've also tried updating statistics instead of rebuilding indexes, but the results are mixed.

    As for shrinking the db, it's a fairly new instance of Sql Server 2005. There are no maintenance plans. The only Sql jobs were created by me. I suppose there could be some rogue process doing the shrinking (for instance, a Windows scheduled task on a different machine).

    I ran a quick test and issued this command:

    DBCC SHRINKDATABASE (MyTestDbName)

    However, I didn't see anything in the sql logs to tell me that the db was shrunk (in SSMS, expand the Management node, expand the SQL Server Logs node, and select the first log item). Is there another mechanism that will tell me when the db was last shrunk?

    To see whether a shrink has occurred in look in the directory that your SQL Logs (the text logs you see though SSMS) are created in (often C:\Microsoft SQL Server\MSSQL.1\MSSQL\LOG)

    In here you'll find trace files that SQL creates automatically. They roll over when full so you need to find the corrct one. By default they have expand and shrink events recorded in them, so you can check if / when they occurred.

    .

  • t.walker (3/11/2009)


    Although we know the theoretical effect of a fill factor that is too high I was surprised by level of impact this actually has in the real world.

    Funny how that works ;-).. Yaa on my fill factor varies on different table of different sizes. And it has been working for me; I asked the developer in one week period how much of the data gets changed/what gets added and based my index fill factor on that. If the changes were high I also considered pad_index factor.

    Thanks.

    Mohit.

    [font="Arial"]---

    Mohit K. Gupta, MCITP: Database Administrator (2005), My Blog, Twitter: @SQLCAN[/url].
    Microsoft FTE - SQL Server PFE

    * Some time its the search that counts, not the finding...
    * I didn't think so, but if I was wrong, I was wrong. I'd rather do something, and make a mistake than be frightened and be doing nothing. :smooooth:[/font]

    How to ask for help .. Read Best Practices here[/url].

  • I did some further testing. I dumped a table into a temp db and re-created all of the indexes, including the PK, which is clustered. It's on a single column with the IDENTITY property. I rebuilt all of the indexes. sys.dm_db_index_physical_stats showed less than 1% fragmentation on all indexes of the table.

    I updated approx 200 rows in the table, mimicking what happened in production. sys.dm_db_index_physical_stats showed no appreciable change in fragmentation for any of the indexes. Then I updated a VARCHAR(50) column to SPACE(50), but only if it was NULL. This affected 1.4 million rows. sys.dm_db_index_physical_stats now showed about 75% fragmentation for the PK index.

    Keep in mind, no FILLFACTOR was specified for the PK. I dropped the table, recreated it, and recreated the indexes again. This time I specified a FILLFACTOR of 80. Updating the table as before led to fragmentation of 99%. Huh? I expected the fragmentation to be lower, but instead, it increased.

    Another attempt with a FILLFACTOR of 60 led to fragmentation of less than 1% (basically, no change). Changing a VARCHAR(50) column from NULL to 50 characters is extreme and not likely in real world situations. So I decided to use a FILLFACTOR of 75 for production. I'll have to keep an eye on the table and its PK to see if the fragmentation problem persists.

    Conclusions:

    I still cannot explain how updates to 200 rows in a table of 1.5 million rows would lead to 25% fragmentation of the Primary Key index.

    However, I did misinterpret the MS documentation on clustered indexes. I thought that the leaf nodes of a clustered index contained the column data for the indexed column(s) only. After my tests (and a re-read of the MS documentation), it appears that all column data for the entire row is stored in the same page with the data for the clustering key. For the database in question, the vast majority of the tables have a clustered index. I'll have to assess a proper FILLFACTOR for each Primary Key.

  • Yes Cluster Index leaf node has the data; another characterisc about cluster indexes is it physical sorts the data by the index key. So sometime people don't like creating cluster key on the PK because other fields are sorted on more often then the PK.

    Just as example; but havint cluster index = ++ :). Just need to be caution that it should be narrow as possible...

    As cluster key also gets attached to each non-cluster index in the table... have a read at Kim's article here 🙂

    Ref: http://sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx

    [font="Arial"]---

    Mohit K. Gupta, MCITP: Database Administrator (2005), My Blog, Twitter: @SQLCAN[/url].
    Microsoft FTE - SQL Server PFE

    * Some time its the search that counts, not the finding...
    * I didn't think so, but if I was wrong, I was wrong. I'd rather do something, and make a mistake than be frightened and be doing nothing. :smooooth:[/font]

    How to ask for help .. Read Best Practices here[/url].

  • Just bear in mind that by setting a fill factor to 75%, your table is now taking a fair bit more pages. This means more space on disk, more space in memory and more time to query it.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Dave,

    There is something in your figures I still can't understand here. You are right, they don't stack up.

    How do you know you are updating only a few hundred rows? You don't mean a few hundred update statements do you, potentially affecting more than one row each?

    There is something happening here that is affecting more rows than you think. Its the only way to explain your fragmentation figure. Can you use the profiler to identify what it is?

    Having said this I don't understand why you tried changing the column type in your test. I'm not surprised this caused odd fragmentation results as I think the page splits are handled differently when the schema changes as opposed to when rows are updated. Did you try this test because you suspect this is what is actually happening.

    Finally did you confirm whether the database had shrunk somehow?

    I'd be a bit wary about just using smaller fill factors if you don't fully understand why the fragmentation occurs in the first place.

    Is the clustered index defined as unique by the way?

    .

  • t.walker (3/12/2009)


    Dave,

    There is something in your figures I still can't understand here. You are right, they don't stack up.

    How do you know you are updating only a few hundred rows? You don't mean a few hundred update statements do you, potentially affecting more than one row each?

    There is something happening here that is affecting more rows than you think. Its the only way to explain your fragmentation figure. Can you use the profiler to identify what it is?

    Having said this I don't understand why you tried changing the column type in your test. I'm not surprised this caused odd fragmentation results as I think the page splits are handled differently when the schema changes as opposed to when rows are updated. Did you try this test because you suspect this is what is actually happening.

    Finally did you confirm whether the database had shrunk somehow?

    I'd be a bit wary about just using smaller fill factors if you don't fully understand why the fragmentation occurs in the first place.

    Is the clustered index defined as unique by the way?

    This is (conceptually) the table definition:

    CREATE TABLE myTable (

    myTableId INT IDENTITY

    CONSTRAINT PK_myTable PRIMARY KEY CLUSTERED,

    Col_1 ,

    Col_2 ,

    Col_3 ,

    ....

    Col_N ,

    LastChangedDate DATETIME NOT NULL

    )

    The LastChangedDate is set to GETDATE() every time a row is updated.

    There is also an update trigger on the table. It inserts everything from the deleted logical table to a corresponding history table.

    So...I know that approx 200 rows were updated in a day by filtering on the LastChangedDate.

    Matching those records to the history table allows me to calculate the number of updates (some rows are updated once, others multiple times).

    I'm not sure why you are asking about schema changes/"changing the column type". The table definition has not changed. One of the columns is (and always has been) VARCHAR(50). Since most of the rows showed a NULL value for the column, I ran an update to set the data values to the columns's maximum width (I used a string of 50 spaces). Why? Because I wanted to see if I could recreate at least one scenario that would fragment the clustered PK index. Are the updates in production the culprit? I can't say yes with 100% certainty. However, it's been more than 24 hours since setting the FILLFACTOR to 75 and doing a rebuild of the PK index on the table. As of this post, it is fragmented by a mere 0.02%. I feel confident that I'm on the right track. 🙂

    No, I am not able to confirm whether the db was shrunk or not. I shrunk a couple of nonproduction databases myself. One I shrank manually via the SSMS GUI. The other I shrank from the command line via a DBCC command. Neither activity appears in the SQL Server logs. As it stands right now, I do not know of a way to determine the date of the last db shrink. I'll research this some more...

    Yes, the clustered index is unique (it's a Primary Key).

  • GilaMonster (3/11/2009)


    Just bear in mind that by setting a fill factor to 75%, your table is now taking a fair bit more pages. This means more space on disk, more space in memory and more time to query it.

    Yeah, I can see a clear trade-off here. It's only been a day and 75% FILLFACTOR has eliminated the need to do additional index rebuilds throughout the day. I'll give it another few days and see if I can get by with a higher number (80, 85, 90).

    Gail, one of you earlier replies got me headed in the right direction. Thanks! 🙂

    And thanks to everyone else for your input. It's much appreciated.

  • Dave,

    OK .. I can see how you know how many updates.

    Sorry about the schema point .. My mistake in misreading what you did!

    To see whether a shrink has occurred in look in the directory that your SQL Logs (the text logs you see though SSMS) are created in (often C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG)

    In here you'll find TRACE files (*.trc) that SQL creates automatically. They roll over when full so you need to find the correct one. By default they have expand and shrink events recorded in them, so you can check if / when they occurred.

    Are you sure the trigger isn't doing more updates than you think? Updating something to its own current value is still an update for example.

    Sounds like you are happy with the solution you have though.

    .

  • t.walker (3/12/2009)


    Dave,

    OK .. I can see how you know how many updates.

    Sorry about the schema point .. My mistake in misreading what you did!

    To see whether a shrink has occurred in look in the directory that your SQL Logs (the text logs you see though SSMS) are created in (often C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG)

    In here you'll find TRACE files (*.trc) that SQL creates automatically. They roll over when full so you need to find the correct one. By default they have expand and shrink events recorded in them, so you can check if / when they occurred.

    Are you sure the trigger isn't doing more updates than you think? Updating something to its own current value is still an update for example.

    Sounds like you are happy with the solution you have though.

    Sweet! I found it. (I was looking in the sql log files before, not the .trc files.)

    In short, no, the production database has not been shrunk. Well, at least not since Feb 25th. That's the oldest entry in the trace.

    Thanks for the tip.

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply