Index Rebuild - How much and How Long?

  • Hi,

    I have 2 questions regarding index rebuilds (DROP INDEX... & CREATE INDEX... or DBCC DBREINDEX) in SQL Server 2000 SP3.

    1) Is there a way to tell how much of an index (e.g., what percentage) has been rebuilt while the rebuild command is still running?

    Something like WITH STATS = 10, just like BACKUP DATABASE ... WITH STATS = 10?

    2) Currently we do a full index rebuild of our database every 3-4 months, and it takes 8 hours. Let's say we up the frequency of the index rebuild to once every month.

    Should we expect each index rebuild to take less than 8 hours, as a result of less index fragmentation due to the more frequent rebuilds? OR

    Will it still take roughly the same amount of time (around 8 hours) every time the index rebuild is fired (i.e., no dependency on the severity of index fragmentation)?

    I'd appreciate some insight from the experts.

  • One thing I would look at first is how you are rebuilding the indexes.  If you are doing a drop and create, there is a specific order in which they need to be done.  If you drop the clustered index before the non-clustered indexes, the non-clustered indexes get rebuilt before they are even dropped.  And if you create the clustered index after the non-clustered indexes, they all get rebuilt yet again.

    For a drop and create: drop the non-clustered indexes, then the clustered index; then create the clustered index, then create all the non-clustered indexes.
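
    In T-SQL that order would look something like this (a minimal sketch; the table and index names are hypothetical):

    -- 1) Drop the non-clustered indexes first
    DROP INDEX Orders.IX_Orders_CustomerID
    -- 2) Then drop the clustered index
    DROP INDEX Orders.CIX_Orders_OrderDate
    -- 3) Recreate the clustered index first
    CREATE CLUSTERED INDEX CIX_Orders_OrderDate ON Orders (OrderDate)
    -- 4) Then recreate the non-clustered indexes
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON Orders (CustomerID)

    Done in this order, each non-clustered index is built exactly once, against the final clustered index.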

    There may be other factors to consider as well, such as foreign keys from/to other tables.

    Time-wise, I don't think it matters much, as rebuilding the indexes still requires scanning the entire table.

  • Just a few things I look at.

    I have 2 questions regarding index rebuilds (DROP INDEX... & CREATE INDEX... or DBCC DBREINDEX) in SQL Server 2000 SP3.

    1) Is there a way to tell how much of an index (e.g., what percentage) has been rebuilt while the rebuild command is still running?

    Something like WITH STATS = 10, just like BACKUP DATABASE ... WITH STATS = 10?

    Not to my knowledge.

    2) Currently we do a full index rebuild of our database every 3-4 months, and it takes 8 hours. Let's say we up the frequency of the index rebuild to once every month.

    Should we expect each index rebuild to take less than 8 hours, as a result of less index fragmentation due to the more frequent rebuilds? OR

    Will it still take roughly the same amount of time (around 8 hours) every time the index rebuild is fired (i.e., no dependency on the severity of index fragmentation)?

    It will still be roughly 8 hours every time it is fired. Under the hood, new data pages are created and the data is placed into them; then they are made active and the old pages are marked inactive.

    8 hours seems like a long time, but it depends on many factors: hardware I/O, total size of the data, whether the data and/or log file has to grow during the process (and how often), and many other things.

    I would suggest first looking at the things you have indexed and considering whether you might have too many indexes.

    Do you have columns repeated across multiple indexes as part of a composite index, or a lot of wide indexes? Remember, the width of an index entry is the total number of bytes of the columns involved, plus at least those of the clustered index key (if there is no clustered index, use 16 bytes for the page record identifier, although I can't remember the exact value). You may find you have multiple indexes that need to be broken down for performance anyway.

    Do you have composite indexes that include columns from your primary key? For example: I have a column named My_ID which is the primary key, an identity column, and the clustered index. I also have a column named Start_Date, and from that an index was created as a composite of Start_Date and My_ID. The issue is that I do not need the My_ID column in that index, because as the clustered index key, My_ID is already carried into any index on Start_Date by the indexing scheme.
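
    A minimal sketch of that situation (the table name is made up; My_ID and Start_Date are as described above):

    CREATE TABLE dbo.Orders (
        My_ID      int IDENTITY(1,1) NOT NULL,
        Start_Date datetime NOT NULL,
        CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (My_ID)
    )

    -- Redundant: the clustered key (My_ID) already travels with
    -- every non-clustered index entry as the row locator
    CREATE INDEX IX_Orders_StartDate_MyID ON dbo.Orders (Start_Date, My_ID)

    -- Sufficient: seeks on Start_Date still reach the row
    -- through the hidden clustered-key pointer
    CREATE INDEX IX_Orders_StartDate ON dbo.Orders (Start_Date)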

    The key point is that if you find issues in your indexes, you might be able to improve the indexing scheme to reduce the overall index storage needed, which in turn reduces the time to reindex.

    As well, if you have a lot of indexes you might find you really need to normalize a bit more to improve things overall, but be careful not to normalize to the point of causing yourself performance issues.

    Look at your indexes, check the amount of uniqueness, and verify each index is being used to perform an Index Seek and not a Scan. If you have a column with few distinct values and it always does an index scan, it might be just as fast to scan the table and not bother with the index.

    Another thing: consider the data. If you have a historical table holding several years of data, you might consider moving prior years to another table, especially if that data is not updated. Then create a view to pull the two together. You then only need to update the indexes when you move another year into the historical table, and because the newer data is maintained in a smaller table, the rebuilds will take much less time.
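
    A rough sketch of that split, assuming a hypothetical dbo.Sales table with a matching dbo.Sales_History table already created:

    -- Move prior years into the history table
    INSERT INTO dbo.Sales_History
    SELECT * FROM dbo.Sales WHERE Sale_Date < '20060101'

    DELETE FROM dbo.Sales WHERE Sale_Date < '20060101'
    GO

    -- A view pulls the two back together for reporting
    CREATE VIEW dbo.Sales_All
    AS
    SELECT * FROM dbo.Sales
    UNION ALL
    SELECT * FROM dbo.Sales_History
    GO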

    Also, don't reindex any table that has no changes to it, such as lookup tables that are static for long periods. This is just a waste of time, as the index is current all the time.

    Avoid using drop and create, by the way, unless you have a damaged index, and see the earlier post regarding the proper order.

  • Hi,

    Thanks for the detailed insight.

    First, I wish a WITH STATS option for index rebuilds would appear in the next release of SQL Server (SQL 2008/9?). I guess SQL 2005 still inherits this behavior from SQL 2000.

    We are using DBCC DBREINDEX for the index rebuilds. I understand that for the huge, active tables whose fill factor was previously set to something other than 0 or 100, the value is taken from the sysindexes table. However, PAD_INDEX is not an option with DBCC DBREINDEX (another wish-list item for the Redmond guys), so the intermediate index pages end up almost completely full while the leaf pages are filled to the fill factor percentage.
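
    For reference, the rebuild we run looks something like this (the table name is just an illustration); passing a fill factor of 0 makes it reuse the value stored in sysindexes:

    -- Rebuild all indexes on the table, keeping the stored fill factor
    DBCC DBREINDEX ('dbo.BigTable', '', 0)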

    1) What are the PROs and CONs of having the intermediate index pages almost completely full while the leaf pages are filled to the fill factor percentage?

    2) For performance's sake, is it ALWAYS ideal to have PAD_INDEX ON, which fills the intermediate pages and leaf pages to the same percentage?

    3) I have a big table with 133 columns: 1 clustered index and 3 non-clustered indexes, but 70 auto-created column statistics (_WA_Sys_). Is that normal?

    If this is a sign of being under-indexed, should some of these column stats be converted to non-clustered indexes?

    On the other hand, with more indexes (converted from stats), INS/UPD statements will be slower, and likewise the index rebuild will take longer, right?

    Currently, the index rebuild for this big table takes 4 hours (half of the total time for the full database index rebuild).

    I am trying to balance cost and benefit here. I really need some expert advice. Thanks.

  • 1) What are the PROs and CONs of having the intermediate index pages almost completely full while the leaf pages are filled to the fill factor percentage?

    Intermediate pages hold the high/low key values for the leaf pages, pointing the query engine to the appropriate pages. I would always use a much higher fullness for the intermediate pages than for the leaves. Keep in mind a page is 8K in size, so if you had an integer column as the unique clustered index and the indexed column is char(10), you have a total of 14 bytes per index entry, so about 580 entries fit in a single leaf page. Consider how many records change per day (for that column; if the value is updated to the same value, it didn't change) and work out how many changes it takes for a new page to be created at a random position (if it is not a progressive situation [a b c d e ...]). Then, since the high/low entries are based on the char(10) key plus the pointer to the page (I think that is about 16 bytes), it will take about 226 leaf pages to fill one intermediate page. So depending on your daily data changes, a value of 90-100% should suffice for the intermediate pages, where you may have 50-100% on the leaves. Personally, I would rarely consider going below 70% even for the leaf pages, but that is just a matter of opinion.

    2) For performance's sake, is it ALWAYS ideal to have PAD_INDEX ON, which fills the intermediate pages and leaf pages to the same percentage?

    Not always; you have to understand how the index is managed. The point of padding an index is to provide gaps for new values so as to limit the number of page splits that occur because of data changes. However, by adding padding you can hurt performance, because the more padding, the more data pages you have to read. Also, inserts split a page when it reaches 100% full, based on the position of the insert: if the new row comes after the last record, a new page is allocated for that single record; if it lands in the middle, a page is allocated and that record plus the data after it is moved to the new page, with the change progressing up through the leaves to the root. So when considering padding, think about your data. If, for example, it is a progressive value like an identity column, where every insert is the next higher value, you should fill to 100%, as earlier data pages will not be affected. Even if it were a column where you occasionally go back and alter a few values within the existing ranges, padding to 100% will suffice (even if the value were decrementing instead) to keep your pages nice and tight for faster reading. If the values were, say, a uniqueidentifier column, then 50% padding would be a good starting point, as those land all over the place, seemingly at random (they are not, though, just in case someone says otherwise). So padding is something you should choose based on what will happen with the data, not a standard formula.
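
    Translated into CREATE INDEX terms, that advice might look like this (a sketch; the table and index names are hypothetical):

    -- Progressive key (identity): pack the pages full
    CREATE CLUSTERED INDEX CIX_EventLog_ID ON dbo.EventLog (Event_ID)
        WITH FILLFACTOR = 100

    -- Random key (uniqueidentifier): leave room for inserts; PAD_INDEX
    -- applies the same fullness to the intermediate pages
    CREATE INDEX IX_EventLog_GUID ON dbo.EventLog (Event_GUID)
        WITH PAD_INDEX, FILLFACTOR = 50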

    3) I have a big table with 133 columns: 1 clustered index and 3 non-clustered indexes, but 70 auto-created column statistics (_WA_Sys_). Is that normal?

    Stats are normal; they help the query engine determine the quickest lookup path. They are generally recalculated if you have auto update statistics turned on for the database, but this is based on a percentage-of-change threshold, so the larger the table, the longer between updates. Updating stats on the table periodically may improve your queries without the need to reindex, but that is something you would have to test.
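
    A minimal sketch of that, assuming a table named dbo.BigTable:

    -- Refresh statistics on one table with a full scan
    UPDATE STATISTICS dbo.BigTable WITH FULLSCAN

    -- Or refresh all out-of-date statistics in the database
    EXEC sp_updatestats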

    I would say 133 columns is a lot for a single table, and normalizing may benefit your speed, performance, and scalability more than it hurts you, depending on what your table represents.

    Ex.

    I had a table storing information about Financial Centers. The original design had all the key personnel roles as columns within the table. 30% or more of the values were NULL in any given column; one column was even 80% NULL. I had 9000 records, and there were 14 such columns, each varchar(10) (currently filled with 5-7 characters, plus 2 bytes each for SQL's under-the-hood varchar management overhead, by the way). I also had 3 of those columns indexed (one was the 80%-NULL one) due to search requirements.

    After the third time I needed to add a new position (I was at 14 columns when I finally changed it), I decided the code changes and such were extremely silly, and I started researching normalization to guide me.

    Afterwards, the redesign included a table mapping Financial Center to Role to Person, which I clustered on Role + Person. This not only reduced some of my footprint by removing the unnecessary NULLs, but also improved performance due to the index, and scalability due to the fact that I just manage a table to add and remove roles.
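
    The mapping table looked roughly like this (names are illustrative, not the original schema):

    CREATE TABLE dbo.Center_Role_Person (
        Center_ID int NOT NULL,
        Role_ID   int NOT NULL,
        Person_ID int NOT NULL
    )

    -- Clustered on Role + Person as described above
    CREATE CLUSTERED INDEX CIX_Center_Role_Person
        ON dbo.Center_Role_Person (Role_ID, Person_ID)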

    After doing a lot of this I removed over 100 columns and actually saw a drastic improvement in data quality, scalability, and performance. So review your design and see if you really have it done right. Of course you will hear that normalization hurts performance, and there can be some downsides depending on your design and your applications' interactions, but I have yet to run into anything that degraded my performance to a big enough extent given the requirements of my system. And nothing usually takes more than 3-5 seconds to run.

    If this is a sign of being under-indexed, should some of these column stats be converted to non-clustered indexes?

    Not necessarily; I have no way to tell you without knowing a lot more about your usage and needs.

    On the other hand, with more indexes (converted from stats), INS/UPD statements will be slower, and likewise the index rebuild will take longer, right?

    Not always. Yes, it will slow your inserts down, but updates only slow down if the data covered by the index actually changes. As well, the slowdown will not, for most systems, be on the order of magnitudes, and weighing a negligible loss of input speed against greater gains in retrieving results might be an overall performance increase for the system. In fact, a much-overlooked point is that long queries can lock tables until completion and prevent inserts and updates from occurring until the lock reaches a state where they are allowed. Conversely, you also have to consider how many updates you perform in a single batch, because more indexes can increase locking until the batch completes, blocking queries. But the answer is: you would have to research it for yourself to know for sure.

  • Hi,

  • I did an index rebuild (DBCC DBREINDEX) last week; the previous rebuild was done a month ago. I noticed that the time taken halved for a particular index: it took only 2 hours (last week) compared to 4 hours (last month) for the same index rebuild. Before doing the rebuild last week, I ran DBCC SHOWCONTIG: logical fragmentation was zero and scan density was 100%. I compared that to the SHOWCONTIG results from just before last month's rebuild:

    Last month - logical fragmentation > 35% and scan density < 75%; index rebuild took 4 hours. (Note: the rebuild before that one was 4 months earlier.)

    Last week - logical fragmentation = 0% and scan density = 100%; index rebuild took 2 hours.

    Can we conclude that the severity of logical fragmentation does play a part in the time required for an index rebuild using DBCC DBREINDEX? This contrasts with the reply posted earlier.

  • What is your indexes' fill factor? It shouldn't differ by an order of magnitude, since the reindex copies the index into new pages and drops the old pages once done. But I have never sat down to test DBCC DBREINDEX scenarios to see what other under-the-hood advantages it has over DROP and CREATE INDEX, so there could be something I haven't thought of. I ask about the fill factor because it may boil down to how many pages must be read: the more pages produced by fragmentation, the longer the read for the rebuild may take. But I find experience is always the best way to know for sure, and yours tells me there may be something to think about regarding the time between reindexes.

    Also, have you ever considered running DBCC INDEXDEFRAG on the db to see if it offers any advantages in your case?
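
    Something like this (a sketch; the database, table, and index names are hypothetical): check fragmentation first, then defragment online instead of doing the full rebuild:

    -- Measure fragmentation on the table's indexes
    DBCC SHOWCONTIG ('dbo.BigTable') WITH ALL_INDEXES

    -- Defragment one index online; if interrupted, completed work is kept
    DBCC INDEXDEFRAG (MyDatabase, BigTable, IX_BigTable_Date)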

  • The fill factor remains the same for each DBCC DBREINDEX run. We don't explicitly specify the fill factor in the DBCC DBREINDEX command, so it picks up the same value from sysindexes on every run.

    DBCC DBREINDEX is better than DBCC INDEXDEFRAG when index fragmentation is high (>20%), so we always stick with it whenever a down-time window permits. However, 8 hours of down-time for an index rebuild is too long to bear for an online web application. That's why we are looking at shortening it by upping the frequency of the index rebuild from every 4 months to every month.

    However, there is a difference between last week's DBCC DBREINDEX run and the one done last month which I forgot to mention in my last post. The only setting difference is a change of DB recovery mode, as shown:

    DBREINDEX     DB Recovery Mode    Time Taken    Trans Log Size
    Last Week     bulk-logged         2 hours       1/2 of X MB
    Last Month    full                4 hours       X MB

    To me, the impact of the recovery mode change (full to bulk-logged) on the growth of the transaction log during the index rebuild makes sense.

    I am not sure whether the change of recovery mode (full to bulk-logged) also significantly reduces the time taken for the index rebuild; I need to confirm that.
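
    The test would look roughly like this (a sketch; the database and table names are hypothetical). The log backup afterwards matters because a log backup containing bulk-logged work cannot be restored to a point in time within it:

    ALTER DATABASE MyDB SET RECOVERY BULK_LOGGED

    DBCC DBREINDEX ('dbo.BigTable', '', 0)

    ALTER DATABASE MyDB SET RECOVERY FULL

    -- Back up the log so the chain covers the minimally logged work
    BACKUP LOG MyDB TO DISK = 'E:\Backups\MyDB_log.bak'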

  • Interesting information. If you test it out, please follow up here so others can see your results.

  • Garrick - how big is this database that's taking 8 hours to reindex?

    I just kicked off a DBREINDEX on a table that is about 53GB in size and was at 92% extent scan fragmentation, and it has been running for over 2.5 hours so far.  I realized after starting it that I probably should not have done this during the day, and now I'm starting to stress out a little bit.  I didn't think this would affect the overall performance of the SQL Server, but now I'm getting users complaining about ODBC errors popping up from apps that use other databases on the server (not the one with the table I'm reindexing), and I can't do anything but say "sorry" and wait for this thing to finish.

    (obviously I'm not a "SQL guy", I was just thrust into this position and I'm learning by doing... not how I usually like to do things)

  • The db I mentioned is 90GB. The biggest table is 60GB, and the clustered index rebuild on this table alone takes 4+ hours.

    DBCC DBREINDEX in SQL 2000 needs the DB to be offline (no users). DBCC INDEXDEFRAG, however, can be done online, though it's safer to carry it out during off-peak hours.

  • I have the same problem, but I'm concerned about something else. I have a table with 2 indexes, one clustered and one non-clustered, and a lot of inserts - about 2 million a day. I also have some jobs which run at night: they reindex all the tables (fill factor 0) and update statistics on all tables. I do not know why "Mem usage" and "VM size" in Task Manager keep increasing until SQL Server reaches a limit and says "Insufficient memory".
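
    The nightly jobs do something along these lines (a rough sketch, not the exact job; sp_MSforeachtable is an undocumented but commonly used procedure):

    -- Rebuild every table's indexes, keeping the stored fill factor
    EXEC sp_MSforeachtable 'DBCC DBREINDEX (''?'', '''', 0)'

    -- Then refresh statistics database-wide
    EXEC sp_updatestats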

    In Theory, theory and practice are the same...In practice, they are not.
