April 23, 2010 at 7:50 am
We are looking to fine-tune our ETL process for our Data Warehouse and some of the articles we have found deal with index reorg and improvements this can make. (http://technet.microsoft.com/en-us/library/ms189858.aspx) They have stated that if the fragmentation is > 30% rebuild the index but if between 5-30% reorg the index.
With this in mind we were thinking of checking the fragmentation and then have a decision point to determine whether we would reorg or rebuild. We build a quick little test to prove that this would work below:
--CREATE TABLE
CREATE TABLE DECIPHER_DATA (COL1 INT, COL2 NVARCHAR(500));
--INSERT 500 ROWS
SET NOCOUNT ON
GO
DECLARE @I INT
SET @I = 1
WHILE (@I <= 500)
BEGIN
INSERT INTO dbo.DECIPHER_DATA (COL1, COL2) VALUES (@I, REPLICATE('A', @I))
SET @I = @I + 1
END
--CREATE INDEX
CREATE UNIQUE INDEX PK_DECIPHER_DATA ON DECIPHER_DATA (COL1);
UPDATE STATISTICS DECIPHER_DATA WITH FULLSCAN, ALL;
--DELETE 250 ROWS
DELETE FROM DECIPHER_DATA WHERE COL1 <= 250;
UPDATE STATISTICS DECIPHER_DATA WITH FULLSCAN, ALL;
--Find Fragmentation
SELECT a.index_id, name, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'DECIPHER_DATA'),
NULL, NULL, NULL) AS a
JOIN sys.indexes AS b ON a.object_id = b.object_id AND a.index_id = b.index_id;
--ReOrg
ALTER INDEX [PK_DECIPHER_DATA] ON [dbo].[DECIPHER_DATA] REORGANIZE WITH ( LOB_COMPACTION = ON )
--Find Fragmentation
SELECT a.index_id, name, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'DECIPHER_DATA'),
NULL, NULL, NULL) AS a
JOIN sys.indexes AS b ON a.object_id = b.object_id AND a.index_id = b.index_id;
--INSERT 250 again
SET NOCOUNT ON
GO
DECLARE @I INT
SET @I = 1
WHILE (@I <= 250)
BEGIN
INSERT INTO dbo.DECIPHER_DATA (COL1, COL2) VALUES (@I, REPLICATE('A', @I))
SET @I = @I + 1
END
--UPDATE Statistics
UPDATE STATISTICS DECIPHER_DATA WITH FULLSCAN, ALL;
--Rerun fragmentation
SELECT a.index_id, name, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'DECIPHER_DATA'),
NULL, NULL, NULL) AS a
JOIN sys.indexes AS b ON a.object_id = b.object_id AND a.index_id = b.index_id;
--ReOrg
ALTER INDEX [PK_DECIPHER_DATA] ON [dbo].[DECIPHER_DATA] REORGANIZE WITH ( LOB_COMPACTION = ON )
--Rerun Fragmentation
SELECT a.index_id, name, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'DECIPHER_DATA'),
NULL, NULL, NULL) AS a
JOIN sys.indexes AS b ON a.object_id = b.object_id AND a.index_id = b.index_id;
However, in running this test which builds a table, inserts 500 rows, deletes 250 rows, and then reinserts 250 rows again we have found that neither reorg or rebuild actually removes the fragmentation.
Is there a better way to determine fragmentation? Or are we going down the wrong path altogether? Any thoughts would be appreciated. Thanks.
April 23, 2010 at 10:04 am
Hi,
if I were you, rather than spend time reinventing the wheel generating and testing a script to do this, I would just use the following excellent script from Michelle Ufford.
April 23, 2010 at 10:30 am
Thanks, I'll take a look at it. In addition can you help me understand how fragmentation is calculated? I would assume when a reorg or rebuild happens that the fragmentation would be decreased or removed altogether, but in the very simple script we ran, the fragmentation stayed same following the reorg.
April 23, 2010 at 12:28 pm
aber (4/23/2010)
Thanks, I'll take a look at it. In addition can you help me understand how fragmentation is calculated? I would assume when a reorg or rebuild happens that the fragmentation would be decreased or removed altogether, but in the very simple script we ran, the fragmentation stayed same following the reorg.
This is due to the size of the table and the way SQL Server stores small tables. Small tables are stored in shared extents - and will not ever be fully defragmented.
The general rule is to not worry about tables with less than 1000 pages - which you'll see when you review Michelle's utility.
With that said, I am not sure how doing this is going to improve (fine-tune) your ETL process. It is not a bad idea to rebuild/reorganize your indexes - but in an ETL process it won't help. What really helps the ETL process is making sure you insert the data in the same order as the clustered index, and, depending on how much data is being inserted/updated - disabling the non-clustered indexes.
For example, if you are reloading the table - disable all non-clustered indexes, insert in the same order as your clustered index and then rebuild all indexes (not reorganize, you can't when you disable them).
On the other hand, if you are performing incremental updates - and the percentage of rows is very small compared to the number of rows in the table, disabling the non-clustered indexes and rebuilding might take longer than leaving them alone. Testing in your environment will tell you which is best.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
April 23, 2010 at 12:36 pm
This was the exact information that I was looking for, thanks! We have played around a bit with disabling and then rebuilding and did find that some of these larger tables were taking longer to rebuild than it was worth.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply