May 31, 2017 at 4:02 am
I have a table in a database (it's a data warehouse) which is populated once a day by a truncate table and is then repopulated from another database. The table being populated has a primary key of an integer field which is the identity field from the source table. The insert statement has an order by statement as part of the code so the data is definitely going into the table in the order of the primary key of the destination table. However after population the fragmentation on the primary key index is over 96 percent. How can that be ?
May 31, 2017 at 8:35 am
paul.farnell - Wednesday, May 31, 2017 4:02 AMI have a table in a database (it's a data warehouse) which is populated once a day by a truncate table and is then repopulated from another database. The table being populated has a primary key of an integer field which is the identity field from the source table. The insert statement has an order by statement as part of the code so the data is definitely going into the table in the order of the primary key of the destination table. However after population the fragmentation on the primary key index is over 96 percent. How can that be ?
How could we know? Could you provide some details about the environment? Things like the table structures and the insert statements, approximate row counts etc...
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
May 31, 2017 at 8:35 am
paul.farnell - Wednesday, May 31, 2017 4:02 AMI have a table in a database (it's a data warehouse) which is populated once a day by a truncate table and is then repopulated from another database. The table being populated has a primary key of an integer field which is the identity field from the source table. The insert statement has an order by statement as part of the code so the data is definitely going into the table in the order of the primary key of the destination table. However after population the fragmentation on the primary key index is over 96 percent. How can that be ?
How big is the table, in pages?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 31, 2017 at 8:38 am
Sean Lange - Wednesday, May 31, 2017 8:35 AMpaul.farnell - Wednesday, May 31, 2017 4:02 AMI have a table in a database (it's a data warehouse) which is populated once a day by a truncate table and is then repopulated from another database. The table being populated has a primary key of an integer field which is the identity field from the source table. The insert statement has an order by statement as part of the code so the data is definitely going into the table in the order of the primary key of the destination table. However after population the fragmentation on the primary key index is over 96 percent. How can that be ?How could we know? Could you provide some details about the environment? Things like the table structures and the insert statements, approximate row counts etc...
SQL Server 2016, 38000 rows. Select is from a different database from 2 tables using an inner join. 1 of these tables carries a SQL identity column which is the field being inserted into my primary key on the destination table. This primary key is the only index on the table.
May 31, 2017 at 8:40 am
GilaMonster - Wednesday, May 31, 2017 8:35 AMpaul.farnell - Wednesday, May 31, 2017 4:02 AMI have a table in a database (it's a data warehouse) which is populated once a day by a truncate table and is then repopulated from another database. The table being populated has a primary key of an integer field which is the identity field from the source table. The insert statement has an order by statement as part of the code so the data is definitely going into the table in the order of the primary key of the destination table. However after population the fragmentation on the primary key index is over 96 percent. How can that be ?How big is the table, in pages?
2905
May 31, 2017 at 8:41 am
paul.farnell - Wednesday, May 31, 2017 8:38 AMSean Lange - Wednesday, May 31, 2017 8:35 AMpaul.farnell - Wednesday, May 31, 2017 4:02 AMI have a table in a database (it's a data warehouse) which is populated once a day by a truncate table and is then repopulated from another database. The table being populated has a primary key of an integer field which is the identity field from the source table. The insert statement has an order by statement as part of the code so the data is definitely going into the table in the order of the primary key of the destination table. However after population the fragmentation on the primary key index is over 96 percent. How can that be ?How could we know? Could you provide some details about the environment? Things like the table structures and the insert statements, approximate row counts etc...
SQL Server 2016, 38000 rows. Select is from a different database from 2 tables using an inner join. 1 of these tables carries a SQL identity column which is the field being inserted into my primary key on the destination table. This primary key is the only index on the table.
Row counts is helpful. How wide is this table? Again....seeing the destination table definition would help.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
May 31, 2017 at 8:49 am
May 31, 2017 at 1:09 pm
what is the fill factor of the index?
are there any gaps in the identity data?
are there any triggers or procedures that might be running and modifying data at the same time?
June 1, 2017 at 7:03 am
Not an explanation of your problem, but in our warehouse we tend to drop indexes then truncate/load then recreate the indexes. It is generally a lot quicker on low-spec servers and (I'm assuming here) the indexes should be pristine at the end.
June 1, 2017 at 8:09 am
Not an explanation of your problem, but in our warehouse we tend to drop indexes then truncate/load then recreate the indexes. It is generally a lot quicker on low-spec servers and (I'm assuming here) the indexes should be pristine at the end.
Completely agree. Drop the PK, load the table, add the PK back. You might even consider dropping and recreating the entire table minus the PK, load the table, add the PK.
April 14, 2018 at 8:25 pm
I know this is a 10 month old post but (and especially since SQL Server 2008), as with all else in SQL Server, I'll have to say "It Depends"...
If you're required to use the FULL Recovery Model for such a "Replace Everything" load, then there's only a 17% performance advantage while there's a 200% disadvantage for disk/memory space usage and a 200% disadvantage for how much log file is used if you load the HEAP and then build the Clustered PK as compared to populating an empty table with the PK in place. On a million row load of a table with an IDENTITY property column being preserved as an IDENTITY column with rows that have a row size of 1,037 bytes each, there's only a difference of 24 seconds. Here's the run stats on the tests I've done.--===== Final size of the databases and test durations
-- Full Recovery, IDENTITY Property Preserved
LogicalName SizeMB RunDuration RecoveryModel
------------------------------ ----------- ------------ -------------
PKPopulationTest_WithoutPK 2300 00:02:18:483 FULL
PKPopulationTest_WithoutPK_log 2200
PKPopulationTest_WithPK 1200 00:02:42:137 FULL
PKPopulationTest_WithPK_log 1100
Note that both test databases started out with both a 100MB MDF and 100MB LDF with a growth for each of 100MB.
Things change quite a bit if there's no requirement to preserve the IDENTITY property of the Integer based Clustered PK. Populating the empty table WITH the Clustered PK in place blows the doors off populating a HEAP and then adding the Clustered PK in all aspects.--===== Final size of the databases and test durations
-- Full Recovery, IDENTITY Property Preserved
LogicalName SizeMB RunDuration RecoveryModel
------------------------------ ----------- ------------ -------------
PKPopulationTest_WithoutPK 2300 00:01:45:747 FULL
PKPopulationTest_WithoutPK_log 2100
PKPopulationTest_WithPK 1200 00:00:58:477 FULL
PKPopulationTest_WithPK_log 1100
Changing the first test (IDENTITY Property preserved) to the Bulk_Logged recovery produces faster but similar results in that the in-place Clustered PK method is slower than the HEAP load and build the Clustered CI method. Again, the IDENTITY Property on the Clustered PK was preserved.--===== Final size of the databases and test durations
-- Bulk_Logged Recovery, IDENTITY Property Preserved
LogicalName SizeMB RunDuration RecoveryModel
------------------------------ ----------- ------------ -------------
PKPopulationTest_WithoutPK 2300 00:01:01:340 BULK_LOGGED
PKPopulationTest_WithoutPK_log 100
PKPopulationTest_WithPK 1200 00:01:27:590 BULK_LOGGED
PKPopulationTest_WithPK_log 100
Since this is a full replacement of all the data in a table and it's for a data warehouse, preserving the IDENTITY Property may not be necessary. And, since it's a data warehouse, use of the Bulk_Logged Recovery Model (at least during the loads like this and with the understand of what any bulk logged operation can have on point-in-time restores), If you can settle for that, then here's the type of performance you can get especially from the in-place Clustered PK method... comparatively NASTY FAST and low resource usage for the in-place PK method.--===== Final size of the databases and test durations
-- Bulk_Logged Recovery, IDENTITY Property NOT Preserved
LogicalName SizeMB RunDuration RecoveryModel
------------------------------ ----------- ------------ -------------
PKPopulationTest_WithoutPK 2300 00:00:56:997 BULK_LOGGED
PKPopulationTest_WithoutPK_log 100
PKPopulationTest_WithPK 1200 00:00:37:507 BULK_LOGGED
PKPopulationTest_WithPK_log 100
As for the original question of where there was 96% Fragmentation after the load... it's generally not possible to have that much fragmentation on a "Full Replace" load like this. All the code above used parallelism during the copy and the fragmentation was very near zero, so it's not parallelism that would have caused it. This is especially true if the first key column is "ever increasing" but a well placed ORDER BY like that used in the original post will guarantee the correct order.
About the only thing that I can think of that would cause the fragmentation would be an "after" trigger that cause an "expAnsive" update that cause rows to grow.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply