Add data to Clustered Table Or Building Clustered Index after inserting data - Which is better ?

Question

Post reply

Add data to Clustered Table Or Building Clustered Index after inserting data - Which is better ?

er.mayankshukla

SSCarpal Tunnel

Points: 4190
More actions
October 14, 2016 at 3:24 am

#330848

Hi,
Which of the scenario is better, consider the below scenario.
An empty table without any index get an insert of 1million records and then the clustered index is created on that.
A clustered index is created on an empty table and then the clustered index is created on it.
Which one is better ?
I was thinking that a table with clustered index on it prior to filling data, may cause auto update statistics events during the load and may cause some delay.
However, on testing this I see lastupdated column of statistics as NULL from below query, when I insert 1 million rows in a table with clustered index
SELECT name AS index_name,
STATS_DATE(OBJECT_ID, index_id) AS StatsUpdated
FROM sys.indexes
WHERE OBJECT_ID = OBJECT_ID('Price_Current.Planningmeasures_exp')
GO
Quite confused

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

BrainDonor SSCoach Points: 19254 More actions · Answer 1

I assume you meant which is better - adding the clustered index before inserting data, or after inserting data?

The answer is of course 'it depends'.

At a previous job we dealt with multi-terabyte databases and tables with a billion rows were not unheard of. What decided how we inserted the data depended upon the number of rows being inserted and if that table would be required while we were inserting into it.

Generally we would bulk insert, making sure the data was in the correct order for the existing clustered index and leave the clustered index in place. This was perfectly acceptable for several million rows at a time.

If we needed to execute a very large insert (because there was an issue with the existing data and we had to remove a large amount beforehand) then we would copy the table, drop the clustered index on the copy, insert the new data into the copy, create the clustered index and then swap the tables over.

Testing with sufficiently large volumes is probably the best way to decide which approach is better.

For your statistics updated date - run a query that filters on the clustered index and then check for your statistics update. Of course you could run UPDATE STATISTICS first, but the first query that requires statistics would generate the required statistics.

Steve Hall
Linkedin
Blog Site

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 2

BrainDonor (10/14/2016)
For your statistics updated date - run a query that filters on the clustered index and then check for your statistics update. Of course you could run UPDATE STATISTICS first, but the first query that requires statistics would generate the required statistics.

Yes, but the auto-update will use the default sample rate. Manual UPDATE STATISTICS you can set the sample rate.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

er.mayankshukla SSCarpal Tunnel Points: 4190 More actions · Answer 3

I am bit confused here,

For a regular table without any index ( a Heap), the statistics get created on firing any query with some filter.

However, on the table with clustered index, the statistics is created at the time of creating index.

Now, after the statistics is created on the index, whenever I insert rows to the table, shouldn't the statistics be auto-updated when the number of row inserts reaches the threshold of stats update.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 4

er.mayankshukla (10/14/2016)
Now, after the statistics is created on the index, whenever I insert rows to the table, shouldn't the statistics be auto-updated when the number of row inserts reaches the threshold of stats update.

No.

The number of updates/inserts will invalidate the stats, but they'll only be updated the next time something uses them. This prevents multiple unnecessary stats updates due to repeated inserts of data

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

er.mayankshukla SSCarpal Tunnel Points: 4190 More actions · Answer 5

er.mayankshukla

SSCarpal Tunnel

Points: 4190

October 14, 2016 at 5:28 am

#1906540

Thanks for clarifying Gail and Steve 🙂