October 14, 2016 at 3:24 am
Hi,
Which of the scenario is better, consider the below scenario.
An empty table without any index get an insert of 1million records and then the clustered index is created on that.
A clustered index is created on an empty table and then the clustered index is created on it.
Which one is better ?
I was thinking that a table with clustered index on it prior to filling data, may cause auto update statistics events during the load and may cause some delay.
However, on testing this I see lastupdated column of statistics as NULL from below query, when I insert 1 million rows in a table with clustered index
SELECT name AS index_name,
STATS_DATE(OBJECT_ID, index_id) AS StatsUpdated
FROM sys.indexes
WHERE OBJECT_ID = OBJECT_ID('Price_Current.Planningmeasures_exp')
GO
Quite confused
October 14, 2016 at 4:51 am
I assume you meant which is better - adding the clustered index before inserting data, or after inserting data?
The answer is of course 'it depends'.
At a previous job we dealt with multi-terabyte databases and tables with a billion rows were not unheard of. What decided how we inserted the data depended upon the number of rows being inserted and if that table would be required while we were inserting into it.
Generally we would bulk insert, making sure the data was in the correct order for the existing clustered index and leave the clustered index in place. This was perfectly acceptable for several million rows at a time.
If we needed to execute a very large insert (because there was an issue with the existing data and we had to remove a large amount beforehand) then we would copy the table, drop the clustered index on the copy, insert the new data into the copy, create the clustered index and then swap the tables over.
Testing with sufficiently large volumes is probably the best way to decide which approach is better.
For your statistics updated date - run a query that filters on the clustered index and then check for your statistics update. Of course you could run UPDATE STATISTICS first, but the first query that requires statistics would generate the required statistics.
October 14, 2016 at 5:03 am
BrainDonor (10/14/2016)
For your statistics updated date - run a query that filters on the clustered index and then check for your statistics update. Of course you could run UPDATE STATISTICS first, but the first query that requires statistics would generate the required statistics.
Yes, but the auto-update will use the default sample rate. Manual UPDATE STATISTICS you can set the sample rate.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 14, 2016 at 5:08 am
I am bit confused here,
For a regular table without any index ( a Heap), the statistics get created on firing any query with some filter.
However, on the table with clustered index, the statistics is created at the time of creating index.
Now, after the statistics is created on the index, whenever I insert rows to the table, shouldn't the statistics be auto-updated when the number of row inserts reaches the threshold of stats update.
October 14, 2016 at 5:26 am
er.mayankshukla (10/14/2016)
Now, after the statistics is created on the index, whenever I insert rows to the table, shouldn't the statistics be auto-updated when the number of row inserts reaches the threshold of stats update.
No.
The number of updates/inserts will invalidate the stats, but they'll only be updated the next time something uses them. This prevents multiple unnecessary stats updates due to repeated inserts of data
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 14, 2016 at 5:28 am
Thanks for clarifying Gail and Steve 🙂
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply