June 2, 2011 at 12:40 pm
Greetings everyone.
I am having a conundrum involving indexes and hoping I can understand the seemingly arcane situation here ๐
I have tables that I load from an AS400 into a staging database each night. From this Staging database the regular data warehouse is built.
Here is the issue, after performance tuning a particularly troublesome load query, It will normally execute in about 20 seconds (it was previously taking about 10). However, each night the tables in the staging database are truncated and reloaded (there are 2 I am using for this query, each has 1 clustered and 1 non clustered index). When I attempt to run the query after the nightly load it takes roughly 5 minutes to execute. Even with recompiling and updating the statistics the query remains the same. It is only after I issue an alter index all on table rebuild command on both the tables that the query goes back to 20 seconds.
I have tried a numerous amount of ideas and none seem to prevail.
What are the best practices in this situation? Rebuilding the indexes doesn't take very long, so I could easily just add this to the nightly process but to me it seems like there should be an easier method to achieve this. What all is happening in an index rebuild that is not happening when I am loading the table?
Thank you in advance!
Link to my blog http://notyelf.com/
June 2, 2011 at 12:57 pm
Have you tried doing this: truncate the table, drop both the clustered index and non-clustered index, load the table then recreate the clustered index first and then the non-clustered index?
What is happening is that as your table is being loaded, you're experiencing page splits since the data is being loaded in a different order than the order specified by your clustered index. This causes the table to be highly fragmented at the conclusion of the load.
Adding data to the table when it is a heap will just load it in whatever order the data is added. Placing the clustered index on the table afterwards will sort it back out into the proper order saving an index rebuild.
June 2, 2011 at 3:26 pm
with respect you're assuming the load produces fragmented indexes - I have a different situation but similar process - although the index was not fragmented it needed to be rebuilt to work correctly - yeah very strange.
you should verify by checking fragmentation just to be sure. drop and recreate vs rebuild isn't too much different to be honest.
A trick I've used is to rebuild the indexes after truncating the table - again strange but interesting. I may blog about this some time - but slightly tricky to produce test data as examples ( e.g. can't reproduce out of production )
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
June 6, 2011 at 1:09 pm
Thank you for the information. That does make sense and I have switched to truncating the table, dropping the index, loading the table, and building the indexes. Performance in general seems to be improved ๐
One thing I do notice though, is that it still takes a run of a query for it to (warm up). For instance if I run a query once it takes 7 seconds the first time (a simple select count(*) from table query) then less than a second each subsequent run after that. Is this because it is now in the memory?
Link to my blog http://notyelf.com/
June 6, 2011 at 1:27 pm
I suspect that what happens with a Select Count(*) is that it causes the tableโs rowcount metadata to be updated by the SQL engine. The second time you run the select count (*) this information is already up-to-date and so it runs faster.
June 6, 2011 at 1:54 pm
shannonjk (6/6/2011)
One thing I do notice though, is that it still takes a run of a query for it to (warm up). For instance if I run a query once it takes 7 seconds the first time (a simple select count(*) from table query) then less than a second each subsequent run after that. Is this because it is now in the memory?
This is either due to query compliation (unlikely unless you have a where clause) or it's pulling the table into cache on the first run and then using it from cache on the second so it doesn't have to read from disk again.
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply