April 30, 2008 at 7:55 am
I do a daily import of RAW data. Normally we don't put indexes on our RAW tables, since we truncate them before each import, but this customer has several tables ranging from 60k to 5.6 million rows. I have clustered indexes on the columns I consistently filter on with a static value. The first time I run each query, there is about a 6-second pause; I figure it's recompiling the query plan because of the truncation. But that defeats the point of a sub-10-second query if 5-6 seconds are wasted figuring out what to do.
We also have a phase of loading data from the live database; this would be a good time to update any plans, since the raw data won't be accessed for a good 2-3 minutes. Will these plans be updated in parallel with the live data load, given that this would all be run in a script?
Is there a way to figure this out ahead of time, or to keep the previous plan? Any other ideas that may help?
Thanks
April 30, 2008 at 8:39 am
You have pluralized "clustered index," but you can only have one per table, so I am not sure your indexing is correct; I will assume it is OK.
The initial pause may be because your statistics are out of date because of a significant change in the number of records each time you load. You may want to add a statistics update to your load process.
If your queries are ad hoc queries, the plans may drop out of the cache (if they are cached at all) at some point during the day. If you change to stored procedures, the plans would persist until the server is rebooted or you flush the cache. This may be better for you. Of course, significant changes in the data could mean you want the plans recompiled after each load, so cached plans may hurt more than they help.
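Adding a statistics update to the load process could look something like the sketch below. The table name, file path, and load mechanics are made up for illustration; substitute your own DTS load step.

```sql
-- Hypothetical raw table; repeat for each raw table in the load.
TRUNCATE TABLE dbo.RawImport;

-- ... your existing DTS / BULK INSERT load step goes here ...

-- FULLSCAN gives the most accurate statistics; on the 5.6-million-row
-- tables a sample rate (e.g. WITH SAMPLE 25 PERCENT) may be a better
-- trade-off if the full scan takes too long.
UPDATE STATISTICS dbo.RawImport WITH FULLSCAN;
```

With fresh statistics in place, the first compile after the load should at least produce a plan that matches the new data.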
April 30, 2008 at 8:57 am
Michael Earl (4/30/2008)
...The initial pause may be because your statistics are out of date because of a significant change in the number of records each time you load. You may want to add a statistics update to your load process.
...
You can pretty much guarantee your statistics are garbage when truncating and reloading.
Is your source data sorted according to the clustered index? If not, you may be better off making the source's sort order your clustered index and adding a nonclustered index at the end of your load. I would drop and recreate the index as part of the load process if using a nonclustered index.
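A drop-load-recreate cycle along those lines might look like this; all object names here are hypothetical, and the `DROP INDEX table.index` form is used because it works on both SQL Server 2000 and 2005.

```sql
-- Drop the nonclustered index before the load so the bulk insert is not
-- slowed by index maintenance.
IF EXISTS (SELECT 1 FROM sysindexes
           WHERE name = 'IX_RawImport_FilterCol'
             AND id = OBJECT_ID('dbo.RawImport'))
    DROP INDEX dbo.RawImport.IX_RawImport_FilterCol;

TRUNCATE TABLE dbo.RawImport;

-- ... bulk load the raw data here ...

-- Recreating the index after the load also builds fresh statistics on it.
CREATE NONCLUSTERED INDEX IX_RawImport_FilterCol
    ON dbo.RawImport (FilterCol);
```

Rebuilding at the end of the load means the index's statistics are always current for that day's row counts.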
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
April 30, 2008 at 8:59 am
the plural "clustered indexes" is in reference to the many tables, so many indexes 🙂
and we do use stored procs
I agree that it may hurt more (on the smaller tables) because the truncate forces the plans to be recomputed, but there is an import of live data that takes 2-3 minutes. The live data is imported to different tables than the raw data, so during that time I could have the server recompiling the plans. How might I do this, assuming it's the right way to go?
We have raw tables, intermediate tables, and live tables. We first import raw data to the raw tables, then insert live data to the intermediate tables, then run queries against the raw data to clean/format it into the intermediate tables. During the loading of the live data, the server should be able to recompute the plans, but how would I go about forcing this in parallel with the load?
April 30, 2008 at 9:05 am
Jack Corbett (4/30/2008)
...You can pretty much guarantee your statistics are garbage when truncating and reloading.
Is your source data sorted according to the clustered index? If not, you may be better off making the source's sort order your clustered index and adding a nonclustered index at the end of your load. I would drop and recreate the index as part of the load process if using a nonclustered index...
This sounds about right. Can you point me to something or give me some example code on how to do this?
And how would I make sure that all this is going on in parallel with the live data load? I have a batch file that calls a DTS package, which then calls a SQL proc that starts the whole load-live -> load-raw -> verify -> load-to-live process. But from what I know, calling a stored proc will run all its commands serially.
April 30, 2008 at 10:40 am
I found "UPDATE STATISTICS". Is there any way to script the command to run on my raw tables in parallel with the live table import, or would I have to open a new connection?
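A stored proc does run its statements serially, so true parallelism needs a second connection. One common workaround is to hand the statistics work to a SQL Server Agent job, since `sp_start_job` returns as soon as the job is queued and the job runs on the Agent's own connection. This is a sketch; `UpdateRawStats` is a hypothetical job you would have to create beforehand, with a single step that runs UPDATE STATISTICS against each raw table.

```sql
-- Kick off the (pre-created, hypothetical) stats job asynchronously;
-- control returns here immediately while the job runs on its own connection.
EXEC msdb.dbo.sp_start_job @job_name = 'UpdateRawStats';

-- ... continue with the live-table load in this same script ...
```

Note this only helps because the live load touches different tables; the raw-table statistics still have to be updated after the raw data itself is loaded.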
April 30, 2008 at 9:13 pm
Updating statistics can't be done in parallel... gotta be done after all the data is loaded.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 2, 2008 at 7:29 am
I wonder if the initial 'pause' in query response isn't simply because the data has to be lifted from disk into RAM. Have you checked for I/O waits while this query is fired the first time?
Do this: from SSMS, SET STATISTICS TIME ON (displays the number of milliseconds required to parse, compile, and execute each Transact-SQL statement) and SET STATISTICS IO ON (displays information about the amount of disk activity generated by Transact-SQL statements), then execute the first query. You will get the information necessary to find out what is causing your 'slowness'.
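In SSMS that looks like the following (the query itself is a placeholder; substitute the one that shows the 6-second pause):

```sql
SET STATISTICS TIME ON;  -- reports parse/compile time separately from execution time
SET STATISTICS IO ON;    -- reports logical vs. physical reads per table

-- Hypothetical query standing in for the slow one.
SELECT *
FROM dbo.RawImport
WHERE FilterCol = 'SomeStaticValue';

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Reading the output: if the first run shows high physical reads that vanish on the second run, the pause was disk I/O warming the cache; if "parse and compile time" dominates, it was plan compilation.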
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service