April 22, 2009 at 10:32 am
Hi all. I'm creating a simple SSIS package to pull some data from a view on the source server and write it to a table on the destination server. We want to completely replace the data in the destination table every time the package runs. So my question is: should I drop & recreate the destination table or just truncate it?
I found this topic when I searched, but it seems to be fairly specific to the OP's situation, and the interesting part boils down to this:
It's better to truncate and have the index already defined when you insert data.
It's cleaner to drop and recreate the tables and indexes before you bcp in data but only marginally so.
Which doesn't really help me decide. Why is it "better" to truncate when it's "cleaner" to drop and recreate? What I'm really looking for is thoughts on best practices, or some set of general criteria that I can apply to help me decide in any given situation.
The specifics of this situation:
- Source server is SQL 2000; the source object is a view that joins several tables and returns about 8000 rows (that number will grow, but slowly and at a predictable rate).
- Destination server is SQL 2005; the destination table was created by the initial run of the data transfer wizard, so there are no indexes or other constraints to speak of (at least, not yet - I was kind of asked to help after the fact on this one).
- The package will live and run on the destination server.
Thanks!
April 22, 2009 at 10:40 am
These are the steps I use for this type of process:
1) Disable all NC indexes on destination table
a) ALTER INDEX IndexName ON table DISABLE;
2) Truncate destination table
a) Minimally logged operation - don't need to recover if we are reloading from source
3) Extract and Load data into table
4) Rebuild all indexes
a) ALTER INDEX ALL ON table REBUILD;
It is generally much faster for the extract/load process if there are no indexes on the table. Using the above process, the only index I have is the clustered index and all indexes will be rebuilt after the table has been loaded, so the load process is fairly quick. It can be even quicker if the load is performed in the same order as the clustered index.
Note that you can't disable the clustered index - if you do, the table is no longer accessible until the index is rebuilt.
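Put together, the whole cycle is only a few statements. A rough sketch - dbo.OrderExtract and IX_OrderExtract_Customer are placeholder names standing in for your destination table and one of its nonclustered indexes:

-- 1) Disable each nonclustered index (repeat per NC index)
ALTER INDEX IX_OrderExtract_Customer ON dbo.OrderExtract DISABLE;

-- 2) Clear the table; TRUNCATE TABLE is minimally logged
TRUNCATE TABLE dbo.OrderExtract;

-- 3) Load the data here (SSIS data flow, INSERT ... SELECT, bcp, etc.)

-- 4) Rebuild all indexes, which also re-enables the disabled ones
ALTER INDEX ALL ON dbo.OrderExtract REBUILD;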
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
April 22, 2009 at 11:10 am
That's pretty much the same thing I do. Disable indexes, truncate, load, enable indexes. It's very efficient.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 22, 2009 at 12:21 pm
Jeffrey & GSquared, thanks for the procedures and advice. Is it fair to say that the efficiency is primarily due to disabling the indexes, and that it wouldn't make much difference if I drop/create the destination table rather than truncate it? To be clear, truncating the table seems like the natural choice, but I'm trying to understand why or when you might want to drop and recreate instead. I'm looking at various tutorials and the drop/create method seems fairly common - do you think that's just because it's easier to do with the wizard? i.e. it's an easier example/demo but not necessarily how you'd do it in production?
Also, Jeffrey, when you say
...the only index I have is the clustered index and all indexes will be rebuilt after the table has been loaded, so the load process is fairly quick. It can be even quicker if the load is performed in the same order as the clustered index.
does that mean I should sort the source result set? e.g. the destination table has a clustered index on OrderID, so I should sort the source records on OrderID before they're loaded?
April 22, 2009 at 12:33 pm
If you drop and recreate the destination table every time, then you also have to recreate the indexes and re-apply all security to the table. Then, what happens when someone else creates a new index that your process does not know about?
Truncating the table keeps all indexes and permissions, takes only a few seconds to clear the data out, and is minimally logged.
does that mean I should sort the source result set? e.g. the destination table has a clustered index on OrderID, so I should sort the source records on OrderID before they're loaded?
Sorting the result set prior to loading the data could help reduce the load time and the clustered index rebuild time. I do not bother with this step myself, since the process I use is well within the acceptable performance requirements.
It's really a step that you can take to improve performance if needed - and so far, I have not needed it.
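If you do try it, it's just a matter of ordering the extract query on the destination's clustered key. A quick sketch, where dbo.OrderView and its column list are stand-ins for your actual source view:

-- Source query for the data flow: sorting on OrderID (the destination's
-- clustered key) lets rows arrive already in clustered index order
SELECT OrderID, CustomerID, OrderDate
FROM dbo.OrderView
ORDER BY OrderID;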
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
April 22, 2009 at 12:58 pm
Jeffrey Williams (4/22/2009)
If you drop and recreate the destination table every time, then you also have to recreate the indexes and re-apply all security to the table. Then, what happens when someone else creates a new index that your process does not know about?
Cool, this makes a lot of sense and actually seems a bit obvious in hindsight. :blush: But that's the great thing about this site - I get to capitalize on the experience of others! Thanks again for your help.
--MIJ
April 22, 2009 at 1:11 pm
Yes - that is the idea. Glad I could help out.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
April 26, 2009 at 6:36 am
MIJ (4/22/2009)
Which doesn't really help me decide. Why is it "better" to truncate when it's "cleaner" to drop and recreate? What I'm really looking for is thoughts on best practices, or some set of general criteria that I can apply to help me decide in any given situation.
The specifics of this situation:
- Source server is SQL 2000; the source object is a view that joins several tables and returns about 8000 rows (that number will grow, but slowly and at a predictable rate).
- Destination server is SQL 2005; the destination table was created by the initial run of the data transfer wizard, so there are no indexes or other constraints to speak of (at least, not yet - I was kind of asked to help after the fact on this one).
- The package will live and run on the destination server.
Thanks!
I'll share my experiences on this one too, although I'll touch more on the actual extract part.
I see that you're using a view on the SQL2K box to get at the source data. I used to use a similar method when I was loading history into my company's new data warehouse box. Source is on 2K and the new DWH is running 2K5.
What happened was that as the size of the data set returned from the source grew, the extract took longer and longer, until the SSIS package started timing out because the source was not responding in time. The solution was to extract the individual tables unjoined - flat, in effect - to a staging area, then perform the actual joins on the destination.
Now you may not want to stage the data, but it may be something to consider for the future. The benefits are that you release the source server's resources sooner, minimize time on the network, and make better use of the destination server's resources.
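In outline, the staged version might look something like this on the destination side. Staging.Orders, Staging.Customers, and the column names are made-up placeholders, and the flat table copies themselves would just be plain SSIS data flows:

-- 1) Clear and reload the flat staging copies of the source tables
-- (one SSIS data flow per table copies them, unjoined, from the 2K box)
TRUNCATE TABLE Staging.Orders;
TRUNCATE TABLE Staging.Customers;

-- 2) Once staged, do the join work locally on the destination server
TRUNCATE TABLE dbo.OrderExtract;

INSERT INTO dbo.OrderExtract (OrderID, CustomerName, OrderDate)
SELECT o.OrderID, c.CustomerName, o.OrderDate
FROM Staging.Orders AS o
JOIN Staging.Customers AS c
    ON c.CustomerID = o.CustomerID;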
April 26, 2009 at 10:09 am
Yes, I would agree. The performance gain from taking the flat tables behind the view and doing all the work on the destination server is significant, especially when the row count is growing and there is a complex query behind the view.
April 27, 2009 at 9:45 am
Andrew & Hassie, thanks for your comments. Performance isn't a problem for us right now, but it's nice to know what our options might be when the time comes. In our specific case the source view also limits the amount of data to be migrated; if we copied the constituent tables and then set up the view on the destination we'd be copying significantly more rows than the few thousand we're moving now. I'm not sure when it becomes more efficient to move more data than we strictly need, but I imagine it's the sort of thing we could measure and find out when we need to. Thanks again for the idea.
--MIJ
September 9, 2010 at 3:12 pm
I am in the same situation, but it's a little bit different: my destination servers are SQL Server 2000. Is there any command to disable indexes while inserting into tables?
September 9, 2010 at 7:52 pm
Please open a new thread - you'll get many more people looking at your question.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs