June 20, 2003 at 10:56 am
I'm loading a new database with over 1 million rows in one table and updating 220,000 rows in another table. This process was taking 42 hours. With some indexes, the completion time is now down to 25 hours. I'm changing to stored procedures for a little more performance gain.
I know 25 hours is a lot of time to load 1 million records and update 220,000. I don't have much time or resources to research this. This command-line VBScript will only have to run one time; it is a data conversion from an old mainframe system. It is not a straight bulk load, as there is a lot of other processing going on.
My question is about log files and the logging mechanism.
When I started my last test, I checked the number of inserts in the first 10 minutes of the script and estimated that it would complete in 8 hours, but it took 25. It seemed to slow down as it went along. There was nothing else running on the server; it ran over the weekend and there was no one here.
Could this be a logging issue? I don't need to roll anything back. I'm going to try this every 100,000 inserts:
/*
* DESCRIPTION
* ===========
* Shrink the Transaction Log file
*/
CREATE PROCEDURE sp_truncate_log
AS
    -- Truncate the inactive portion of the log first...
    BACKUP LOG UFRS_DEV WITH TRUNCATE_ONLY
    -- ...then release the freed space from the log file
    DBCC SHRINKFILE ( UFRS_DEV_Log , TRUNCATEONLY )
GO
On another note, this does not seem to actually shrink the log that much.
Any ideas on cleaning up the log file as the script runs?
Thanks
June 20, 2003 at 3:57 pm
Could be logs. Have you tried committing in smaller batches, rather than one big transaction? Also, are there any triggers?
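Something like this on the server side, for example; the table and column names here (Staging, Target, KeyCol) are just placeholders for whatever your conversion actually uses:
/*
 * Sketch only: copy rows in batches of 10,000 and commit each batch,
 * instead of holding one transaction open for the whole load.
 */
DECLARE @BatchSize int, @Rows int
SET @BatchSize = 10000
SET @Rows = 1
SET ROWCOUNT @BatchSize        -- cap each INSERT at one batch (SQL 2000 syntax)
WHILE @Rows > 0
BEGIN
    BEGIN TRAN
    INSERT INTO Target (KeyCol, Col1, Col2)
    SELECT s.KeyCol, s.Col1, s.Col2
    FROM Staging s
    WHERE NOT EXISTS (SELECT 1 FROM Target t WHERE t.KeyCol = s.KeyCol)
    SET @Rows = @@ROWCOUNT     -- rows copied in this batch
    COMMIT TRAN                -- close out this batch's transaction
END
SET ROWCOUNT 0                 -- always reset when finished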
Steve Jones
June 23, 2003 at 2:43 am
quote:
It seemed to slow down as it went along.
Could it be an issue with your clustered index? As more records are being added, the clustered index grows, and if you have a wide and/or unique index you may be starting to experience huge numbers of page splits that your I/O can't keep up with. How about doing your inserts (if you are doing single inserts) into a # table without an index, and every now and then doing an INSERT INTO (...) SELECT (...) to let SQL Server optimize your inserts with respect to I/O use and page allocation?
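Roughly like this; the column names are placeholders, and the point is that #Stage has no indexes while the single-row inserts are going in:
/*
 * Sketch only: row-by-row inserts go into an unindexed temp table,
 * then one set-based INSERT moves them into the real (indexed) table.
 */
CREATE TABLE #Stage
(
    KeyCol int         NOT NULL,
    Col1   varchar(50) NULL,
    Col2   varchar(50) NULL
)
-- ... the conversion does its row-by-row INSERTs into #Stage here ...
INSERT INTO Target (KeyCol, Col1, Col2)
SELECT KeyCol, Col1, Col2
FROM #Stage
ORDER BY KeyCol        -- inserting in clustered-key order helps keep page splits down
DROP TABLE #Stage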
June 23, 2003 at 2:58 am
Unless your data is sorted in clustered index order, remove your index before you load and then rebuild it.
Also, I doubt that VBScript is the quickest language to do any conversion. We found the fastest method was to convert the data using C and bulkload using the BCP API, running 7 loads in parallel (8 CPUs).
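For example (index, table and column names below are only placeholders):
/*
 * Sketch only: drop the secondary index before the load and rebuild it once
 * at the end; one index build is far cheaper than maintaining it row by row.
 */
DROP INDEX Target.IX_Target_LookupCol
-- ... run the load ...
CREATE NONCLUSTERED INDEX IX_Target_LookupCol ON Target (LookupCol)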
HTH
Andy
June 23, 2003 at 2:59 am
As long as you are committing in small batches - as Steve suggests - there may be some benefit to setting the recovery model to "Simple" on your target database (Database -> Properties -> Options). Remember to set it back when you are finished.
This means that the logfile will only retain information for incomplete transactions so there should be no need to shrink the log every so often.
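For example (this assumes your normal setting is "Full"; switch back to whatever it really was):
-- Switch the target database to Simple recovery for the duration of the load...
ALTER DATABASE UFRS_DEV SET RECOVERY SIMPLE
-- ... run the conversion ...
-- ...then put it back afterwards
ALTER DATABASE UFRS_DEV SET RECOVERY FULL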
June 23, 2003 at 3:02 am
Forgot to ask: has your database been pre-sized? You may be wasting time auto-extending.
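Something like this grows the files once up front instead of in lots of small auto-extend steps. UFRS_DEV_Data is a guess at your data file's logical name (check sp_helpfile) and the sizes are placeholders:
-- Sketch only: pre-size the data and log files in one step before the load.
ALTER DATABASE UFRS_DEV MODIFY FILE (NAME = UFRS_DEV_Data, SIZE = 1000MB)
ALTER DATABASE UFRS_DEV MODIFY FILE (NAME = UFRS_DEV_Log, SIZE = 2000MB)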
Andy
June 23, 2003 at 12:27 pm
This may be a dumb question, but is DTS an option for you?
Using a text file as the source and a table as the destination can be extremely fast. With straight ActiveX copy columns I've been loading around 700,000 records in about 9 minutes. These are not small records, either: over 150 columns, and our servers aren't that high end.
The table I'm importing into is just a staging table with no indexes, and the columns are named exactly as the text file headers (this makes setting up the DTS package super easy). The options I have turned on are "use fast load", "Keep Null Values", and "Table Lock", and I've turned off "Check Constraints".
After the import I add a NONCLUSTERED index to whatever column I'll use later in my joins; then it's stored procedures for all transformations, inserts, updates, etc. (which you can throttle if you're worried about tran log issues).
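By "throttle" I just mean doing the work in capped batches rather than one huge statement. For example (all names are placeholders, and the WHERE clause assumes the column is never NULL):
/*
 * Sketch only: index the staging table after the load, then apply updates
 * in batches so no single transaction gets huge.
 */
CREATE NONCLUSTERED INDEX IX_Stage_KeyCol ON Stage (KeyCol)
SET ROWCOUNT 50000            -- cap each pass at 50,000 rows (SQL 2000 style)
WHILE 1 = 1
BEGIN
    UPDATE t
    SET    t.SomeCol = s.SomeCol
    FROM   Target t
           JOIN Stage s ON s.KeyCol = t.KeyCol
    WHERE  t.SomeCol <> s.SomeCol      -- skip rows already brought up to date
    IF @@ROWCOUNT = 0 BREAK            -- nothing left to change
END
SET ROWCOUNT 0                -- reset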
Cheers
cl
Signature is NULL
June 24, 2003 at 10:41 am
The only non-clustered indexes are on the tables used as input (select). The tables that have rows updated or inserted have one key field as an index. The table which will have the 1 million rows inserted has an Identity on its key field. Since the key field is incremented by 1 for each insert, I didn't think there would be any page splits.
DTS is not an option.
A rewrite to C is not an option. This program is a one time run.
I have not been executing any commits. Would this help? There are no Triggers.
I have to use ADO Recordsets as cursors to sequentially process the input tables. The cursors open quickly, but as I mentioned, the processing slows down as it goes along.
I don't think the database is pre-sized. We have a very small shop here, (2) Engineers, and no DBA.
Clavin, how can I "throttle" a Stored Procedure?
On another note, my last test ran for 28 hours. There were two changes in this test. One: I put the tables the script uses for input in another database on a different disk controller, and two: I executed the Stored Procedure (sp_truncate_log) every 500 inserts.
When this script had run for about an hour, it had inserted 100,000 rows. In the end it was 28 hours for 1,100,000 rows. I was hoping for 11 hours.
Thanks for all of the suggestions and feedback.
I'm going to execute a Select every 50,000 inserts and change the recovery mode to "Simple."
Would it help to create the database with larger allocations? It's a new DB for a new application, so I have this option. Right now the data file is 208 MB used and 218 MB free, and the log file is 796 MB used and 367 MB free.