April 18, 2003 at 9:39 am
An INFORMATICA load into a table in MS SQL Server 2000 using BCP takes 7 hours for 24 million rows. The database is set to bulk-logged recovery mode; any help is greatly appreciated.
April 18, 2003 at 10:09 am
Doesn't seem too horrible. Have you run Perfmon to see whether you're bottlenecked?
Andy
May 14, 2003 at 5:44 pm
I load 50,000 rows in 15 seconds. Multiplying that out, 24 million rows would take about 2 hours (just as a rough guide). Mine is a narrow table, though - the total number of bytes is probably more significant than the row count alone.
If possible, drop foreign keys, triggers, and indexes, except clustered index, and recreate them afterwards.
Is it loading into an emptied table (does it truncate or delete? TRUNCATE is better), or does it have to merge the data? (BCP doesn't merge; it might in fact be loading into a temp table and then merging the data.)
Best if the data file is on a local disk, ideally a different disk from the one holding the database files.
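A minimal sketch of that drop / load / recreate pattern, assuming a hypothetical table dbo.BigTable with one nonclustered index IX_BigTable_Col1 on a column Col1 (your real object names will differ):

-- Drop nonclustered indexes (keep the clustered index); drop FKs and triggers similarly if present.
DROP INDEX BigTable.IX_BigTable_Col1

-- Empty the target. TRUNCATE is faster than DELETE, but fails if the table is referenced by a foreign key.
TRUNCATE TABLE dbo.BigTable

-- ... run the bcp / BULK INSERT load here ...

-- Recreate what was dropped once the load has finished.
CREATE NONCLUSTERED INDEX IX_BigTable_Col1 ON dbo.BigTable (Col1)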
May 15, 2003 at 8:01 pm
Beware using TRUNCATE TABLE as that is a non-logged operation.
Knowing what the hardware is (particularly the number of processors, processor speed, and memory available to SQL) would help in judging whether that is a decent load time or not.
Also, is the table empty or already populated (with you adding an additional 24 million rows)? Row size is important as well: 24 million small rows versus 24 million wide rows makes a significant difference.
The amount of constraints, the presence of triggers, and the number of indexes significantly impact BCP IN and BULK INSERT performance. Keeping these to the bare minimum is essential.
As a guide, we have a process that does BULK INSERTs (db recovery is bulk-logged) into about 20 tables, totalling about 22 million rows. The process used to take 22 hours to complete and after adjusting constraints, indexes, and re-working the SP it now takes less than 6 minutes. That is on a dual proc 1.13GHz PIII system with 2GB of RAM (1.5GB available to SQL). The database resides on local SCSI drives in a RAID5 configuration (which is not ideal for the best write performance).
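As a rough illustration of keeping those structures to a minimum during the load window (a sketch only, with hypothetical names YourDatabase and dbo.BigTable - adjust to your own schema):

ALTER DATABASE YourDatabase SET RECOVERY BULK_LOGGED      -- bulk-logged recovery for the load window

ALTER TABLE dbo.BigTable NOCHECK CONSTRAINT ALL           -- suspend CHECK and FK constraint checking
ALTER TABLE dbo.BigTable DISABLE TRIGGER ALL              -- stop triggers firing during the load

-- ... BULK INSERT / bcp load here ...

ALTER TABLE dbo.BigTable ENABLE TRIGGER ALL
ALTER TABLE dbo.BigTable WITH CHECK CHECK CONSTRAINT ALL  -- revalidate so the constraints are trusted again
ALTER DATABASE YourDatabase SET RECOVERY FULL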
David R Buckingham, MCDBA,MCSA,MCP
May 21, 2003 at 6:50 pm
quote:
Beware using TRUNCATE TABLE as that is a non-logged operation
It will not fire DELETE triggers.
Some people think non-logged means non-transactional, but I believe it logs the de-allocation of the extents. I have confirmed that you can roll it back inside a transaction.
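A quick way to see that for yourself, using a hypothetical throwaway table:

CREATE TABLE dbo.TruncDemo (id int NOT NULL)
INSERT INTO dbo.TruncDemo (id) VALUES (1)

BEGIN TRAN
    TRUNCATE TABLE dbo.TruncDemo    -- the extent de-allocations are logged...
ROLLBACK TRAN                       -- ...so the truncate rolls back cleanly

SELECT COUNT(*) FROM dbo.TruncDemo  -- returns 1: the row is back
DROP TABLE dbo.TruncDemo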
July 19, 2006 at 9:37 am
Ten Centuries posted this...
------------------------------
Personally I prefer to flush out a table and repopulate in preference to DROP/CREATE.
For big import jobs I tend to follow the procedure below.
* Drop DRI constraints (if they exist).
* Drop primary keys.
* Drop indices.
* TRUNCATE tables.
* Import new data.
* Recreate primary keys and indices.
* Recreate DRI (if required).
By dropping the indices and primary keys, you speed up the import because the server isn't trying to maintain the indices while the data is being loaded.
------------------------------
I don't know much about it, but the keys at least might be pulled from INFORMATION_SCHEMA; then you would simply recreate the keys after the import has completed.
I've always wondered how this can be done just prior to import with T-SQL.
Any tips from you pros out there?
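One hedged sketch of the INFORMATION_SCHEMA idea (the table name is hypothetical): list the primary and foreign key constraints on the target just before the import, so that matching DROP CONSTRAINT / ADD CONSTRAINT statements can be scripted out and replayed after the load.

SELECT tc.CONSTRAINT_NAME, tc.CONSTRAINT_TYPE, kcu.COLUMN_NAME, kcu.ORDINAL_POSITION
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
  ON kcu.CONSTRAINT_NAME = tc.CONSTRAINT_NAME
WHERE tc.TABLE_NAME = 'YourTable'
  AND tc.CONSTRAINT_TYPE IN ('PRIMARY KEY', 'FOREIGN KEY')
ORDER BY tc.CONSTRAINT_NAME, kcu.ORDINAL_POSITION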
_________________________
July 24, 2006 at 7:53 am
There are a number of things I'd recommend, many of which have already been touched on here, but I don't think the ideal combination has been suggested yet:
1. The Command
==============
First of all, use BULK INSERT in preference to BCP. BCP is a much older, less well-optimised command that uses DB-Lib. Microsoft recommend using BULK INSERT in preference.
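For comparison, the same load expressed both ways (server name, database, table, and file path are all hypothetical):

-- Old style, from the command line with bcp:
--     bcp YourDatabase.dbo.BigTable in D:\loads\bigtable.dat -S YourServer -T -c
-- From T-SQL with BULK INSERT:
BULK INSERT dbo.BigTable
FROM 'D:\loads\bigtable.dat'
WITH (DATAFILETYPE = 'char')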
2. Existing Data
================
Wherever possible, remove existing data from the target table first. This has particular reference to existing indexes (see below). To remove data, TRUNCATE is fastest, but can cause issues as Clay G has mentioned. It will also screw up your IDENTITY values should you have any.
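If the identity seed matters, it can be put back after the load with DBCC CHECKIDENT. A one-line sketch, assuming a hypothetical dbo.BigTable whose next identity value should be 24,000,001:

DBCC CHECKIDENT ('dbo.BigTable', RESEED, 24000000)   -- next inserted identity value will be 24000001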
3. Indexes
==========
Put simply, indexes will usually slow down bulk copy. This is partially due to index pointer updates whilst the data is being added, but also because bulk copy operates in LOGGED mode if indexes exist on the table. Without indexes, bulk copy can operate in NON-LOGGED mode, which enables it to be considerably faster. (there are other considerations besides the presence of indexes that influence this, however).
Indexes do not always slow down the copy, however. If you drop all non-clustered indexes, you will suffer no penalty from leaving the clustered index as long as you specify the ORDER hint as part of the BULK INSERT command. The columns in the ORDER hint must be the same as those in the clustered index, and in the same order. If the data in the source file is also physically sorted in this order then you will gain even more benefit. The overall benefit will be much greater than that of dropping the clustered index and then rebuilding it completely after the import has completed.
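A hedged example of the ORDER hint, assuming a hypothetical target dbo.BigTable with a clustered index on (CustomerID, OrderDate) and a source file already sorted the same way:

BULK INSERT dbo.BigTable
FROM 'D:\loads\bigtable.dat'
WITH (
    TABLOCK,
    ORDER (CustomerID ASC, OrderDate ASC)   -- must match the clustered index columns and their order
)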
4. LOGGED / NON-LOGGED Mode
===========================
Always try to get your bulk copy operation to run in NON-LOGGED mode where possible - it is much faster than the LOGGED mode. If you can sacrifice a little bit of point-in-time recoverability, put the database in BULK_LOGGED recovery mode. This is one step needed to allow NON-LOGGED bulk copy to take place. In addition, you will also need to specify the TABLOCK hint. Finally, ensure the target table is empty. If it is empty, then the presence of the indexes will not stop a NON-LOGGED mode being used. If the indexed table contains data when the copy starts, then the slower LOGGED mode will be used.
The downside of using NON-LOGGED mode is that you can't do point-in-time recovery, but in my experience this is rarely a consideration with a bulk copy process - you would simply go back to square one and start again.
5. DRI Constraints and Triggers
===============================
There is NO need to drop DRI constraints or triggers. By default, bulk insert will not check constraints or fire triggers. However, these structures will continue to work normally for all other concurrent database operations (INSERT, SELECT, UPDATE, DELETE). If the constraints and triggers are physically removed or disabled, you run the risk of exposing other sessions to integrity problems.
6. Batch Size
=============
Microsoft recommend using as large a batch size as possible. This is fine, especially if you are using NON-LOGGED bulk copy. If you are using LOGGED bulk copy (because you can't meet all the requirements for NON-LOGGED), then you may find that a large batch size causes your transaction log to grow too large. You'll need to experiment and find the right combination of batch sizes and log truncation rates to suit your own system.
So, my own strategy list would be (sketched in T-SQL after the list):
- drop NC indexes
- truncate (or delete) data from target table
- put database into BULK_LOGGED recovery model
- import new data using the ORDER hint, TABLOCK hint
- put database into FULL recovery model
- recreate any NC indexes
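A rough end-to-end sketch of that list, with hypothetical names throughout (YourDatabase, dbo.BigTable, nonclustered index IX_BigTable_Col1 on Col1, clustered index on CustomerID, and a load file at D:\loads\bigtable.dat):

ALTER DATABASE YourDatabase SET RECOVERY BULK_LOGGED

DROP INDEX BigTable.IX_BigTable_Col1          -- nonclustered indexes only; keep the clustered index
TRUNCATE TABLE dbo.BigTable                   -- an empty target is needed for the minimally logged path

BULK INSERT dbo.BigTable
FROM 'D:\loads\bigtable.dat'
WITH (
    TABLOCK,                                  -- table lock is another requirement for minimal logging
    ORDER (CustomerID ASC),                   -- matches the clustered index
    BATCHSIZE = 1000000                       -- tune against available log space
)

ALTER DATABASE YourDatabase SET RECOVERY FULL
CREATE NONCLUSTERED INDEX IX_BigTable_Col1 ON dbo.BigTable (Col1)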
July 25, 2006 at 4:31 am
Make sure you spread your data across arrays - I assume your import data is local to the server, so make sure it's not on the same array as your .mdf file. Make sure your .ldf file is on a separate array. I've found that when moving data into SQL Server, autogrowths of the .ldf and .mdf files slow things down big time - try to ensure that neither has to autogrow during your routine. (In a fairly large, complex data archive process (40 GB of data), growing the .ldf and .mdf prior to the operation reduced execution time by around 75%.)
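A sketch of pre-growing the files before the load kicks off (logical file names and sizes are hypothetical - check sysfiles for yours):

ALTER DATABASE YourDatabase MODIFY FILE (NAME = YourDatabase_Data, SIZE = 40960MB)   -- data file
ALTER DATABASE YourDatabase MODIFY FILE (NAME = YourDatabase_Log,  SIZE = 8192MB)    -- log file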
I've sometimes found that updating table stats can aid performance, especially when removing data from a table prior to a new load. The whole issue of statistics is a subject in itself, but you might want to consider making sure auto-update and auto-create statistics are off during your data load; the last thing you want is any statistics updates or creation during the load. And don't assume you have no statistics just because you've dropped the indexes - if statistics still exist, you may run into update issues with them.
Run this script to check for auto-created stats:
select name from dbo.sysindexes where id = object_id('yourtable')
and name like '[_]WA%'   -- bracket the underscore so it isn't treated as a single-character wildcard
If there are any you might want to delete them!
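A sketch of switching the auto-stats options off around the load and dropping any auto-created stats the query above finds (database, table, and statistics names are hypothetical):

ALTER DATABASE YourDatabase SET AUTO_CREATE_STATISTICS OFF
ALTER DATABASE YourDatabase SET AUTO_UPDATE_STATISTICS OFF

DROP STATISTICS YourTable._WA_Sys_Col1_0AD2A005   -- one DROP STATISTICS per name returned above

-- ... data load here ...

ALTER DATABASE YourDatabase SET AUTO_CREATE_STATISTICS ON
ALTER DATABASE YourDatabase SET AUTO_UPDATE_STATISTICS ON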
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/