November 26, 2008 at 6:09 am
Hi, maybe an easy one.
I have two databases. The first has a table1 with columns 1 - 20 all defined as varchar. The second DB has the same table1, but columns 1 - 20 are defined with their correct datatypes, i.e. numerics, datetimes, char, varchar etc.
My SSIS package has an OLE DB source, then a script component which does the conversion, and finally the OLE DB destination. Everything works perfectly, except that I know I can speed it up with the hardware I have available.
I have 8 CPUs with 8 GB of RAM, but if I watch SSIS 2005 running, I see that only 1 CPU is used at any time.
Extracting from the first table and inserting into the second takes up to 4 hours, seeing that I have about 110 million records in the table.
How do I split the SSIS package in order to start using the other CPUs as well?
(I can use views, select statements or whatever I want in order to retrieve the data to insert.)
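For context, the conversion the script component performs amounts to something like this in T-SQL terms; the column names, types and database name below are hypothetical, just to show the shape of the varchar-to-typed mapping:

-- Hypothetical sketch of the varchar-to-typed conversion (in the actual
-- package this logic lives inside the script component):
SELECT CAST(col1 AS int)      AS col1,   -- numeric stored as varchar
       CAST(col2 AS datetime) AS col2,   -- date stored as varchar
       CAST(col3 AS char(10)) AS col3,   -- fixed-width text
       col4                   AS col4    -- already varchar, passed through
FROM SourceDB.dbo.table1;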
thanks
Chris
November 26, 2008 at 7:12 am
If you really wanted to split the processing into multiple data flows, you could just copy your current data flow and filter the source data so that the copies work on mutually exclusive data sets. If there is no control flow connector between the two data flow tasks, they will run in parallel.
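A minimal sketch of those mutually exclusive source queries, assuming table1 has an indexed integer key (the column name ID and the 55 million boundary are hypothetical):

-- OLE DB Source query for data flow copy #1 (hypothetical key and boundary)
SELECT * FROM dbo.table1 WHERE ID <= 55000000;

-- OLE DB Source query for data flow copy #2
SELECT * FROM dbo.table1 WHERE ID > 55000000;

Splitting on a key range rather than something like ID % 2 lets each source read a contiguous slice instead of both queries scanning the whole table.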
You may run into issues with this if the destination components do table or page locks. You may actually end up slowing things down.
The number of records you are dealing with should not be much of a problem. I would recommend you really search for the bottleneck before assuming it is a problem that can be "threaded away". The first thing I would check is your script component. Try running just a table copy from one server to the other. If this is really fast, then you may need to spend some time optimizing your code before going in the direction of trying to get your processing to thread.
November 26, 2008 at 7:20 am
Your problem is not that only one CPU is used. Your problem is that you are using the OLE DB Destination component, which is the slowest possible way to insert. What is the destination database? You have to use a bulk-load component and you will easily get a 10-30x speed increase.
November 26, 2008 at 7:23 am
Thanks Michael
What you are saying makes sense, but I do think I should try the parallel approach.
My script component actually does quite a bit of processing, seeing that the data is not up to scratch.
But I can only try and see what happens.
thx
November 26, 2008 at 7:36 am
Thanks for the reply, CozyRoc
It is all on the same SQL Server 2005 machine, just from one DB to another DB (both SQL 2005). I've never used the Bulk Insert Task, but quickly checking on it, I see that it goes from flat file to DB. Or can you do tbl to tbl with this component?
Also, I could really use this component for my initial load into the first (varchar) table. The speed I'm currently getting from flat file to tbl is really not too bad, but this component is something to keep in mind.
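For what it's worth, the Bulk Insert Task is essentially a wrapper around the T-SQL BULK INSERT statement, which is why a flat file is the only source it accepts; a sketch with a hypothetical file path and options:

-- Roughly what the Bulk Insert Task runs under the covers
-- (file path, terminators and table name are hypothetical):
BULK INSERT dbo.table1
FROM 'C:\loads\table1.txt'
WITH (FIELDTERMINATOR = '\t',
      ROWTERMINATOR = '\n',
      TABLOCK);

So for a table-to-table copy on the same instance it is not the tool to reach for.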
November 26, 2008 at 7:42 am
You can do DB to DB. There is no limitation. Give it a try. It will probably take not more than 10 mins to setup. Post back your results.
November 26, 2008 at 7:55 am
Michael Earl (11/26/2008)
If you really wanted to split the processing into multiple data flows, you could just copy your current data flow...
Just a quick FYI on copying... I have read about quite a few issues, and have experienced a few of my own, when copy/pasting transformations (particularly lookup transformations). It's still unclear to me why it happens, but often when a lookup transform is copied it will generate nonspecific errors while executing. It can be corrected by deleting the offending transform and recreating it from scratch.
Tim Mitchell
TimMitchell.net | @tmitch.net | Tyleris.com
ETL Best Practices
November 26, 2008 at 8:04 am
Hi Tim
Just to add, I've also experienced various problems with copying components, especially the script component. I've re-installed VS on my PC about 4 times in the past 4 months, all because the script component was giving 'unspecified errors'.
My current solution, which has been working for more than a month now without any hiccups, is to NOT install the service packs on Visual Studio. (Maybe a solution for these problems.)
November 26, 2008 at 8:11 am
CozyRoc (11/26/2008)
You can do DB to DB. There is no limitation. Give it a try. It will probably take not more than 10 mins to setup. Post back your results.
Hi again
When I look at the component I only get flat file as the source connection on the Bulk Insert Task. How do I specify that I want to go from DB Tbl to DB Tbl?
Thx
November 26, 2008 at 8:16 am
I have also had some copy and paste issues.
Copying and pasting in the data flow has been pretty sketchy for me. Copying and pasting components in the control flow (including entire data flows) has usually been OK.
I do like to paste into a sequence container or something that is different from the original location. If SSIS does not have to rename anything, it seems to work a bit better with the paste.
November 26, 2008 at 8:21 am
November 26, 2008 at 8:24 am
I was looking at the Bulk Insert Task, which is found on the Control Flow. Got it now; I'll see what the difference is.
November 26, 2008 at 8:25 am
Keep in mind that the SQL Server Destination component will only work if the package is running on the destination server. It is a local-only destination component because of the way it uses the file system to bulk insert.
The OLEDB Destination component has a couple of "Fast Load" modes that are similar to (or may even use, I cannot remember) bcp. Depending on your server layout, you may have no choice.
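If you want to confirm which path you are actually getting, one rough check while the package runs is to look at the active requests on the destination instance; bulk-API loads (SQL Server Destination, or OLE DB Destination with fast load) typically show up with a BULK INSERT command, whereas a non-fast-load OLE DB Destination issues one INSERT per row:

-- While the package is running (SQL Server 2005 DMVs; user sessions only):
SELECT session_id, command, status, wait_type, wait_time
FROM sys.dm_exec_requests
WHERE session_id > 50;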
November 27, 2008 at 12:08 am
CozyRoc (11/26/2008)
Your problem is not that only one CPU is used. Your problem is that you use OLEDB destination component. This is the slowest possible way to insert. What is the destination database? You have to use a bulk-load component and you will easily get 10-30x speed increase.
I know this is going to go against everything that people have experienced, but I get the following results:
Running from OLE DB Source via script to OLE DB Destination, I inserted 137 mil records into the table in 4 hours. (About 9,500 rec/sec) (The original package)
Running a copy of this SSIS package but changing it to SQL Server destination, I inserted 15 mil records in 44 mins. (Only about 6,000 rec/sec).
Everything is the same except for the destination in the second package.
Is there something that I can check to make sure that I get better performance out of SSIS?
Just to add, I don't have a lot of time to play around and try various options, but if you can point me in a direction, it would be greatly appreciated.
Chris