August 3, 2011 at 7:58 pm
Can somebody tell me what issues one can face when trying to load data into one table using multiple packages at the same time? And, if there are any, the solutions to those issues 😉
Thanks in advance..
August 3, 2011 at 10:18 pm
One typically faces locking issues.
I would suggest serialising the packages.
Even if your packages are running serially, they can still have problems if you are using transactions: locks are not released until the transaction ends.
Each source/destination component acquires a lock in order to validate its external metadata (this is part of the validation phase before execution), but that check can be turned off by setting the component's ValidateExternalMetadata property to false.
Best advice is to avoid it. Consider instead using a staging process so that you only write changes to the table once.
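A minimal sketch of that staging pattern in T-SQL (all table and column names here are hypothetical, for illustration only): each package loads its own staging table in parallel, and a single serial step at the end writes to the destination once, so the destination table is only ever locked by that one statement.

```sql
-- Step 1 (inside each package, runs in parallel): each package bulk-loads
-- its own staging table, e.g. package 1 -> dbo.Stage_Sales_1,
-- package 2 -> dbo.Stage_Sales_2. No contention on the destination.

-- Step 2 (one serial step after all packages finish): a single write
-- to the destination, so dbo.Sales is locked only here.
BEGIN TRANSACTION;

INSERT INTO dbo.Sales (OrderId, OrderDate, Amount)
SELECT OrderId, OrderDate, Amount FROM dbo.Stage_Sales_1
UNION ALL
SELECT OrderId, OrderDate, Amount FROM dbo.Stage_Sales_2;

COMMIT TRANSACTION;

-- Clear the staging tables ready for the next run.
TRUNCATE TABLE dbo.Stage_Sales_1;
TRUNCATE TABLE dbo.Stage_Sales_2;
```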
August 5, 2011 at 9:14 am
If you serialise the packages, you take a big performance hit, as each one has to wait until the preceding one completes.
A better method for fast loading is to logically partition your packages so they are less likely to conflict with each other for locking.
There is a lot to take into consideration with this method.
E.g. you could have 12 packages that write to the table, each set to load data for only one month, so package 1 loads January data, for example. If you physically partition your table by month as well, then you minimise locking problems and can parallelise the loading too.
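As a sketch of how the monthly table partitioning could look in T-SQL (object names and the 2011 boundary dates are hypothetical), so that each package's writes land in a separate physical partition:

```sql
-- Hypothetical monthly partitioning for a 2011 load: one partition per
-- month, so per-month packages rarely contend for the same locks.
CREATE PARTITION FUNCTION pfMonthly (date)
AS RANGE RIGHT FOR VALUES
    ('2011-02-01', '2011-03-01', '2011-04-01', '2011-05-01',
     '2011-06-01', '2011-07-01', '2011-08-01', '2011-09-01',
     '2011-10-01', '2011-11-01', '2011-12-01');

CREATE PARTITION SCHEME psMonthly
AS PARTITION pfMonthly ALL TO ([PRIMARY]);

CREATE TABLE dbo.Sales
(
    OrderId   int   NOT NULL,
    OrderDate date  NOT NULL,
    Amount    money NOT NULL
) ON psMonthly (OrderDate);

-- Package 1 then loads only rows with OrderDate < '2011-02-01' (January),
-- package 2 only February rows, and so on.
```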
I am, of course, presuming that you are trying to load into one table from multiple sources to improve loading speed? If you're doing it for any other reason, then you should be re-evaluating the design of your ETL process.
It's far more common, as Sam says, to stage your data and transform it, then write it out once to the destination.