August 9, 2012 at 7:28 am
Hi All,
I read in several discussions that the foreach task is not intended to run over large datasets.
Now I'm trying to update customer data over more than 1 million customers. My design is as follows:
- I'm using an Execute SQL Task to read the customer_Id
- I use the resultset as the input to the ForEach Task
- In the Foreach Task I'm using a couple of Data Flow task to update data (one to update SUM columns and one to update COUNT DISTINCT columns)
This works but is extreamly slow. I have to perform millions of updates (around 30MM).
Any aleternative design?
Your commnents will be appreciated
Kind Regards,
Paul
August 9, 2012 at 8:18 am
Ok,
This question is very stupid, sorry for this.
The solution is very basic, I don´t need to select all the customers and then iterate one by one to perform the data flow tasks. I can select all of the customers inside the data flow and perform all the operations that I need to the whole dataset, that´s one of the greatest features of SSIS, isn´t it?
Kind Regards,
Paul
August 9, 2012 at 8:29 am
Hi Paul,
Can you explain that a little more or provide a link to your solution idea? I am attempting something similar in that I am running a Foreach Loop on a table of about 400 rows and it is taking a long time. I am liking SSIS but am unsure of all its capabilities.
Regards:
Mordred
Keep on Coding in the Free World
August 9, 2012 at 9:20 am
Ok, I will try 🙂
First I load a table "A" with transactions from a particular week. Then I want to update a table "B" which cotains aggregated data from "A". I don't want to recreate the table, only updates records for the customers in the new transactions. For instance, I keep track of the purchases amount in the table B, so I would like to add the new amount for each customer. I also want to update the number of different departments (from a particular store) where every customer has purchased.
Table A contains one record for every item a customer purchase. Example
Table A
CustomerId - item --- quantity --- department -- amount --- date
---- 002 ------ Book ------- 2 ----------- Library -------- 20$ --------- 01.08.2012
---- 002 ------- CD -------- 1 ------------ Music ---------10$ --------- 01.08.2012
---- 002 ------- CD -------- 1 ------------ Music ---------12$ --------- 04.08.2012
.... ..... ... ...... .... ...............
Table B
CustumerId Count(distinct department) SUM (amount) Max(date)
---- 002 ------------------ 2 ---------------------- 42$ --------- 04.08.2012
My first approach was to query the distinct Customers from table A. Then use the resulset as the input for a Foreach container. Then for every row (Customer_Id) perform two data flows:
1. DTS: Select the data from the Table A and current customer. Aggregate the data, lookup the customer in Table B. If the customer is in Table B update the SUM (like total amount) fields, if not create a new entry in the Table B for the customer.
2. DTS: Select the data from the Table A and current customer performing the Count Distinct and Max operations. Then update the Table B.
That works good but the performance is terrible. Then I realized that I can query the whole data in Table A (for a particular date range, I mean, the new weekly records), make a lookup with the Table B (In fact is a Left Joint). From this leftjoin you should obtain something like this:
Aggregations from Table A Left Join Table B on Customer_Id
Aggr.CustumerId Aggr.SUM (amount) B.CustomerId B.Total_Amount
------- 002 ---------------- 42$ ------------------ 002 ------------- 3250$ --------
------- 125 ---------------- 58$ ------------------ NULL ----------- NULL ----------
.............. ..................... ........... .......................
Then you are almost done! Just use a conditional split evaluating the B.CustomerId:
- If null is a new customer, then insert the data from Aggr. in Table B
- If exist, add Aggr.SUM(amount) + B.Total_Amount and update this row with the new values.
Hope you can understand my explanation 😉
Kind Regards
August 9, 2012 at 9:43 am
Thanks Paul, I'm going to look into querying the whole table and see what I can do. Thanks for your explanation.
Regards:
Mordred
Keep on Coding in the Free World
August 9, 2012 at 10:28 am
Based on your description update-join seems close to what you want to accomplish. Lookup update-join. Ben Nadel has few good articles on it.
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply