Alternatives to foreach loop task

Question

Post reply

Alternatives to foreach loop task

Paul Hernández

SSCarpal Tunnel

Points: 4961
More actions
August 9, 2012 at 7:28 am

#143667

Hi All,
I read in several discussions that the foreach task is not intended to run over large datasets.
Now I'm trying to update customer data over more than 1 million customers. My design is as follows:
- I'm using an Execute SQL Task to read the customer_Id
- I use the resultset as the input to the ForEach Task
- In the Foreach Task I'm using a couple of Data Flow task to update data (one to update SUM columns and one to update COUNT DISTINCT columns)
This works but is extreamly slow. I have to perform millions of updates (around 30MM).
Any aleternative design?
Your commnents will be appreciated
Kind Regards,
Paul
Paul Hernández

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Paul Hernández SSCarpal Tunnel Points: 4961 More actions · Answer 1

Ok,

This question is very stupid, sorry for this.

The solution is very basic, I don´t need to select all the customers and then iterate one by one to perform the data flow tasks. I can select all of the customers inside the data flow and perform all the operations that I need to the whole dataset, that´s one of the greatest features of SSIS, isn´t it?

Kind Regards,

Paul

Paul Hernández

Mordred SSCrazy Points: 2904 More actions · Answer 2

Hi Paul,

Can you explain that a little more or provide a link to your solution idea? I am attempting something similar in that I am running a Foreach Loop on a table of about 400 rows and it is taking a long time. I am liking SSIS but am unsure of all its capabilities.

Regards:
Mordred
Keep on Coding in the Free World

Paul Hernández SSCarpal Tunnel Points: 4961 More actions · Answer 3

Ok, I will try 🙂

First I load a table "A" with transactions from a particular week. Then I want to update a table "B" which cotains aggregated data from "A". I don't want to recreate the table, only updates records for the customers in the new transactions. For instance, I keep track of the purchases amount in the table B, so I would like to add the new amount for each customer. I also want to update the number of different departments (from a particular store) where every customer has purchased.

Table A contains one record for every item a customer purchase. Example

Table A

CustomerId - item --- quantity --- department -- amount --- date

---- 002 ------ Book ------- 2 ----------- Library -------- 20$ --------- 01.08.2012

---- 002 ------- CD -------- 1 ------------ Music ---------10$ --------- 01.08.2012

---- 002 ------- CD -------- 1 ------------ Music ---------12$ --------- 04.08.2012

.... ..... ... ...... .... ...............

Table B

CustumerId Count(distinct department) SUM (amount) Max(date)

---- 002 ------------------ 2 ---------------------- 42$ --------- 04.08.2012

My first approach was to query the distinct Customers from table A. Then use the resulset as the input for a Foreach container. Then for every row (Customer_Id) perform two data flows:

1. DTS: Select the data from the Table A and current customer. Aggregate the data, lookup the customer in Table B. If the customer is in Table B update the SUM (like total amount) fields, if not create a new entry in the Table B for the customer.

2. DTS: Select the data from the Table A and current customer performing the Count Distinct and Max operations. Then update the Table B.

That works good but the performance is terrible. Then I realized that I can query the whole data in Table A (for a particular date range, I mean, the new weekly records), make a lookup with the Table B (In fact is a Left Joint). From this leftjoin you should obtain something like this:

Aggregations from Table A Left Join Table B on Customer_Id

Aggr.CustumerId Aggr.SUM (amount) B.CustomerId B.Total_Amount

------- 002 ---------------- 42$ ------------------ 002 ------------- 3250$ --------

------- 125 ---------------- 58$ ------------------ NULL ----------- NULL ----------

.............. ..................... ........... .......................

Then you are almost done! Just use a conditional split evaluating the B.CustomerId:

- If null is a new customer, then insert the data from Aggr. in Table B

- If exist, add Aggr.SUM(amount) + B.Total_Amount and update this row with the new values.

Hope you can understand my explanation 😉

Kind Regards

Paul Hernández

Mordred SSCrazy Points: 2904 More actions · Answer 4

Thanks Paul, I'm going to look into querying the whole table and see what I can do. Thanks for your explanation.

Regards:
Mordred
Keep on Coding in the Free World

Rajib Bahar Ten Centuries Points: 1067 More actions · Answer 5

Based on your description update-join seems close to what you want to accomplish. Lookup update-join. Ben Nadel has few good articles on it.

Rajib Bahar
http://www.rajib-bahar.com
http://www.twitter.com/rajib2k5
http://www.youtube.com/icsql