May 25, 2010 at 6:49 am
I am trying to implement checkpoints in my package. From the documentation I have read, a checkpoint should restart the package at the control-flow task where it failed. If the failed task is a plain control-flow task, such as an Execute SQL task, it works as expected, picking up at that task when the package is restarted.
In the case of a container that holds one or more data-flow tasks, a different behavior emerges. My understanding was that if I use a Sequence Container at the control-flow level, set the tasks inside the container to FailParentOnFailure, and set the Sequence Container itself to FailPackageOnFailure, then when the package is run again after failing somewhere in that Sequence Container, it will start with that Sequence Container. That is true in the most technical of terms, as it does start with that Sequence Container. Unfortunately, it starts with the control-flow task inside the Sequence Container that failed, not at the beginning of the Sequence Container.
To illustrate further, my control-flow tasks originally were:
Create Tables (TSQL)
GetJobKey (stored procedure)
Bulk Load Data into Staging table. (Data Flow)
Bulk Load data from Staging table to Header Table (Data Flow)
Parse Staging table data to obtain detail records and populate detail tables. (Data Flow with script and split)
Update Audit table
Delete Staging data
If my package died in the middle of any data flow, including if I just stopped the package there, that data flow was where the package started the next time it ran. The problem is that if it had already inserted some records, anywhere up to 9 million, those records were already there, and the task began inserting records from the beginning again, giving me duplicate data.
In an attempt to fix this, I took each data-flow task, put it into a Sequence Container, and added an Execute SQL task that truncates the table(s) involved. The end goal was that if anything in the Sequence Container failed, it should fail the Sequence Container, and upon restart the package should pick up at the first task inside the Sequence Container.
New flow:
Create Tables (TSQL)
GetJobKey (stored procedure)
Sequence Container
(
Truncate Staging Table
Bulk Load Data into Staging table. (Data Flow)
)
Sequence Container
(
Truncate Header Table
Bulk Load data from Staging table to Header Table (Data Flow)
)
Sequence Container
(
Truncate Detail Tables
Parse Staging table data to obtain detail records and populate detail tables. (Data Flow with script and split)
)
Update Audit table
Delete Staging data
This is not how it works. If my data-flow task inside the Sequence Container fails, the package picks up at the data-flow task, not at the Truncate task that comes before it in the Sequence Container.
The entire package runs serially; there is no parallel processing. The only reason for the truncate steps is to maintain data integrity in the event of a restart, as the first task in the package actually creates new tables to hold all the data.
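To make the failure mode concrete, here is a rough sketch, in Python rather than SSIS, of what checkpointing at task granularity does to my Truncate-then-Load pair. The task names, the `staging` list, and the `completed` set are all stand-ins I invented for illustration: `staging` plays the role of the staging table and `completed` plays the role of the checkpoint file.

```python
# Hypothetical simulation (not SSIS itself) of task-level checkpoint
# restart, showing why the Truncate step is skipped on rerun.

staging = []          # stands in for the staging table
completed = set()     # stands in for the checkpoint file

def truncate():
    staging.clear()

def load(fail=False):
    staging.extend(["row1", "row2"])   # partial insert happens first
    if fail:
        raise RuntimeError("data flow failed mid-load")

def run(fail_load):
    for name, action in [("Truncate", truncate),
                         ("Load", lambda: load(fail_load))]:
        if name in completed:
            continue               # restart skips checkpointed tasks
        action()
        completed.add(name)        # only reached if the task succeeded

# First run: Truncate is checkpointed, then Load inserts rows and fails.
try:
    run(fail_load=True)
except RuntimeError:
    pass

# Restart: Truncate is skipped, Load reruns -> duplicate rows.
run(fail_load=False)
print(staging)   # ['row1', 'row2', 'row1', 'row2']
```

Because "Truncate" landed in the checkpoint on the first run, the restart never clears the table, which matches the duplicate data I was seeing.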
Anyone have any advice on how to fix this?
Thanks,
Chris
Learning something new on every visit to SSC. Hoping to pass it on to someone else.
May 25, 2010 at 2:01 pm
Further research on the Internet has revealed that you must use a workaround: put the tasks you want to act as one unit, as far as checkpoints go, into a For Each Loop Container. The container has only one value to loop through, so it goes through the loop exactly once.
After I put all of my tasks into one of these, ending up with three loops in the package, the checkpoints worked the way I had expected them to work with the Sequence Container grouping.
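Continuing the same illustrative simulation from above (again Python, not SSIS; `staging` and `completed` are invented stand-ins for the staging table and the checkpoint file), this is the difference the single-iteration loop makes: the container checkpoints only as a whole, so on restart its children all rerun together, including the Truncate.

```python
# Hypothetical simulation (not SSIS itself) of a single-iteration loop
# container as the checkpoint unit: Truncate reruns with its siblings.

staging = []          # stands in for the staging table
completed = set()     # stands in for the checkpoint file

def truncate():
    staging.clear()

def load(fail=False):
    staging.extend(["row1", "row2"])   # partial insert happens first
    if fail:
        raise RuntimeError("data flow failed mid-load")

def run_container(fail_load):
    # the container is checkpointed as a whole, only after all
    # children succeed; there is no per-child checkpoint inside it
    if "Loop Container" in completed:
        return
    truncate()                  # always reruns along with the load
    load(fail_load)
    completed.add("Loop Container")

# First run: partial load, then failure; nothing is checkpointed.
try:
    run_container(fail_load=True)
except RuntimeError:
    pass

# Restart: the whole container reruns, Truncate clears the partial rows.
run_container(fail_load=False)
print(staging)   # ['row1', 'row2'] -- no duplicates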
Thanks to "SSIS Junkie" for the info.
Chris
May 26, 2010 at 1:28 am
Thanks for the info.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP