August 1, 2011 at 1:58 pm
create table #drivertable(
id int,
name varchar(10),
status varchar(2),
c_group int
)
Insert into #drivertable (id, name, status, c_group)
values
    (1, 'X_temp1', '', 1),
    (2, 'X_temp2', '', 2),
    (3, 'X_temp3', '', 1),
    (4, 'X_temp4', '', 3),
    (5, 'X_temp5', '', 2),
    (6, 'X_temp6', '', 4),
    (7, 'X_temp7', '', 5),
    (8, 'X_temp8', '', 2),
    (9, 'X_temp9', '', 1),
    (10, 'X_temp10', '', 3),
    (11, 'X_temp11', '', 2),
    (12, 'X_temp12', '', 2),
    (13, 'X_temp', '', 7),
    (14, 'X_temp13', '', 6),
    (15, 'X_temp14', '', 7),
    (16, 'X_temp15', '', 6),
    (17, 'X_temp16', '', 5),
    (18, 'X_temp17', '', 2),
    (19, 'X_temp18', '', 1),
    (20, 'X_temp19', '', 3)
select c_group, count(*) as table_count from #drivertable
group by c_group
Create table #log
(
sid int,
name varchar(10),
flag varchar(2)
)
select * from #drivertable order by c_group
I have created an SSIS package with 7 data flow tasks in it, one per group...
Groups 1, 2 and 3 have no dependencies, but groups 4, 5, 6 and 7 depend on group 3...
I have no issues with the dependencies as of now...
We have logging added to this package: once a table is loaded we update the #log table with flag Y or N for that respective table.
We have a sid generated uniquely for each run, so the first time it runs it is 1 and it increments by 1 each run.
My issue here is: I run the package, and let's say it fails in group 3 while groups 1 & 2 load successfully.
The next time, I want to run only the tables which weren't loaded before or which errored out... I cannot use checkpoints as the package has parallel
execution.
This is my approch to this scenario...
I have created a table Load with columns flag and flag1. If flag = 'Y' then run all tables from the beginning,
else
get the rest of the tables and run them...
This is the query built for a single group (group 1):
--If I want to load all the tables from the beginning again
declare @flag varchar(2)
select @flag = flag from Load

if @flag = 'Y'
begin
    --load all tables
    select name
    from #drivertable
    where c_group = 1
end
else
begin
    declare @flag1 varchar(2)
    select @flag1 = flag1 from Load

    if @flag1 = 'Y'
    begin
        --this is where I am having issues: if it fails more than 3 or 4 times the process doesn't work...
        select FST.name
        from (
            select a.name
            from #drivertable a
            left join (select * from #log where sid = (select max(sid) - 2 from #log)) b
                on a.name = b.name
            where (b.flag is null or b.flag = 'F')
              and a.c_group = 1
        ) FST
        left join (
            select a.name
            from #drivertable a
            left join (select * from #log where sid = (select max(sid) - 1 from #log)) b
                on a.name = b.name
            where b.flag = 'Y'
              and a.c_group = 1
        ) SND
            on FST.name = SND.name
        where SND.name is null
    end
    else
    begin
        --if the package fails the first time, retrieve the tables based on the logic below; if it fails
        --more than twice, update flag1 to 'Y' and run the logic above
        select a.name
        from #drivertable a
        left join (select * from #log where sid = (select max(sid) - 1 from #log)) b
            on a.name = b.name
        where (b.flag is null or b.flag = 'F')
          and a.c_group = 1
    end
end
-----------
My basic requirement: when group 1 fails and group 2 succeeds in loading all its tables, then on re-run I don't want group 2 to load again, and group 1 should run only those failed tables... please do help me.
Thanks,
Chinna
Its the Journey which gives you Happiness not the Destination-- Dan Millman
August 1, 2011 at 2:09 pm
Why not just do it like this instead:
Have a table that stores a log of each of the 7 data flow tasks for a given day. This table would have 9 columns: one an identity PK, one the date, and the other seven bit fields indicating whether the respective data flow task completed successfully on that date.
In your SSIS package, you would first create a new record for the instance you are running, and fetch the ID value for that instance into a variable. This ID would be passed to each of the individual data flow tasks. You also want the ID value of the previous instance you ran, and the status of that previous instance: if any task failed, the status would be "fail", otherwise it would be "success".
In each data flow, the first thing you do is check the record with the previous instance's ID. If some tasks did not succeed, you then check whether that particular data flow task failed. If it failed, you re-run it; if it did not fail, you skip it. At the end, you log the success status of that task.
So as a kinda pseudo-code:
1 - Get ID of Previous Instance and status --> @PrevInstanceID, @PrevInstanceStatus
2 - Create new Instance, Get ID of New Instance --> @NewInstanceID
3 - For Each Dataflow Task
3a - If @PrevInstanceStatus = FAIL
3a1 - If Current Dataflow Task in @PrevInstanceID = FAIL
3a1a - Execute Task
3a1b - Store Success Value in @NewInstanceID Record
3a2 - If Current Dataflow Task in @PrevInstanceID = SUCCESS
3a2a - Store SUCCESS Value in @NewInstanceID Record
3b - If @PrevInstanceStatus = SUCCEED
3b1 - Execute Task
3b2 - Store Success Value in @NewInstanceID record
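The pseudo-code above could be sketched in T-SQL roughly like this. Note the table name (PackageInstance) and the TaskN column names here are illustrative, not from the original package:

```sql
-- Hypothetical instance-status table: one row per run, one bit per data flow task
create table dbo.PackageInstance (
    InstanceID int identity(1,1) primary key,
    RunDate    date not null default (cast(getdate() as date)),
    Task1 bit null, Task2 bit null, Task3 bit null, Task4 bit null,
    Task5 bit null, Task6 bit null, Task7 bit null
)

-- Steps 1 and 2: read the previous instance, then create the new one
declare @PrevInstanceID int, @PrevInstanceStatus varchar(10), @NewInstanceID int

select top 1
       @PrevInstanceID = InstanceID,
       @PrevInstanceStatus = case when Task1 = 1 and Task2 = 1 and Task3 = 1
                                   and Task4 = 1 and Task5 = 1 and Task6 = 1
                                   and Task7 = 1
                                  then 'SUCCESS' else 'FAIL' end
from dbo.PackageInstance
order by InstanceID desc

insert into dbo.PackageInstance (RunDate) values (cast(getdate() as date))
set @NewInstanceID = scope_identity()
```

Each data flow task would then read its own TaskN bit from @PrevInstanceID to decide whether to execute, and write its result into the @NewInstanceID row.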
August 1, 2011 at 2:17 pm
Kramasamy... Thanks for the response, but I am not sure I understand you correctly, and there is also an existing process for which I am trying to do an enhancement...
Existing process...
Step 1: Assign max(id) + 1 to a variable.
Step 2: Get all tables for each group.
Step 3: Process each table, log it into the DB, and the process goes on...
I want to make changes in step 2, where it gets the list of tables to be loaded for each group, and do it based on flags, as I can create new tables if needed and change the logic for step 2...
Please do help me in that direction...
Thanks,
Chinna
August 1, 2011 at 2:24 pm
Right, so in each of your data flow tasks you just need a conditional split operator at the start. One half of the split points towards the entire chain of tasks associated with that data flow task. This chain gets executed on one of two conditions:
(if, the previous time the SSIS package was run, one of the tasks had failed, AND the current task had failed in that last run, then execute the tasks in the data flow task and log success or failure depending on what happens
OR
if, the previous time the SSIS package was run, there were no failures, then execute the tasks in the data flow task and log success or failure depending on what happens)
ELSE
skip the data flow task, and log a success for that task immediately.
August 1, 2011 at 2:27 pm
I do not want to run the entire data flow task again... I just want to process only the tables that failed in each task, or the whole data flow task if none of its tables got processed...
Isn't there a way to identify the tables not loaded or failed with a query?
Thanks,
Chinna
August 1, 2011 at 2:35 pm
That's what I've been saying.
Have a table which stores the status of each data flow task. This table has a column indicating, for a given data flow task, whether the task failed in that particular execution instance.
Then the first step in your data flow task would be to check whether there was a failure in the previous run. If there was a failure, and your current task failed, then re-run it and log success or fail. If not, then skip it and log success.
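That per-task check could look something like this, assuming a per-run status table along the lines suggested earlier in the thread (PackageInstance and the TaskN columns are illustrative names, not the poster's actual schema):

```sql
-- Should data flow task 3 run in this instance?
declare @RunTask3 bit

select top 1
       @RunTask3 = case
                       -- previous run failed on this task: re-run it
                       when Task3 = 0 then 1
                       -- previous run succeeded completely: run everything fresh
                       when Task1 = 1 and Task2 = 1 and Task3 = 1 and Task4 = 1
                        and Task5 = 1 and Task6 = 1 and Task7 = 1 then 1
                       -- previous run failed elsewhere, but this task succeeded: skip it
                       else 0
                   end
from dbo.PackageInstance
order by InstanceID desc
```

The variable would then drive a precedence constraint (or conditional split) in front of the task's chain.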
August 1, 2011 at 2:35 pm
I am not sure we are both on the same page... sorry for not explaining it clearly...
Step 1: Get the ID.
Step 2: Get the list of tables for each group.
Step 3: Load them, check row counts; it has many validations and logging further on, and it is a loop process over the tables in each group...
So I do not want to change anything in step 3, as it is complex. I want to change step 2, where we identify which tables need to be run.
If step 2 outputs only two tables, only those 2 tables will run; if it outputs all tables, then it will load all tables. So based upon the log table I want to get only the tables which haven't run or which failed last time...
So say group 1 has 10 tables:
First attempt: 3 success, 1 failed, 6 did not run.
Second attempt: I want to get only those 7 tables as the list, which I am able to do based on the logic above...
If that gives 2 success, 1 failed, 4 did not run:
Third attempt: I need to get only those 5 tables for this group; and if group 2 succeeded in the second attempt I do not want to run it in the third attempt, but I am unable to do that...
I have tried using the logic I posted...
Add a flag: if 'Y' then ignore all these conditions and run all tables,
else run only the ones which failed; that works fine.
If it still fails in the second attempt, then I enable flag1 as 'Y' manually, but at that point I am unable to retrieve the failed and not-run tables... I want to keep retrieving only the tables that failed or did not run, until all tables are loaded, with flag1 as 'Y'...
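One simpler way to express "only the tables which failed or never ran" is to look for tables in the group that have no success row logged at all, instead of counting back sid - 1 or sid - 2 attempts. A minimal sketch against the #drivertable/#log definitions posted above, assuming #log only holds rows for the current load cycle (i.e. it is cleared after a fully successful run):

```sql
-- Tables in group 1 that still need loading: no success ('Y') row in #log,
-- regardless of how many failed attempts have happened so far
select d.name
from #drivertable d
where d.c_group = 1
  and not exists (
        select 1
        from #log l
        where l.name = d.name
          and l.flag = 'Y'
      )
```

Because it keys off "ever succeeded this cycle" rather than the previous one or two sids, this keeps working no matter how many times the package fails, and it naturally returns nothing for a group whose tables have all loaded.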
Thanks,
Chinna
August 1, 2011 at 2:53 pm
The existing logic is: if loading any table fails, fix it and run the entire package again...
So I want to make an enhancement: each time we run after a failure, just run the tables which aren't loaded or which failed, until all tables are loaded....
Please tell me your email id so that I can send you the package, if you're unable to understand what exactly I am saying...
Thanks,
Chinna
August 1, 2011 at 2:57 pm
Yeah, I'm afraid I can't help more without seeing the rest. Feel free to send me the code: just click my name and select "send email". I can't guarantee that I'll have time to really look into it that much, but I'll see what I can do.