July 9, 2007 at 4:05 pm
I have an SSIS package which was designed to enumerate any number of .CSV files in a specified folder, take action on each file, then move the file to another folder based on the outcome of the process.
Basically, I have a ForEach Loop. Within the loop I have a Data Flow component. Within the Data Flow I have a Flat File Source and then some other transformation and logging stuff. The Data Flow component has a precedence constraint for success (green arrow) and one for failure (red arrow), which determine where the file is moved to (i.e. a success folder or a failure folder). The process then should repeat for the next file in the folder. Ultimately I need all the files in the folder to be moved from the original location to either the success folder or the failure folder.
The problem comes when I have an "improperly" formed .CSV file (or at least that's the scenario I'm currently stuck on), such as one with embedded quotes. I have no control over the source file, so changing the delimiter is not an option. The Data Flow component fails, as anticipated, with the error --[Flat File Source[1]] Error: The column delimiter for column "Comments" was not found.--, sends the file down the failure precedence constraint (red arrow), and moves the file correctly to the "failure" folder. Then the whole process shuts down without looping to the next .CSV file in the originating folder, even though there may be one or more files still needing to be processed. The ForEach loop will loop correctly through the .CSV files until it hits one of these "problem" files.
Obviously I'm missing something, but what? How do I get the ForEach loop to continue on to the next file? I apologize if this has been answered previously, but I've been searching here for a couple of weeks and I'm under the gun to get this finished. I hope someone can direct me to a previous response, a KB article, or other reference on this. Thanks in advance.
July 10, 2007 at 1:39 pm
I'm sticking my neck out here, Janice, but I have some thoughts...
1. What other types of errors would cause shipping a file to the failure folder? If there aren't any, then I suspect you need some additional processing controls to address failures.
2. What happens when you move the file to the failure folder? Is there any housekeeping you're overlooking? Purging buffers? Is there a memory management issue, where the data flow component is in the wrong order relative to the rest of the declarations?
3. Is it possible the Data Flow component has triggered a second error that is not being handled, and that causes the process to end? Sometimes having an error handling routine disguises other errors.
I don't normally fish in these waters, but I hope these thoughts give you some directions to go in. It sounds like you DON'T get an error message before the process ends, and that's what led me to #3, which my gut suggests may be your best bet of the three to explore...
July 10, 2007 at 4:47 pm
Thanks for trying Steve.
So far the only failure scenario I'm dealing with is an improperly formed .csv file. As stated previously I don't have any control over the .csv source. Also, my test scenario is only 2 .csv files with each having only 1 record (one is formatted correctly, the other has embedded quotes) so memory doesn't seem to be a problem.
The goal of the package is to iterate through all the files that are deposited in a receiving folder. A Data Flow Task contains a Flat File Source to access the first .csv file. If successful (the .csv is formatted correctly), basic information about the file is logged (date/time, file name, number of rows in the file), the Data Flow Task shows success, and a File System Task moves the .csv file to a success folder. If the Flat File Source fails (the .csv has a field with embedded quotes or with a carriage return), the Data Flow Task shows failure, and another File System Task moves the .csv file to a failure folder so it can be cleaned before being passed to the next process. The Data Flow is placed inside a ForEach Container set for file enumeration, so in both cases the ForEach loop should then kick in to grab the next file in the receiving folder. Unfortunately, once a badly formatted .csv file is handled, the entire process stops; the ForEach loop doesn't grab the next file.
I'm really stuck here. If I manually remove the embedded quotes (or the carriage return) from all the files prior to processing, the ForEach loop correctly loops through all the files and moves them to the success folder. But if I don't do that preprocessing, the ForEach loop stops after moving the badly formatted .csv to the failure folder. The package will ultimately be started by a SQL Agent Job on a specified schedule. There will be way too many files arriving 24/7 for manual preprocessing to be a feasible workaround. I hope someone can please help.
July 11, 2007 at 6:21 am
There should be a MaximumErrorCount property you can increase on the package, the ForEach Loop, and the data flow. I know packages fail when this limit is hit, and it defaults to 1.
The data flow also has advanced properties which you could set to ignore errors. You would be flying blind for unanticipated errors, though.
September 4, 2007 at 2:04 pm
Hi Janice,
Did you ever find a solution to your scenario? I am facing the exact same problem and have been searching the net for a while for a solution, but have yet to find one. So I am hoping you have some pointers for me.
Thanks a bunch,
Calvin
September 5, 2007 at 6:48 am
Unfortunately, no. I'm still struggling with this. I just ordered some training materials from AppDev and am hoping that by going back to the basics I'll find what I'm missing. Will post back if/when I do.
September 5, 2007 at 6:55 am
That's kinda why I use BCP for this type of stuff... you can program BCP to survive just about any error and squirrel away the "bad" rows in a separate file where they can be repaired and then imported.
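As a rough illustration of that BCP approach (the server, database, table, and file names below are placeholders, and the exact switches depend on the file layout): the -m switch sets how many bad rows to tolerate before giving up, and -e writes the rejected rows to a separate file for later repair.

```shell
rem Bulk-load a comma-delimited file in character mode.
rem -m 1000 : tolerate up to 1000 bad rows before the load fails
rem -e      : write rows that fail to load into bad_rows.txt
bcp MyDb.dbo.MyTable in incoming.csv -c -t, -m 1000 -e bad_rows.txt -S myserver -T
```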
--Jeff Moden
Change is inevitable... Change for the better is not.
September 5, 2007 at 7:26 am
Click on the component you are receiving the error on (I think you said it was your data flow) and click the Event Handlers tab. Create an error handler there to move the file to the error directory. Then set the maximum number of errors allowed to a number higher than 1. The error handler will allow your loop to continue rather than using the error flow in the control flow interface.
This makes the workflow harder to follow (moving the error handling onto another tab), but it will allow you to handle an error and resume where the control flow left off. That is what the Event Handlers tab was created for.
September 5, 2007 at 7:52 am
Since I don't use DTS or SSIS for this type of stuff... what happens to the data that caused the error? Can the error handler be made to save the bad rows in a file? Thanks, Michael...
--Jeff Moden
Change is inevitable... Change for the better is not.
September 5, 2007 at 8:10 am
Jeff, each SSIS task allows for a dual chain of events to be processed from the task: one path for success/completion/expression test, and one path for failure.
It seems like this is happening properly per the initial question. "If the Flat File Source fails (the .csv has a field with embedded quotes or with a carriage return), the Data Flow Task shows failure, and another File System Task moves the .csv file to a failure folder so it can be cleaned before being passed to the next process. "
The individual records can be split along the success/failure paths also, but this would have to occur in the data flow task.
Janice, each task has a maximum error count, which defaults to 1. You can alter how an SSIS data flow task reacts to an error (fail/ignore), but you have to increase the task's maximum error count to something greater than 1. You also have to increase the maximum error count on all parent tasks. For example, say a ForEach Loop task is calling a data flow task. If an error occurs in the data flow task, you would have to increase the maximum error count for the data flow task, its parent ForEach Loop task, and the entire package as well, since it is at the top of the hierarchy.
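If the package is run from the command line, those counts can also be overridden at run time with dtexec's /SET option, one property path per level of the hierarchy. The package and container names below are placeholders for whatever the actual tasks are called:

```shell
rem Raise MaximumErrorCount at each level (package, loop, data flow)
dtexec /F "LoadCsvFiles.dtsx" ^
    /SET "\Package.Properties[MaximumErrorCount]";5 ^
    /SET "\Package\Foreach Loop Container.Properties[MaximumErrorCount]";5 ^
    /SET "\Package\Foreach Loop Container\Data Flow Task.Properties[MaximumErrorCount]";5
```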
September 5, 2007 at 8:48 am
Jeff, the answer is yes. You can save the errored rows or files anywhere you want. Each error handler you create is actually another complete flow, allowing you to do pretty much anything in the error handler that you can in a regular data flow.
It does become hard to follow pretty quickly if you over-use the error handlers.
September 5, 2007 at 7:00 pm
Cool... thanks guys.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 6, 2007 at 3:32 am
Guys,
Thank you very much for the replies. Based on your responses, I was able to modify my package and get it to successfully move the bad file(s) to an archive folder and continue processing the remaining files.
My problem was the MaximumErrorCount property. I needed to set it to greater than one on the parent containers. Then everything worked like a charm.
I was pulling my hair out over this for some time. So thank you very much for the valuable input.
Cheers,
Calvin Brown
September 6, 2007 at 6:30 am
Yeah, it seems SSIS wants you to anticipate your errors, which is OK if you know what they might be.
I have a couple of issues with SSIS and how these maximum error counts work.
1. If you set the maximum error count to 100 and you have 101 bad items, the package stops. How do you know what to set this value at? Trial and error?
2. If you try to recreate a package from scratch, you have to click on every task to check almost every property (MaximumErrorCount and DelayValidation, to name the two most problematic). If you copy and paste a task, you have to go back over every property, and the settings can be spread over different property pages (properties, edit, advanced editor, expressions, parameters; right-click here, right-click there). Ugh!
3. I might not want all errors to be treated equally: bad files, stop; bad records, parse differently and continue.
I'm going to have to find the error values in SSIS and then programmatically test on them in an attempt to CASE out the error routing. This seems to be the real challenge down the line with this software.
September 6, 2007 at 7:51 am
I've been struggling with the same problem in a very similar situation. I was finally able to prevent the ForEach Loop from failing the package, without setting the MaximumErrorCount property to a value greater than 1, by setting the event handler's Propagate variable to False. I elected to keep the event handler empty and just add the File System Task on an error flow in the Control Flow.