July 22, 2022 at 12:37 pm
I have an SSIS package that's looping through a folder of Excel files and loading them into SQL staging tables.
About 20% of the time, the package fails (running in VS2017) with this error:
"Error: 0xC0202009 at PackageName, Connection manager "Excel Holding File": SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005.
An OLE DB record is available. Source: "Microsoft Access Database Engine" Hresult: 0x80004005 Description: "External table is not in the expected format.".
Now, at first I thought it was a corrupted file, but if I rerun the package it loads the same file successfully, and the failure moves randomly to different files, which in turn load fine on the next run. In my set of about 10 test files, all have at some point failed, and all have at some point loaded. Not sure where to start with this.
I thought at first that the data flow that reads the Excel file was firing too soon, perhaps while the preceding script task, which copies the file into the load folder, still had a lock on the file. I tried putting a 5-second wait between the copy and the data flow, but that did nothing. Could it be network issues? If so, and this is unavoidable, should I just use SQL Agent to retry the SSIS job step on failure, or is there a way of handling this in the package?
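One way to make that wait smarter, if a lingering lock really is the cause: instead of a fixed sleep, a script task could probe the file until it can be opened exclusively and only then let the data flow run. A minimal sketch, assuming a hypothetical package variable User::StagingFilePath that holds the copied file's path (ScriptResults comes from the script task's generated boilerplate):

```csharp
// Body of an SSIS script task's Main(); User::StagingFilePath is a
// hypothetical variable name, not one from the actual package.
public void Main()
{
    string path = Dts.Variables["User::StagingFilePath"].Value.ToString();

    const int maxAttempts = 10;
    for (int attempt = 1; attempt <= maxAttempts; attempt++)
    {
        try
        {
            // FileShare.None fails while any other process (A/V scanner,
            // sync agent, an unfinished copy) still holds the file.
            using (new System.IO.FileStream(path, System.IO.FileMode.Open,
                   System.IO.FileAccess.Read, System.IO.FileShare.None))
            {
                Dts.TaskResult = (int)ScriptResults.Success;
                return; // lock released; safe to start the data flow
            }
        }
        catch (System.IO.IOException)
        {
            System.Threading.Thread.Sleep(1000); // still locked; wait and retry
        }
    }
    Dts.TaskResult = (int)ScriptResults.Failure; // file never became available
}
```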
July 22, 2022 at 12:45 pm
That's a very common error, so don't expect it to correlate well to whatever the problem is!
Are you trying to read anything in parallel, e.g., two or more sheets in the same workbook at the same time?
Try putting a 5s delay between reading one spreadsheet and the next, in an attempt to determine whether timing is the issue.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
July 22, 2022 at 12:48 pm
The process is:
- The file is moved from an input folder to a staging folder
- A Data Flow (the bit that randomly fails) then reads the file into SQL
- The file is then moved from staging into an archive folder, and the process loops to the next file
As I said, I already tried placing a 5-second wait in the script task which copies the file into the staging folder, prior to the data flow firing, and that did nothing.
July 22, 2022 at 12:50 pm
Oh wait, I think I only paused for 5ms and not 5secs. Will test again.
July 22, 2022 at 12:53 pm
It still fails with a 10-second delay.
July 22, 2022 at 12:59 pm
Sorry, I missed that.
Does your script task use the connection manager? Are you disposing it?
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
July 22, 2022 at 1:02 pm
No. The script that puts the file into the folder from which the data flow loads it simply tests for the folder's existence, creates the folder if not present, then uses System.IO.File.Copy to copy the Excel file in and sets some DTS variables. Nothing references the Excel connection until the Data Flow task itself.
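For what it's worth, the copy step amounts to roughly this (paths and variable names here are placeholders, not the package's real ones):

```csharp
// Sketch of the copy script task's Main(); all variable names are illustrative.
public void Main()
{
    string sourceFile = Dts.Variables["User::InputFilePath"].Value.ToString();
    string stagingFolder = Dts.Variables["User::StagingFolder"].Value.ToString();

    // Test for the folder's existence and create it if not present.
    if (!System.IO.Directory.Exists(stagingFolder))
        System.IO.Directory.CreateDirectory(stagingFolder);

    // Copy the Excel file into the staging folder, overwriting any leftover.
    string destFile = System.IO.Path.Combine(stagingFolder,
                          System.IO.Path.GetFileName(sourceFile));
    System.IO.File.Copy(sourceFile, destFile, true);

    // Set the variable the data flow's Excel CM expression reads from.
    Dts.Variables["User::StagingFilePath"].Value = destFile;

    Dts.TaskResult = (int)ScriptResults.Success;
}
```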
July 22, 2022 at 1:13 pm
Then it's a puzzle. You could try setting DelayValidation = True on the CM.
You could also check the All Executions report to look for any other clues. Does the error always occur at the same point in the loop?
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
July 22, 2022 at 1:18 pm
The error always occurs in the Excel Data Source at the start of the data flow. Only ever there, but which file it fails on is random. Often it will do all the files in the folder; sometimes it fails on the first, sometimes the 2nd or the 5th, totally random. Rerunning usually sees all files loaded.
DelayValidation is already set to True on the Excel CM, as the connection string is dynamic and passed in from an outer loop over all .xlsx files in the input folder.
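For context, the expression builds a connection string along these lines (the path shown is just an example); the "Microsoft Access Database Engine" named in the error is this ACE provider:

```
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=\\server\staging\CurrentFile.xlsx;Extended Properties="Excel 12.0 XML;HDR=YES";
```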
I've not yet deployed to SQL Server, so no All Executions report is available. I'm just looking at the Output pane in VS.
July 22, 2022 at 1:29 pm
Nothing else comes to mind. It feels like it's something outside of SSIS ... could some other process be locking the file temporarily? A/V software or some sort of folder sync software?
Hopefully, someone else has some additional ideas for you. Annoying problem to have on a Friday!
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
July 22, 2022 at 2:31 pm
Could it be Microsoft itself? Data transfer between Excel and anything else, including MS SQL Server, is a game of chance. It may work, it may not, randomly. Kind of like blackjack. 🙁
Zidar's Theorem: The best code is no code at all...
July 22, 2022 at 3:07 pm
I've just changed my staging folder parameter to point at my local C drive, and so far it hasn't failed once, even with the wait script command removed.
It's definitely looking like some sort of security software, or perhaps the DFS Replication on the network drives, causing the issue.
Will talk to the infrastructure team about setting an exemption.
Thanks for your help.
July 22, 2022 at 5:01 pm
My suspicion before all else would be VS itself. I don't use VS but that might also explain why I've never encountered such an issue.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 22, 2022 at 6:14 pm
If it is in a Foreach Loop, you can make it try again on failure.
There are instructions here: https://www.mssqltips.com/sqlservertip/3575/continue-a-foreach-loop-after-an-error-in-a-sql-server-integration-services-package/
The key points are:
- Set the MaximumErrorCount property of the data flow > 1
- Set the system variable [Propagate] to False while creating an empty on-error event handler for the data flow
- On error, go back and try again

I set the on-failure path to record the retry and empty the destination table before restarting the data flow; this prevents duplication if the error occurs during the data flow, and I can see how often it happens. The errors I encounter are dropped connections to Oracle, and I get 5+ errors per failure. If you get a single error, then you can set MaximumErrorCount to 3 so it will retry twice, but not keep going forever.
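As a rough sketch, that on-failure cleanup step amounts to something like this (connection, table, and variable names here are made up; an Execute SQL Task works just as well for the truncate):

```csharp
// Sketch of an on-failure script task; all names are hypothetical.
public void Main()
{
    string connString = Dts.Variables["User::StagingConnString"].Value.ToString();

    // Empty the destination table so the retry cannot duplicate rows.
    using (var conn = new System.Data.SqlClient.SqlConnection(connString))
    using (var cmd = new System.Data.SqlClient.SqlCommand(
               "TRUNCATE TABLE dbo.ExcelStaging;", conn))
    {
        conn.Open();
        cmd.ExecuteNonQuery();
    }

    // Record the retry so you can see how often it happens.
    int retries = (int)Dts.Variables["User::RetryCount"].Value;
    Dts.Variables["User::RetryCount"].Value = retries + 1;

    Dts.TaskResult = (int)ScriptResults.Success;
}
```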
July 25, 2022 at 8:24 am
I have implemented option 2 listed in that link you posted. That option simply sets propagation to False on the Data Flow and adds error logging to the on-failure path. I've got this working now, and when a file fails, my loop continues to the next file, copies the "bad" one back to the input folder, and logs the failure.
So, two questions.