July 6, 2009 at 2:02 pm
While most of the files we receive have a carriage return and line feed at the end of each row, some files only have line feed and some files only have carriage return.
We receive the files from many sources. Sometimes the same source will change the end of row markers from one run to the next.
Is there a way to identify the end of row marker used on the first line without having to read the entire file? Some of these files contain over a million rows. At least so far they have been consistent within a file.
I tried using StreamReader in a script task but it breaks the file into rows whenever it hits a carriage return, a line feed, or a carriage return immediately followed by a line feed and does not return those characters. It is too bad the flat file connection manager does not give us that option!
Another approach I tried is to define a flat file connection manager as fixed width with only one field whose length is greater than the real length of the row. This will break the file up into rows where each row has that length. It appears to ignore carriage returns and line feeds. I then add a row counter followed by a conditional split that drops rows after the first one.
I am then able to search the first row for the location of any carriage returns (chr(13)) or line feeds (chr(10)). The drawback to this approach is it reads the entire file.
Currently, I am using the second approach if the package fails or if there only appears to be one row in the file. Then I send an email with the results.
July 6, 2009 at 2:35 pm
Inconsistent file formats are a major problem.
What you might be able to do is set up a CLR script that checks the file for format, and picks different data flow objects based on what it finds. That might work. Should be pretty easy to set something up that searches the first X characters (some number larger than the largest row you ever get) and looks for the various patterns, just using a text file reader object. Then have that pick which data source definition to use.
Does that help?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 6, 2009 at 7:55 pm
Thank you. I am unfamiliar with CLR scripts. Could you point me to some documentation on them?
As for using a script task, the closest I have come is the following but I don't think it will work with a million row file. It does return the carriage returns and line feeds.
Public Sub Main()
Dim ffConnection As String = _
DirectCast(Dts.Connections("InputFile").AcquireConnection(Nothing), _
String)
Dim fileReader2 As String
fileReader2 = My.Computer.FileSystem.ReadAllText(ffConnection)
Dim ctr As Integer
ctr = InStr(fileReader2, Chr(13)) 'carriage return
MsgBox(ctr)
ctr = InStr(fileReader2, Chr(10)) 'line feed
MsgBox(ctr)
Dts.TaskResult = Dts.Results.Success
End Sub
July 7, 2009 at 7:14 am
I haven't done it in years, but I know that VBA and VBS have a FileSystemObject that allows you to read lines from the file into memory. Works for reading text files. Usually, you set it up with a loop that looks for an end-of-file marker and have it run till it finds that, but in this case, you'd just set it up to run a time or two.
I'm not familiar enough with .NET to know how to do the same thing in them, but it's a pretty sure bet that anything VBA can do, VB.NET and C#.NET can do, probably better.
If you have access to a .NET dev who can help directly, ask about that idea. If not, then maybe someone who knows more .NET than I do will hit this forum (there are quite a few who are here regularly), or you could ask on a forum that specializes in .NET.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 7, 2009 at 9:59 am
You might find some help in this article[/url]. It does not directly address your issue, but it does deal with using a Script Component to read a file line by line.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply