What's the easiest way to read the first 10 characters of a flat file?

  • Basically I am receiving multiple flat files in two different layouts. The file names do not give a clue to the layout; however, the files (pipe-delimited, with different numbers of columns) have the column names in the first row, and the first column name is different for each layout.

    So I'm looking for an easy way to read the data (in a script? or PowerShell?) and set a variable in my package to identify the data flow I need to use.

    SSIS 2012.

  • PowerShell would work.

    You could read the first line, up to the CRLF - that would give you the header row of your file - and then use some string processing in PowerShell to build the logic that assesses the data file format.
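    For example, a minimal sketch of that idea (the path and the 'Col1A' header key are hypothetical placeholders; substitute your real file locations and layouts):

    # Read only the first line of the file (Get-Content stops at the first CRLF)
    $header = Get-Content -Path 'C:\incoming\somefile.txt' -TotalCount 1

    # The first pipe-delimited column name identifies the layout
    $firstColumn = ($header -split '\|')[0]

    # Emit a layout flag that the package could pick up
    if ($firstColumn -eq 'Col1A') { 'A' } else { 'B' }

    Run from an Execute Process Task, the script's output could then be captured and used to set the package variable that selects the data flow.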

    --------------------------------------------------
    ...0.05 points per day since registration... slowly crawl up to 1 pt per day hopefully 😀

  • pmcquirk 96901 (5/6/2013)


    Basically I am receiving multiple flat files in two different layouts. The file names do not give a clue to the layout; however, the files (pipe-delimited, with different numbers of columns) have the column names in the first row, and the first column name is different for each layout.

    So I'm looking for an easy way to read the data (in a script? or PowerShell?) and set a variable in my package to identify the data flow I need to use.

    SSIS 2012.

    You could also use a script task, as you suggest. Here is a link that should get you started, should you choose to go down that path. Within the script task, you would also, of course, set the package variable which you would use in subsequent precedence constraints to select the relevant data flow.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Without PS or a Script Task, you can treat the data records in the given file as a single-column text field. In the data flow, start with a conditional split looking at SUBSTRING([field], 1, 10) and divert the remaining records from there. The records can then be shredded in subsequent derived column fields based on your file definition/needs.

    It looks like the OP decided on PS (not a bad way to go), but I'll post a sample package and files this afternoon to demonstrate the above.

  • dg227 (5/7/2013)


    Without PS or a script component, you can treat the data records in the given file as a single-column text field. In the data flow, start with a conditional split looking at SUBSTRING([field], 1, 10) and divert the remaining records from there. The records can then be shredded in subsequent derived column fields based on your file definition/needs.

    This would divert only the first row.

    "..divert the remaining records from there"

    Please explain how you would implement this.

    Also, I suggested a Script Task, not a Script Component - there is a difference.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thanks everyone,

    I'm going to use PowerShell to split the files into separate directories, and then loop the loads through all the files in each directory. A sketch of that splitting step is below.
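
    For instance, a minimal sketch of that split (the folder paths and the 'Col1A' header key are assumptions to adapt to the real layouts):

    # Sort incoming pipe-delimited files into per-layout folders based on
    # the first 10 characters of each file's header row.
    # (Assumes the LayoutA and LayoutB folders already exist.)
    Get-ChildItem -Path 'C:\incoming' -Filter '*.txt' | ForEach-Object {
        $header = Get-Content -Path $_.FullName -TotalCount 1
        $key    = $header.Substring(0, [Math]::Min(10, $header.Length))

        if ($key -like 'Col1A*') {
            Move-Item -Path $_.FullName -Destination 'C:\incoming\LayoutA'
        }
        else {
            Move-Item -Path $_.FullName -Destination 'C:\incoming\LayoutB'
        }
    }

    A Foreach Loop Container per directory can then run the matching data flow over each folder's files.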

    peter

  • As a follow-up from yesterday, I've uploaded a zip file containing the dtsx package (detailed below) and a folder (containing an A and a B file) that should be saved in your local c:\test folder.

    File A has 4 columns and 5 data records; file B has 6 columns and 3 data records.

    In a nutshell, the package loops through that SSC_20130507 directory with the two files, parsing them in a single data flow. Using a Delimiter variable (a pipe, in this example) and key values found in the first column of each file's header ("Col1A" and "Col1B", respectively; shortened to 5 characters from the OP's 10-character request, but it demonstrates the functionality), the data flow determines each file's type (A or B) from that key, appends "A" or "B" to the file record, and then conditionally splits the data flow from there. To demonstrate that each file's data can then be parsed and used in its own subsequent set of tasks, a Derived Column task shreds the column data, based on the Delimiter variable, into distinct columns that can be used later in the data flow (terminating here in Union All transformations, for simplicity).

    Basically, while there can be different approaches to handling this problem, including Script Tasks in the control flow, external scripting (PowerShell), etc., I wanted to demonstrate that yes, you can accomplish this fairly easily within a single data flow.

  • I've just checked your solution - the data flow starts by using a Script Component to check the file type - and every single row of every file will go through that component, despite your original quote:

    Without PS or a script component, you can treat the data records in the given file as a single-column text field. In the data flow, start with a conditional split looking at SUBSTRING([field], 1, 10) and divert the remaining records from there.

    Nonetheless, an interesting working example - thanks for posting it.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • True, that's why I edited my earlier response when you pointed that out, changing "component" to "task" (which is what I had originally meant - wrong term).

    The solution does require all file rows to pass through the data flow's Script Component in order to flag each row with the correct file type; it then conditionally splits from there. I don't see that as a notable issue, because all records are likely going to have to pass through a data flow somewhere in order to be processed (substantially large files could see a performance issue, I suppose, from running each record through that Script Component ... that'd have to be tested/evaluated based on business needs, as would the selected solution at the end of the day).

    I'm glad that the sample package and related files worked! (first posting/response with supporting documentation/examples like that)

    Thanks!

    😀

  • In a kind-of-similar thread, someone had a table full of varbinary(max) values holding graphic images, without the file name/extension, so they couldn't figure out whether each file was a *.jpg, a *.gif, or something else.

    That thread is here if you are curious:

    http://www.sqlservercentral.com/Forums/Topic1178898-392-1.aspx

    When I was helping there, I wrote a snippet in VB.NET that read the first ten bytes of everything in a local folder, so I could do the analysis and write some T-SQL to map the first ten bytes to known extensions.

    Here's the VB code I used:

    Dim myFilePath As String = String.Empty
    Dim myFileBrowser As New OpenFileDialog

    Try
        'With executes several statements against one object; here it shows
        'the dialog so a file can be picked at run time
        With myFileBrowser
            If .ShowDialog() = DialogResult.OK Then
                myFilePath = .FileName
            End If
        End With
    Catch ex As Exception
        MsgBox(ex.Message)
    End Try

    'If we have a valid path, we can go forward.
    If myFilePath <> String.Empty Then
        'Open the file that is to be loaded into a byte array
        Dim oFile As New System.IO.FileInfo(myFilePath)
        Dim oFileStream As System.IO.FileStream = oFile.OpenRead()

        'oFileStream.Length would read the whole file into the byte array;
        'we want just the first 10 bytes for testing (or fewer, if the file is smaller)
        Dim lBytes As Integer = CInt(Math.Min(oFileStream.Length, 10))

        If lBytes > 0 Then
            'Read the first bytes of the file into a byte array
            Dim fileData(lBytes - 1) As Byte
            oFileStream.Read(fileData, 0, lBytes)
            oFileStream.Close()

            'Print the file's extension alongside the hex of its first bytes
            Debug.Print(oFile.Extension & "||" & BitConverter.ToString(fileData))
        End If
    End If

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. With your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!

  • That's a neat technique, thanks for posting.

    Quite a common requirement seems to be reading the first n characters, or the first row, of a text file to determine its type/layout before directing it to the appropriate data flow - I reckon you're well on the way to cracking that with this (a PowerShell translation of the byte-level read is sketched below).
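
    For completeness, a rough PowerShell equivalent of that byte-level read (the file path is a placeholder):

    # Read just the first 10 bytes of a file and print them as hex,
    # mirroring the VB.NET snippet above
    $stream = [System.IO.File]::OpenRead('C:\test\somefile.bin')
    $buffer = New-Object byte[] 10
    $read   = $stream.Read($buffer, 0, $buffer.Length)
    $stream.Close()

    # Show only the bytes actually read (the file may be shorter than 10 bytes)
    [System.BitConverter]::ToString($buffer, 0, $read)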

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin
