May 6, 2013 at 3:29 pm
Basically I am receiving multiple flat files in two different layouts. The file names give no clue to the layout; however, the files (pipe delimited, with different numbers of columns) have the column names in the first row, and the first column name is different for each layout.
So I'm looking for an easy way to read the data (in a script? or PowerShell?) and set a variable in my package to identify the data flow I need to use.
SSIS 2012.
May 6, 2013 at 5:47 pm
PowerShell would work.
You could read the first line (up to the CRLF) - that would give you your file's header row - and then use some string processing in PowerShell to work out which of the two file formats you have.
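For example, something like this as a rough sketch (the path is a placeholder, and the two column names are just illustrative - substitute whatever your two layouts actually start with):

# Read only the first line (the header row) of the file
$firstLine = Get-Content -Path 'C:\test\incoming\somefile.txt' -TotalCount 1

# The first column name identifies the layout
$firstColumn = ($firstLine -split '\|')[0]

switch ($firstColumn) {
    'Col1A'  { $fileType = 'A' }
    'Col1B'  { $fileType = 'B' }
    default  { $fileType = 'Unknown' }
}

Write-Output $fileType

You could then hand the result back to the package (e.g. by capturing it in a package variable via an Execute Process Task's StandardOutputVariable) and branch on it.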
May 6, 2013 at 11:10 pm
pmcquirk 96901 (5/6/2013)
Basically I am receiving multiple flat files in two different layouts. The file names give no clue to the layout; however, the files (pipe delimited, with different numbers of columns) have the column names in the first row, and the first column name is different for each layout. So I'm looking for an easy way to read the data (in a script? or PowerShell?) and set a variable in my package to identify the data flow I need to use.
SSIS 2012.
You could also use a Script Task, as you suggest. Here is a link that should get you started, should you choose to go down that path. Within the Script Task you would also, of course, set the package variable, which you would then use in subsequent precedence constraints to select the relevant data flow.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 7, 2013 at 7:40 am
Without PS or a Script Task, you can treat the data records in the given file as a single-column text field. In the data flow, start with a conditional split looking at SUBSTRING([field],1,10) and divert the remaining records from there. The records can then be shredded into subsequent derived column fields based on your file definition/needs.
It looks like the OP has decided on PS (not a bad way to go), but I'll post a sample package and files this afternoon to demonstrate the above.
May 7, 2013 at 7:53 am
dg227 (5/7/2013)
Without PS or a script component, you can treat the data records in the given file as a single-column text field. In the data flow, start with a conditional split looking at SUBSTRING([field],1,10) and divert the remaining records from there. The records can then be shredded into subsequent derived column fields based on your file definition/needs.
This would divert only the first row.
"..divert the remaining records from there"
Please explain how you would implement this.
Also, I suggested a Script Task, not a Script Component - there is a difference.
May 7, 2013 at 10:42 am
Thanks everyone,
I'm going to use PowerShell to split the files into separate directories, and then loop the loads through all the files in each directory.
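Roughly like this - the folder paths are placeholders, and 'Col1A'/'Col1B' stand in for the real first column names:

# Placeholder paths and header keys - adjust to the real environment
$incoming = 'C:\test\incoming'
$typeA    = 'C:\test\typeA'
$typeB    = 'C:\test\typeB'

Get-ChildItem -Path $incoming -Filter '*.txt' | ForEach-Object {
    # Peek at the header row only
    $header = Get-Content -Path $_.FullName -TotalCount 1
    $firstColumn = ($header -split '\|')[0]

    # Route the file to the folder for its layout
    if ($firstColumn -eq 'Col1A') {
        Move-Item -Path $_.FullName -Destination $typeA
    }
    elseif ($firstColumn -eq 'Col1B') {
        Move-Item -Path $_.FullName -Destination $typeB
    }
}

Each Foreach Loop container in the package can then point at one directory and work with a single, known layout.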
peter
May 8, 2013 at 8:02 am
As a follow-up from yesterday, I've uploaded a zip file containing the dtsx package (detailed below) and a folder (containing an A and a B file) that should be saved in your local c:\test folder.
File A has four columns and five data records; file B has six columns and three data records.
In a nutshell, the package loops through the SSC_20130507 directory containing the two files, parsing them in a single data flow. Using a Delimiter variable (a pipe, in this example) and key values found in the first column of each file's header ("Col1A" and "Col1B", respectively - shortened to 5 characters from the OP's 10-character request, but demonstrating the same functionality), the data flow determines each file's type (A or B) from that key, appends "A" or "B" to the file record, and then conditionally splits the flow from there. To demonstrate that each file's data can then be parsed and used in its own subsequent set of tasks, a Derived Column transformation shreds the column data into distinct columns based on the Delimiter variable, and these columns can then be used later in the data flow (terminating here in Union All transformations, for simplicity).
Basically, while there are different approaches to handling this problem - Script Tasks in the control flow, external scripting (PowerShell), etc. - I wanted to demonstrate that yes, you can accomplish this fairly easily within a single data flow.
May 8, 2013 at 8:32 am
I've just checked your solution - the data flow starts by using a Script Component to check the file type, so every single row of every file will go through that component, despite your original quote:
Without PS or a script component, you can treat the data records in the given file as a single-column text field. In the data flow, start with a conditional split looking at SUBSTRING([field],1,10) and divert the remaining records from there.
Nonetheless, an interesting working example - thanks for posting it.
May 8, 2013 at 10:17 am
True - that is why I edited my earlier response when you pointed it out, changing "component" to "task" (which is what I had originally meant; I used the wrong term).
The solution does require all file rows to pass through the data flow's Script Component in order to flag each row with the correct file type; it then conditionally splits from there. I don't see that as a notable issue, because all records are likely going to have to pass through a data flow somewhere in order to be processed. (Substantially large files could see a performance hit, I suppose, from running every record through that Script Component - that would have to be tested/evaluated against business needs, as would any selected solution at the end of the day.)
I'm glad the sample package and related files worked! (That was my first post with supporting documentation/examples like that.)
Thanks!
😀
May 8, 2013 at 10:30 am
In a kind-of-similar thread, someone had a table full of varbinary(max) columns holding graphic images, without the file names/extensions, so they couldn't figure out whether a given file was a *.jpg, *.gif or something else.
That thread is here if you are curious:
http://www.sqlservercentral.com/Forums/Topic1178898-392-1.aspx
When I was helping there, I wrote a snippet in vb.NET that read the first ten bytes of everything in a local folder, so I could do the analysis and write some TSQL to map the first ten bytes to known extensions.
Here's the vb code I used:
Dim myFilePath As String = String.Empty
Dim myFileBrowser As New OpenFileDialog

Try
    'Show a file picker so we can choose the file to inspect
    With myFileBrowser
        If .ShowDialog() = DialogResult.OK Then
            myFilePath = .FileName
        End If
    End With
Catch ex As Exception
    MsgBox(ex.Message)
End Try

'If we have a valid path, we can go forward.
If myFilePath <> String.Empty Then
    'Open the file for reading
    Dim oFile As New System.IO.FileInfo(myFilePath)
    Dim oFileStream As System.IO.FileStream = oFile.OpenRead()

    'We want just the first 10 bytes for testing (or fewer, if the file is smaller)
    Dim lBytes As Integer = CInt(Math.Min(10L, oFileStream.Length))

    If lBytes > 0 Then
        'Read the leading bytes into a byte array
        Dim fileData(lBytes - 1) As Byte
        oFileStream.Read(fileData, 0, lBytes)
        oFileStream.Close()

        'Print the extension alongside the hex signature, e.g. ".jpg||FF-D8-FF-E0..."
        Debug.Print(oFile.Extension & "||" & BitConverter.ToString(fileData))
    End If
End If
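For anyone who would rather stay in PowerShell (as the OP chose to), here is a rough equivalent as a sketch - it walks a folder instead of using a file dialog, and the folder path is a placeholder:

# Placeholder folder - point this at the directory holding the files to analyse
$folder = 'C:\test\images'

Get-ChildItem -Path $folder -File | ForEach-Object {
    $stream = [System.IO.File]::OpenRead($_.FullName)
    try {
        # Read just the first 10 bytes (or fewer, if the file is smaller)
        $buffer = New-Object byte[] 10
        $read = $stream.Read($buffer, 0, 10)

        # Extension alongside the hex signature, e.g. ".jpg||FF-D8-FF-E0..."
        "$($_.Extension)||$([BitConverter]::ToString($buffer, 0, $read))"
    }
    finally {
        $stream.Close()
    }
}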
Lowell
May 9, 2013 at 1:14 am
That's a neat technique - thanks for posting.
Quite a common requirement seems to be reading the first n characters, or the first row, of a text file to determine its type/layout before directing it to the appropriate data flow - I reckon you're well on the way to cracking that with this.