November 8, 2017 at 3:58 pm
Hi
I regularly import a set of Windows text files into SQL. These files are a fixed length and the data is column delimited. In general I can import them using a series of SSIS dataflow text file to SQL stages
Unfortunately one of the files can be two different set lengths, 331 or 351 columns. All the lines in the file will be the same length. On the short length file the SSIS import screws the import by bringing the start of the next line into the end of the first.
I wrote a VB script task that reads each line, checks the line length and pads short lines to the full length, however I’ve obviously made some error as it only writes every other line.
I was wondering if anyone out there can help me?
Thanks
Steve
VB Code (extracted from SSIS Edit Script)
Dim FILE_NAME As String = "C:\For adding to Collection\NAT00080.txt"
Dim TextLine As String
Dim putLine As String
Try
Dim sr As StreamReader = New StreamReader(FILE_NAME)
Dim Writer As System.IO.StreamWriter
Writer = IO.File.CreateText("C:\For adding to Collection\NAT00080.txt")
Do While sr.Peek() >= 0
TextLine = sr.ReadLine()
' Pinched and inserted pad string code
If Len(TextLine) = 331 Then
' Have to check to make sure that the len of the string is less than 351, otherwise you will get an error.
putLine = TextLine & Space(20)
Else
putLine = TextLine
End If
Writer.WriteLine(putLine)
Loop
Writer.Close()
sr.Close()
Catch e As Exception
Console.WriteLine("The process failed: {0}", e.ToString())
End Try
November 8, 2017 at 4:10 pm
Try changing the format of the input file from fixed width to ragged right, you shouldn't need a pre process step.
November 8, 2017 at 4:18 pm
Thanks for the quick responseZZartin
It's already set as raged right. I've tried most variations of format and encoding! It doesn't seem to make a difference.
Steve
November 8, 2017 at 8:36 pm
steve.alston - Wednesday, November 8, 2017 3:57 PMHi
I regularly import a set of Windows text files into SQL. These files are a fixed length and the data is column delimited. In general I can import them using a series of SSIS dataflow text file to SQL stages
Unfortunately one of the files can be two different set lengths, 331 or 351 columns. All the lines in the file will be the same length. On the short length file the SSIS import screws the import by bringing the start of the next line into the end of the first.
I wrote a VB script task that reads each line, checks the line length and pads short lines to the full length, however I’ve obviously made some error as it only writes every other line.
I was wondering if anyone out there can help me?
Thanks
Steve
VB Code (extracted from SSIS Edit Script)
Dim FILE_NAME As String = "C:\For adding to Collection\NAT00080.txt"
Dim TextLine As String
Dim putLine As String
Try
Dim sr As StreamReader = New StreamReader(FILE_NAME)
Dim Writer As System.IO.StreamWriter
Writer = IO.File.CreateText("C:\For adding to Collection\NAT00080.txt")Do While sr.Peek() >= 0
TextLine = sr.ReadLine()
' Pinched and inserted pad string code
If Len(TextLine) = 331 Then
' Have to check to make sure that the len of the string is less than 351, otherwise you will get an error.
putLine = TextLine & Space(20)
Else
putLine = TextLine
End If
Writer.WriteLine(putLine)
Loop
Writer.Close()
sr.Close()
Catch e As Exception
Console.WriteLine("The process failed: {0}", e.ToString())
End Try
Can you post the record layout for the two files?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 9, 2017 at 6:10 am
steve.alston - Wednesday, November 8, 2017 3:57 PMHi
I regularly import a set of Windows text files into SQL. These files are a fixed length and the data is column delimited. In general I can import them using a series of SSIS dataflow text file to SQL stages
Unfortunately one of the files can be two different set lengths, 331 or 351 columns. All the lines in the file will be the same length. On the short length file the SSIS import screws the import by bringing the start of the next line into the end of the first.
I wrote a VB script task that reads each line, checks the line length and pads short lines to the full length, however I’ve obviously made some error as it only writes every other line.
I was wondering if anyone out there can help me?
Thanks
Steve
There is a possible alternative approach.
Build a task in your SSIS Control Flow (inside a FOREACH container, assuming you're using that to pick up the various files) which interrogates the 'current' file to determine the number of columns and set a package variable accordingly ('Cols331' or 'Cols351' or similar).
Add another dataflow task – so now you have one for 331 cols and one for 351 cols.
Change the precedence constraints going into the dataflow tasks to be 'Expression and Constraint', with the expression being along the lines of
varname == "Cols331" or
varname == "Cols351"
Thus you are able to control the execution path depending on the type of file being processed.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
November 9, 2017 at 3:42 pm
Hi Guys thanks for the support and suggestions.
The file spec is:
Fields – Client (NAT00080) file | Position | Length | Type |
Client identifier | 1 | 10 | A |
Name for encryption | 11 | 60 | A |
Highest school level completed identifier | 71 | 2 | A |
Year highest school level completed | 73 | 4 | A |
Sex | 77 | 1 | A |
Date of birth | 78 | 8 | A |
Postcode | 86 | 4 | A |
Indigenous status identifier | 90 | 1 | A |
Language identifier | 91 | 4 | A |
Labour force status identifier | 95 | 2 | A |
Country identifier | 97 | 4 | A |
Disability flag | 101 | 1 | A |
Prior educational achievement flag | 102 | 1 | A |
At school flag | 103 | 1 | A |
Proficiency in spoken English identifier | 104 | 1 | A |
Address location – suburb, locality or town | 105 | 50 | A |
Unique student identifier | 155 | 10 | A |
State identifier | 165 | 2 | A |
Address building/property name | 167 | 50 | A |
Address flat/unit details | 217 | 30 | A |
Address street number | 247 | 15 | A |
Address street name | 262 | 70 | A |
Record length for national data collection for training organisations: | 331 | ||
Statistical area level 1 identifier | 332 | 11 | A |
Statistical area level 2 identifier | 343 | 9 | A |
Record length for national data collection for state and territory training authorities: | 351 | ||
Carriage return/line feed (ASCII 13/10): | 2 |
It's a simple text file and if data is missing spaces fill the appropriate columns.
Most files are the full-length (351) and the last two fields are always empty so adding 20 spaces to the end of the short files for import is fine. Unfortunately, I can't get the suppliers to send the full-length files which would be the nice solution!
I'll give Phil Parkin's idea a go too. I don't use a foreach loop (yet) but I can see that might work in my flow.
The full progess flow is unzipping the 9 text files, import them into SQL, fix up some of the data, Identify and count which of the fields are missing, identify the clients where data is missing so it can be rectified, append this data set to those previously imported (having first made a copy of those tables), export and zip the current modified set and then export and zip the "total so far" fileset into the correct directories..
Eventually I'll get around to reading in all the source filesets in a foreach loop, but the process has gone from a manual 45 minute each set process (if all goes well) to around 45 seconds each and with ony around a hundred to do every 3 months I'm pleased with what I've almost done so far.
My current work around for this file is to pass the file through Access (which is too dumb to complain about short line length), breakpoint, and check it's worked ok. If it hasn't, I use the Access internal import which works every time and continue.
I't's just annoying that this step stops the process midway!
Sorry for the rant.
Steve
November 9, 2017 at 4:23 pm
Another alternative is to import the files into a stage table with 1 column. Then SUBSTRING the column to get all the columns above.
November 9, 2017 at 5:51 pm
Thanks to everyone who helped me.
brad.mason 's answer works great. I don't know why I ddn't think of it earlier. SQL is happy to ignore the SUBSTRING selects for the last two fields when the files are short. I assumed that since importing caused issues it wouldn't handle processing data that wasn't there.
With files this size any extra processing time is almost zero.
Sometimes you can't see the wood for the trees!
Still along the way I did learn something about VB. Even if it didn't work properly!
Thanks again guys
Steve
November 9, 2017 at 6:24 pm
steve.alston - Thursday, November 9, 2017 5:51 PMThanks to everyone who helped me.brad.mason 's answer works great. I don't know why I ddn't think of it earlier. SQL is happy to ignore the SUBSTRING selects for the last two fields when the files are short. I assumed that since importing caused issues it wouldn't handle processing data that wasn't there.
With files this size any extra processing time is almost zero.
Sometimes you can't see the wood for the trees!
Still along the way I did learn something about VB. Even if it didn't work properly!Thanks again guys
Steve
The substring trick works fine for this but if you ever need that extra burst of speed, it would be worthwhile to make 2 BCP format files (1 for each record layout) to do the imports. The system could be setup to read just one row from a given file, determine its length, and then decide from that which BCP format file to use to import the full file using BULK INSERT. If the target table where properly setup with the correct datatypes, it would even do row/column validation auto-magically instead of having to do something during or after the substrings.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply