April 23, 2010 at 10:00 am
I have a number of text files that are produced daily that can be loaded to a SQL Db table and reported on. The problem is it requires a lot of manual scrubbing. I'm a bit new to the SSIS world and need to understand how to automate this load. I have had some success in file connection and formatting the input as Ragged Right. BUT the problem lies in the Page Breakes with headers through out each text file.
Is there any standard method to pars through a file with data records, column headings, fixed width/Ragged right but has page breaks interspersed through out the file?
Example:
--------------
Wed Jan 06 06:00:01 CST 2010 Page 1
Daily Job Report
Job Started Job Finished Loc Jobid Module Chain Job Status
------------------- ------------------- --- --------- ------------------------------ ------------------------------ ------------
01-06-2010 06:00:01 Q 104226.00 OMNI_SPEC0011D_HOMEOWNERSHIPCO L1_CBS_SPEC_CHAIN PRED WAIT
01-06-2010 06:00:01 Q 104225.00 OMNI_SPEC0015D_OVERDRAWNLOCKRE L1_CBS_SPEC_CHAIN PRED WAIT
01-06-2010 06:00:01 01-06-2010 06:00:01 Q 104233.00 CBS_KEEP_ALIVE LAUNCHED
01-06-2010 06:00:01 01-06-2010 06:00:01 Q 104234.00 AML_KEEP_ALIVE LAUNCHED
01-06-2010 06:00:01 01-06-2010 06:00:01 Q 104238.00 VMBT3_KEEPALIVE LAUNCHED
01-06-2010 06:00:01 01-06-2010 06:00:01 Q 104236.00 VMBT1_KEEPALIVE LAUNCHED
01-06-2010 06:00:01 01-06-2010 06:00:01 Q 104237.00 VMBT2_KEEPALIVE LAUNCHED
01-06-2010 06:00:01 01-06-2010 06:00:01 Q 104235.00 DAILY_JOB_REPORT_2
Wed Jan 06 06:00:01 CST 2010 Page 2
Daily Job Report
Job Started Job Finished Loc Jobid Module Chain Job Status
------------------- ------------------- --- --------- ------------------------------ ------------------------------ ------------
01-06-2010 05:01:05 01-06-2010 05:01:05 H 104155.00 OMNI_FINA0018M_FRB-COLLATERAL L1_CBS_FINA_CHAIN Skip!RunCal
01-06-2010 05:01:05 01-06-2010 05:01:06 H 104152.00 AH_CLEAR_399-0 OSI_M_50CK_PROMO_POST_T1841 INACTIVE
01-06-2010 05:01:05 01-06-2010 05:01:06 H 104151.00 PS_FILELOADER_9656 OSI_M_50CK_PROMO_POST_T1841 INACTIVE
01-06-2010 05:01:05 01-06-2010 05:01:08 H 104149.00 OSI_M_50CK_PROMO_POST_T1841 L1_CBS_MARK_CHAIN Skip!RunCal
01-06-2010 05:01:05 01-06-2010 05:01:05 H 104148.00 OMNI_MARK0001W_50CKGROUPPROMO L1_CBS_MARK_CHAIN Skip!RunCal
Wed Jan 06 06:00:01 CST 2010 Page 3
Daily Job Report
Job Started Job Finished Loc Jobid Module Chain Job Status
------------------- ------------------- --- --------- ------------------------------ ------------------------------ ------------
01-06-2010 05:01:02 01-06-2010 05:01:07 H 104123.00 L1_CBS_BOPS_CHAIN L0_CBS_MASTER_CHAIN_MORNING FINISHED
01-06-2010 05:01:02 01-06-2010 05:01:07 H 104124.00 L1_CBS_COMM_CHAIN L0_CBS_MASTER_CHAIN_MORNING FINISHED
01-06-2010 05:01:02 01-06-2010 05:10:07 H 104125.00 L1_CBS_DEPS_CHAIN L0_CBS_MASTER_CHAIN_MORNING FINISHED
01-06-2010 05:01:02 01-06-2010 05:01:14 H 104126.00 L1_CBS_INFO_CHAIN L0_CBS_MASTER_CHAIN_MORNING FINISHED
01-06-2010 05:01:02 01-06-2010 05:02:29 H 104127.00 L1_CBS_RISK_CHAIN L0_CBS_MASTER_CHAIN_MORNING FINISHED
01-06-2010 05:01:02 01-06-2010 05:01:07 H 104129.00 L1_CBS_TELE_CHAIN L0_CBS_MASTER_CHAIN_MORNING FINISHED
April 26, 2010 at 12:40 pm
I don't know if there is a best practice on this but there are two areas you could look into.
You could use a script task and strip out the column header information from the file before passing it through your data flow. I am not strong in scripting, and I would be hard pressed to come up with a working sample quickly.
That being said I would try to attack this with a conditional split. I think all the tools you need are in there to send only rows with real data downstream.
Good luck.
April 27, 2010 at 9:10 am
hey dbowlin,
Thanks muchos for your reply. I got a good lead from JAYDAVISUS that gave me enough info to create the solution. So in the spirit of sharing back here's how I created the solution:
This actually skips all the repeating header sections through out the file and writes the rows of data to the output columns.
1. set up a flat file connector with a ragged right fixed width.
2. eliminated the leading headers to the first line of data.
3. added a script component that examines the first character on the first predefined column for a CHR(12) page break symbol
4. set a looping 5 line count to skip that header section
5. Wrote the lines of good data to the output column buffer.
6. the key here is for the output columns in the script component you have to set the "synchronousInput" property to NONE. This allows you to to map the columns from the input source to the output columns programatically AFTER you've weeded out the unwanted lines (repeating header sections within the file).
7 After that life was bliss. And it too FAR less lines of code than I originally conceived.
Thanks for the direction! The best problem solving comes not from knowing all the answers. But knowing where to look for the answer!
Code below:
------------------------------------------
' Microsoft SQL Server Integration Services user script component
' This is your new script component in Microsoft Visual Basic .NET
' ScriptMain is the entrypoint class for script components
Imports System
Imports System.Data
Imports System.IO
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Public Class ScriptMain
Inherits UserComponent
Public Overrides Sub inputFileSource_ProcessInputRow(ByVal Row As inputFileSourceBuffer)
Dim WantedValue As Integer
Dim i As Integer
Dim x As Integer
WantedValue = Int(Asc(Row.JobStarted.Substring(1, 1)))
If WantedValue = 12 Then
For i = 1 To 5
Row.NextRow()
Next
Else
With Me.outputDelPgBrksHdrsBuffer
.AddRow()
.START = CDate(Row.JobStarted)
.STOP = CDate(Row.JobFinished)
.LOC = RTrim(Row.Loc)
.JOBID = CInt(Row.JobID)
.MODULE = RTrim(Row.Module)
.CHAIN = RTrim(Row.Chain)
.FILENAME = RTrim(Row.AWImportFileName)
.STATUS = Row.JobStatus
End With
Row.NextRow()
End If
End Sub
End Class
April 27, 2010 at 12:17 pm
I am glad you were able to work it out. Congratulations. I also spend a fair amount of time over on a SQL Server BI site, and they just had an article about this very subject. You may want to give it a look to see what other tidbits you can pick up.
http://www.bidn.com/articles/integration-services/112/handling-headers-in-flat-files-with-ssis
April 27, 2010 at 1:26 pm
Thanks.
I'll check that out.
Although I have a change to the above code. Same steps but the VB portion changed. I started debugging and ran into so little data conversion issues. this piece of code ran to completion.
' Microsoft SQL Server Integration Services user script component
' This is your new script component in Microsoft Visual Basic .NET
' ScriptMain is the entrypoint class for script components
Imports System
Imports System.Data
Imports System.IO
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Public Class ScriptMain
Inherits UserComponent
Public Overrides Sub inputFileSource_ProcessInputRow(ByVal Row As inputFileSourceBuffer)
Dim WantedValue As Integer
Dim i As Integer
Dim x As Integer
Dim JobStat As Boolean
Dim RowCount As Integer
If Row.JobStatus.Contains("Page") = True Then
For i = 1 To 7
Row.NextRow()
Next
ElseIf Row.JobStatus.Contains("--------") = True Then
Row.NextRow()
ElseIf Row.JobStarted.Contains("Job Started") = True Then
Row.NextRow()
Else
With Me.outputDelPgBrksHdrsBuffer
.AddRow()
'handle nulls for the start date
If Trim(Row.JobStarted) = "" Then
.START = CDate("12/30/1899 12:00:00 AM")
Else
.START = CDate(Trim(Row.JobStarted))
End If
'handle nulls for the Finish date
If Trim(Row.JobFinished) = "" Then
.STOP = CDate("12/30/1899 12:00:00 AM")
Else
.STOP = CDate(Trim(Row.JobFinished))
End If
.LOC = Trim(Row.Loc)
'handle nulls for the JobID
If Trim(Row.JobID) = "" Then
.JOBID = CInt("-9")
Else
.JOBID = CInt(Trim(Row.JobID))
End If
.MODULE = RTrim(Row.Module)
.CHAIN = RTrim(Row.Chain)
.FILENAME = RTrim(Row.AWImportFileName)
.STATUS = RTrim(Row.JobStatus)
End With
End If
End Sub
End Class
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply