April 26, 2010 at 10:41 am
I have a number of text files, produced daily, that can be loaded into a SQL Server table and reported on. The problem is that the load requires a lot of manual scrubbing. I'm fairly new to the SSIS world and need to understand how to automate it. I have had some success setting up the flat file connection and formatting the input as Ragged Right, BUT the problem lies in the page breaks with repeating headers throughout each text file.
Ideally I would like to pick up the first line of column headings and the fixed-width data rows, and skip everything else. But how? I'm after a best practice, not a fan of brute force.
Is there any standard method to parse a file that contains data records and column headings in fixed-width/ragged-right format, but has page breaks interspersed throughout the file?
Example:
--------------
Wed Jan 06 06:00:01 CST 2010 Page 1
Daily Job Report
Job Started Job Finished Loc Jobid Module Chain Job Status
------------------- ------------------- --- --------- ------------------------------ ------------------------------ ------------
01-06-2010 06:00:01 Q 104226.00 OMNI_SPEC0011D_HOMEOWNERSHIPCO L1_CBS_SPEC_CHAIN PRED WAIT
01-06-2010 06:00:01 Q 104225.00 OMNI_SPEC0015D_OVERDRAWNLOCKRE L1_CBS_SPEC_CHAIN PRED WAIT
01-06-2010 06:00:01 01-06-2010 06:00:01 Q 104233.00 CBS_KEEP_ALIVE LAUNCHED
01-06-2010 06:00:01 01-06-2010 06:00:01 Q 104234.00 AML_KEEP_ALIVE LAUNCHED
01-06-2010 06:00:01 01-06-2010 06:00:01 Q 104238.00 VMBT3_KEEPALIVE LAUNCHED
01-06-2010 06:00:01 01-06-2010 06:00:01 Q 104236.00 VMBT1_KEEPALIVE LAUNCHED
01-06-2010 06:00:01 01-06-2010 06:00:01 Q 104237.00 VMBT2_KEEPALIVE LAUNCHED
01-06-2010 06:00:01 01-06-2010 06:00:01 Q 104235.00 DAILY_JOB_REPORT_2
FormFeed symbol appears here
"" <------------------ need to delete this and the next 5 lines for each section in each file
Wed Jan 06 06:00:01 CST 2010 Page 2
Daily Job Report
Job Started Job Finished Loc Jobid Module Chain Job Status
------------------- ------------------- --- --------- ------------------------------ ------------------------------ ------------
01-06-2010 05:01:05 01-06-2010 05:01:05 H 104155.00 OMNI_FINA0018M_FRB-COLLATERAL L1_CBS_FINA_CHAIN Skip!RunCal
01-06-2010 05:01:05 01-06-2010 05:01:06 H 104152.00 AH_CLEAR_399-0 OSI_M_50CK_PROMO_POST_T1841 INACTIVE
01-06-2010 05:01:05 01-06-2010 05:01:06 H 104151.00 PS_FILELOADER_9656 OSI_M_50CK_PROMO_POST_T1841 INACTIVE
01-06-2010 05:01:05 01-06-2010 05:01:08 H 104149.00 OSI_M_50CK_PROMO_POST_T1841 L1_CBS_MARK_CHAIN Skip!RunCal
01-06-2010 05:01:05 01-06-2010 05:01:05 H 104148.00 OMNI_MARK0001W_50CKGROUPPROMO L1_CBS_MARK_CHAIN Skip!RunCal
Wed Jan 06 06:00:01 CST 2010 Page 3
Daily Job Report
Job Started Job Finished Loc Jobid Module Chain Job Status
------------------- ------------------- --- --------- ------------------------------ ------------------------------ ------------
01-06-2010 05:01:02 01-06-2010 05:01:07 H 104123.00 L1_CBS_BOPS_CHAIN L0_CBS_MASTER_CHAIN_MORNING FINISHED
01-06-2010 05:01:02 01-06-2010 05:01:07 H 104124.00 L1_CBS_COMM_CHAIN L0_CBS_MASTER_CHAIN_MORNING FINISHED
01-06-2010 05:01:02 01-06-2010 05:10:07 H 104125.00 L1_CBS_DEPS_CHAIN L0_CBS_MASTER_CHAIN_MORNING FINISHED
01-06-2010 05:01:02 01-06-2010 05:01:14 H 104126.00 L1_CBS_INFO_CHAIN L0_CBS_MASTER_CHAIN_MORNING FINISHED
01-06-2010 05:01:02 01-06-2010 05:02:29 H 104127.00 L1_CBS_RISK_CHAIN L0_CBS_MASTER_CHAIN_MORNING FINISHED
01-06-2010 05:01:02 01-06-2010 05:01:07 H 104129.00 L1_CBS_TELE_CHAIN L0_CBS_MASTER_CHAIN_MORNING FINISHED
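One way to picture the scrubbing that has to happen is a pre-processing pass that drops the repeating header blocks before SSIS ever reads the file. The sketch below is only an illustration, not from the thread: the file names are placeholders, it assumes the form feed starts its own line and is followed by five header lines (as the sample above suggests), and the page 1 leading header would still need to be skipped separately (for example via the flat file connection's header-rows-to-skip setting).

' Minimal pre-processing sketch: strip the repeating page-header blocks.
' Assumes each block is a form-feed line followed by five header lines,
' as in the sample above. File names are placeholders.
Imports System
Imports System.IO

Module ScrubReport
    Sub Main()
        Dim linesToSkip As Integer = 0
        Using reader As New StreamReader("DailyJobReport.txt")
            Using writer As New StreamWriter("DailyJobReport_clean.txt")
                Dim line As String = reader.ReadLine()
                While line IsNot Nothing
                    If linesToSkip > 0 Then
                        linesToSkip -= 1               ' still inside a header block
                    ElseIf line.Length > 0 AndAlso Asc(line(0)) = 12 Then
                        linesToSkip = 5                ' form feed: drop this line and the next five
                    Else
                        writer.WriteLine(line)         ' keep the data row
                    End If
                    line = reader.ReadLine()
                End While
            End Using
        End Using
    End Sub
End Module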
April 26, 2010 at 4:57 pm
April 27, 2010 at 8:54 am
Hey jeydavius,
Thanks for the lead. It got me pointed in the right direction and I was able to figure out the rest!
So in the spirit of giving back and sharing the solution so others can benefit, here's what I ended up doing:
1. Set up a flat file connection with a ragged-right (fixed-width) format.
2. Skipped the leading header rows so the connection starts at the first line of data.
3. Added a script component that examines the first character of the first predefined column for a Chr(12) form feed (page break) character.
4. When the form feed is found, skipped the next five lines to drop that repeating header section.
5. Wrote the remaining good data rows to the output column buffer.
6. The key here: on the script component's output you have to set the SynchronousInputID property to None. That makes the output asynchronous, which lets you map the columns from the input source to the output columns programmatically AFTER you've weeded out the unwanted lines (the repeating header sections within the file).
7. After that, life was bliss. And it took FAR fewer lines of code than I originally conceived.
Thanks for the direction! The best problem solving comes not from knowing all the answers, but from knowing where to look for the answer!
Code below:
------------------------------------------
' Microsoft SQL Server Integration Services user script component
' ScriptMain is the entry-point class for script components.
' Reads the ragged-right input, drops the repeating page-header blocks
' (flagged by a Chr(12) form feed), and writes the surviving rows to the
' asynchronous output "outputDelPgBrksHdrs".
Imports System
Imports System.Data
Imports System.IO
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub inputFileSource_ProcessInputRow(ByVal Row As inputFileSourceBuffer)
        Dim WantedValue As Integer
        Dim i As Integer

        ' Check the first character of the first column for the form feed
        ' character, Chr(12). (Substring is zero-based, so index 0 is the
        ' first character.)
        WantedValue = Asc(Row.JobStarted.Substring(0, 1))

        If WantedValue = 12 Then
            ' Form feed found: skip this row and advance past the next five
            ' lines, i.e. the repeating page-header section.
            For i = 1 To 5
                Row.NextRow()
            Next
        Else
            ' Good data row: copy it to the asynchronous output buffer.
            ' (The surrounding ProcessInput loop advances to the next row,
            ' so no extra NextRow call is needed here.)
            With Me.outputDelPgBrksHdrsBuffer
                .AddRow()
                .START = CDate(Row.JobStarted)
                ' Job Finished is blank for rows that have not completed
                ' (e.g. PRED WAIT), so only convert it when present.
                If Row.JobFinished.Trim().Length > 0 Then
                    .STOP = CDate(Row.JobFinished)
                End If
                .LOC = RTrim(Row.Loc)
                .JOBID = CInt(Row.JobID)
                .MODULE = RTrim(Row.Module)
                .CHAIN = RTrim(Row.Chain)
                .FILENAME = RTrim(Row.AWImportFileName)
                .STATUS = Row.JobStatus
            End With
        End If
    End Sub

End Class
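A note for anyone adapting the code above: the output stays asynchronous because its SynchronousInputID is set to None, and the same skip logic can also be written by overriding the whole-buffer inputFileSource_ProcessInput method instead of ProcessInputRow, so all of the row advancement happens in one loop. The rough sketch below would replace ProcessInputRow inside the same ScriptMain class; the buffer, output, and column names are reused from the posted code and are specific to that package.

    ' Alternative sketch: override ProcessInput and drive the row loop yourself.
    Public Overrides Sub inputFileSource_ProcessInput(ByVal Buffer As inputFileSourceBuffer)
        Dim linesToSkip As Integer = 0

        While Buffer.NextRow()
            If linesToSkip > 0 Then
                ' Still inside a repeating page-header block.
                linesToSkip -= 1
            ElseIf Asc(Buffer.JobStarted.Substring(0, 1)) = 12 Then
                ' Form feed found: drop this line and the next five.
                linesToSkip = 5
            Else
                ' Good data row: copy it to the asynchronous output.
                With Me.outputDelPgBrksHdrsBuffer
                    .AddRow()
                    .START = CDate(Buffer.JobStarted)
                    .STATUS = Buffer.JobStatus
                    ' ... remaining columns mapped as in ProcessInputRow above ...
                End With
            End If
        End While
    End Sub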
April 27, 2010 at 9:09 am
To have figured all of that out based on the hint is impressive, as is posting back your solution so thoroughly. Good stuff.
Now I shall go and read about the SynchronousInputID property ...
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel