SSIS: Need help with Importing Flat/Text files with multiple "form Feed" sections with header information on each page

  • I have a number of text files that are produced daily that can be loaded to a SQL Db table and reported on. The problem is it requires a lot of manual scrubbing. I'm a bit new to the SSIS world and need to understand how to automate this load. I have had some success in file connection and formatting the input as Ragged Right. BUT the problem lies in the Page Breakes with headers through out each text file.

    Ideally I would like to pick upt the first line of column headings, the fixed with data in the rows and skip every thing else. But how? Best Practice" not a fan of brute force.

    Is there any standard method to pars through a file with data records, column headings, fixed width/Ragged right but has page breaks interspersed through out the file?

    Example:

    --------------

    Wed Jan 06 06:00:01 CST 2010 Page 1

    Daily Job Report

    Job Started Job Finished Loc Jobid Module Chain Job Status

    ------------------- ------------------- --- --------- ------------------------------ ------------------------------ ------------

    01-06-2010 06:00:01 Q 104226.00 OMNI_SPEC0011D_HOMEOWNERSHIPCO L1_CBS_SPEC_CHAIN PRED WAIT

    01-06-2010 06:00:01 Q 104225.00 OMNI_SPEC0015D_OVERDRAWNLOCKRE L1_CBS_SPEC_CHAIN PRED WAIT

    01-06-2010 06:00:01 01-06-2010 06:00:01 Q 104233.00 CBS_KEEP_ALIVE LAUNCHED

    01-06-2010 06:00:01 01-06-2010 06:00:01 Q 104234.00 AML_KEEP_ALIVE LAUNCHED

    01-06-2010 06:00:01 01-06-2010 06:00:01 Q 104238.00 VMBT3_KEEPALIVE LAUNCHED

    01-06-2010 06:00:01 01-06-2010 06:00:01 Q 104236.00 VMBT1_KEEPALIVE LAUNCHED

    01-06-2010 06:00:01 01-06-2010 06:00:01 Q 104237.00 VMBT2_KEEPALIVE LAUNCHED

    01-06-2010 06:00:01 01-06-2010 06:00:01 Q 104235.00 DAILY_JOB_REPORT_2

    FormFeed symbol appears here

    " " <------------------ need to delete this and the next 5 lines for each section in each file

    Wed Jan 06 06:00:01 CST 2010 Page 2

    Daily Job Report

    Job Started Job Finished Loc Jobid Module Chain Job Status

    ------------------- ------------------- --- --------- ------------------------------ ------------------------------ ------------

    01-06-2010 05:01:05 01-06-2010 05:01:05 H 104155.00 OMNI_FINA0018M_FRB-COLLATERAL L1_CBS_FINA_CHAIN Skip!RunCal

    01-06-2010 05:01:05 01-06-2010 05:01:06 H 104152.00 AH_CLEAR_399-0 OSI_M_50CK_PROMO_POST_T1841 INACTIVE

    01-06-2010 05:01:05 01-06-2010 05:01:06 H 104151.00 PS_FILELOADER_9656 OSI_M_50CK_PROMO_POST_T1841 INACTIVE

    01-06-2010 05:01:05 01-06-2010 05:01:08 H 104149.00 OSI_M_50CK_PROMO_POST_T1841 L1_CBS_MARK_CHAIN Skip!RunCal

    01-06-2010 05:01:05 01-06-2010 05:01:05 H 104148.00 OMNI_MARK0001W_50CKGROUPPROMO L1_CBS_MARK_CHAIN Skip!RunCal

    Wed Jan 06 06:00:01 CST 2010 Page 3

    Daily Job Report

    Job Started Job Finished Loc Jobid Module Chain Job Status

    ------------------- ------------------- --- --------- ------------------------------ ------------------------------ ------------

    01-06-2010 05:01:02 01-06-2010 05:01:07 H 104123.00 L1_CBS_BOPS_CHAIN L0_CBS_MASTER_CHAIN_MORNING FINISHED

    01-06-2010 05:01:02 01-06-2010 05:01:07 H 104124.00 L1_CBS_COMM_CHAIN L0_CBS_MASTER_CHAIN_MORNING FINISHED

    01-06-2010 05:01:02 01-06-2010 05:10:07 H 104125.00 L1_CBS_DEPS_CHAIN L0_CBS_MASTER_CHAIN_MORNING FINISHED

    01-06-2010 05:01:02 01-06-2010 05:01:14 H 104126.00 L1_CBS_INFO_CHAIN L0_CBS_MASTER_CHAIN_MORNING FINISHED

    01-06-2010 05:01:02 01-06-2010 05:02:29 H 104127.00 L1_CBS_RISK_CHAIN L0_CBS_MASTER_CHAIN_MORNING FINISHED

    01-06-2010 05:01:02 01-06-2010 05:01:07 H 104129.00 L1_CBS_TELE_CHAIN L0_CBS_MASTER_CHAIN_MORNING FINISHED

  • You'll need to use a Script Task component to accomplish your task. A great example and explanation can be found at:

    Good luck.

  • Hey jeydavius

    Thanks for the lead. It got me pointed in the right direction and I was able to figure out the rest!

    So in the spirit of giving back and sharing the solution so others can benefit heres what I ended up doing:

    1. set up a flat file connector with a ragged right fixed width.

    2. eliminated the leading headers to the first line of data.

    3. added a script component that examines the first character on the first predefined column for a CHR(12) page break symbol

    4. set a looping 5 line count to skip that header section

    5. Wrote the lines of good data to the output column buffer.

    6. the key here is for the output columns in the script component you have to set the "synchronousInput" property to NONE. This allows you to to map the columns from the input source to the output columns programatically AFTER you've weeded out the unwanted lines (repeating header sections within the file).

    7 After that life was bliss. And it too FAR less lines of code than I originally conceived.

    Thanks for the direction! The best problem solving comes not from knowing all the answers. But knowing where to look for the answer!

    Code below:

    ------------------------------------------

    ' Microsoft SQL Server Integration Services user script component

    ' This is your new script component in Microsoft Visual Basic .NET

    ' ScriptMain is the entrypoint class for script components

    Imports System

    Imports System.Data

    Imports System.IO

    Imports System.Math

    Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

    Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

    Public Class ScriptMain

    Inherits UserComponent

    Public Overrides Sub inputFileSource_ProcessInputRow(ByVal Row As inputFileSourceBuffer)

    Dim WantedValue As Integer

    Dim i As Integer

    Dim x As Integer

    WantedValue = Int(Asc(Row.JobStarted.Substring(1, 1)))

    If WantedValue = 12 Then

    For i = 1 To 5

    Row.NextRow()

    Next

    Else

    With Me.outputDelPgBrksHdrsBuffer

    .AddRow()

    .START = CDate(Row.JobStarted)

    .STOP = CDate(Row.JobFinished)

    .LOC = RTrim(Row.Loc)

    .JOBID = CInt(Row.JobID)

    .MODULE = RTrim(Row.Module)

    .CHAIN = RTrim(Row.Chain)

    .FILENAME = RTrim(Row.AWImportFileName)

    .STATUS = Row.JobStatus

    End With

    Row.NextRow()

    End If

    End Sub

    End Class

  • To have figured all of that out based on the hint is impressive, as is posting back your solution so thoroughly. Good stuff.

    Now I shall go and read about the "synchronousInput" property ...

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply