Need help using SSIS to transform a flat file into SQL table

  • Each month I relieve a text file that contains a list of records that I would like to convert into a SQL table. The challenge I have is that each record has the potential to be related to the previous record and is identified by a sequential line indecator in character positions 14 through 17. In the attached example you will see that record 2 through 8 are all related. What I would like to do is make one record out of the seven lines and concatenate the description field, identified by char position 17 through 82. Once the records have be merged I would like to use the values in all the other columns from the first record. Records 11 & 12 are another good example. I would like to concatenate the description from the two records and use the values from the first record to make one record.

    Can someone discribe how I would accomplish this in SSIS?

    Thank you

  • You have a couple of options.

    On the pure SSIS end - you will need to use a script data flow component. This will allow you to take the input stream from your data flow, examine each row (comparing it to the previous row) and send records to the output stream as needed. If you are familiar with VB.Net, this will actually be pretty easy to do. If you are not familiar with VB.Net, and you have someone around that is familiar with it (or really any .Net programming language) I would recommend you go this route. This will be fairly easy programming and if you are going to do much SSIS development you need to get your feet wet sometime.

    As an alternative, you could import the records into a staging table and use T-SQL to merge together the appropriate rows. Aggregating and concatenating strings like this is pretty common and you will find examples on this site if you do a little searching (if you can't post here again and I can help).

  • Make sure about delimiters.

  • Rob Millerd (12/10/2008)


    Can someone discribe how I would accomplish this in SSIS?

    Heh... no... and I'm not trying to be a smart guy here. The reason why my answer is "no" is because I do this type of stuff in T-SQL so I don't need to write VB scripts, etc.

    Any interest if I spent the time showing how?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Any interest if I spent the time showing how?

    Yes, I would like to see how you would do it in T-SQL. I am currently parsing this file in a PHP script but would like to move away from PHP in this environment.

  • Michael Earl (12/11/2008)


    As an alternative, you could import the records into a staging table and use T-SQL to merge together the appropriate rows. Aggregating and concatenating strings like this is pretty common and you will find examples on this site if you do a little searching (if you can't post here again and I can help).

    A hybrid approach if you want it to all happen as part of the package but are better with T-sql than VB.NET (as I am for one), is that you could pull it into a staging table with SSIS and then have the SSIS package execute the T-SQL script to accomplish the final result.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • Rob Millerd (12/13/2008)


    Any interest if I spent the time showing how?

    Yes, I would like to see how you would do it in T-SQL. I am currently parsing this file in a PHP script

    but would like to move away from PHP in this environment.

    Ok... here's how I did it... I saved your Sample.txt in C:\Temp and saved the following

    "code" in the same directory and named it Sample.fmt...

    9.0

    7

    1 SQLCHAR 0 14 "" 1 SomeID ""

    2 SQLCHAR 0 3 "" 2 Sequence ""

    3 SQLCHAR 0 65 "" 3 Description ""

    4 SQLCHAR 0 5 "" 4 SomeNumber ""

    5 SQLCHAR 0 1 "" 5 SomeYN ""

    6 SQLCHAR 0 1 "" 0 Blank ""

    7 SQLCHAR 0 10 "\r\n" 6 Something ""

    For more on BCP format files and how they work (especially for row 6 above),

    see "BCP, Format Files" in Books Online.

    From there, the rest is easy...

    --===== Create a temp table to hold the parsed/imported data

    CREATE TABLE #ETL

    (

    SomeID VARCHAR(14),

    Sequence TINYINT,

    Description VARCHAR(65),

    SomeNumber VARCHAR(5),

    SomeYN CHAR(1),

    Something VARCHAR(11),

    PRIMARY KEY CLUSTERED (SomeID, Sequence)

    )

    --===== Import the data using the BCP Format file to control parsing

    BULK INSERT #ETL

    FROM 'C:\Temp\Sample.txt'

    WITH (

    CODEPAGE = 'RAW',

    DATAFILETYPE = 'CHAR',

    MAXERRORS = 0,

    FORMATFILE = 'C:\Temp\Sample.fmt',

    TABLOCK

    )

    --===== Aggregate the Description in order by Sequence for each SomeID

    SELECT e1.SomeID,

    REPLACE(

    STUFF(

    (

    SELECT ' '+RTRIM(e2.Description)

    FROM #ETL e2

    WHERE e1.SomeID = e2.SomeID

    ORDER BY e2.Sequence

    FOR XML PATH('')

    )

    ,1,1,'') --End of STUFF

    ,'&','&') --End of REPLACE

    AS Description

    FROM #ETL e1

    GROUP BY e1.SomeID

    The output looks like this...

    SomeID Description

    A1871-0 TRUCK ACCESS AND JOB PREPARATION TIME - PER SERVICE CALL

    A1871-00 STRAIGHT TIME - (THIS OPERATION IS PROVIDED TO FACILITATE PROCESSING YOUR CLAIM AND IS TO BE USED WHENEVER STRAIGHT TIME IS REQUESTED ON A CLAIM. THE TIME REQUESTED MUST BE INDICATED ON THE CLAIM AND A DETAILED EXPLANATION MUST BE PROVIDED TO SUPPORT YOUR REQUEST FOR STRAIGHT TIME. IF TIME REQUESTED IS IN EXCESS OF 4.0 HOURS, A WARRANTY AUTHORIZATION IS REQUIRED PER SERVICE BULLETIN # MA00023)

    A1871-1 CAPACITY PLATE - REPLACE

    A1871-10 FLOOR PLATE - REPLACE (INCLUDES FLOOR MAT OPER# 1-9)

    A1871-11 HIP RESTRAINT - REPLACE (INCLUDES TRANSFER OR REPLACE SEAT BELT OPER# 1-16)

    A1871-12 OPERATOR'S COMPARTMENT - R&R

    A1871-13 OVERHEAD GUARD - R & R

    A1871-13A OVERHEAD GUARD - REPLACE

    A1871-13B OVERHEAD GUARD - (FOR EACH LIGHT ADD 0.4 TO TRANSFER OR REPLACE INCLUDING REWIRING)

    If you're using SQL Server 2000, you'll need to replace the SELECT/FOR XML thingy with a concatenation function. Please see the following article for more on that and some of the pitfalls to avoid...

    [font="Arial Black"]Performance Tuning: Concatenation Functions and Some Tuning Myths[/font][/url]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks to all for the replies...I appreciate the assistance, it is exactly what I was looking for.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply