Importing vertical data using the flat file task?

  • TonyP (3/19/2008)


    Well, it is not exactly inserting each application file into many tables. It is more like taking information out of the application file and sending it to a variety of different tables.

    Same thing said just a little differently. You won't have a problem either way. The key is like someone (Matt?) suggested earlier... get the data into a "staging table" where we can work on/distribute the data to a variety of different tables.
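The staging-table idea can be sketched outside SSIS. Below is a minimal Python/sqlite3 illustration (all table and column names are invented for the example): every raw line lands in one staging table, and set-based `INSERT ... SELECT` statements then distribute rows to the real tables.

```python
import sqlite3

# Hypothetical sketch (table and column names invented): land every raw
# application-file line in one staging table, then distribute it to the
# destination tables with set-based INSERT ... SELECT statements.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staging (AppNum TEXT, RawLine TEXT)")

raw_lines = ["Veh1_Make, Chevy, 0", "Drv1_Name, Smith, 0"]  # sample rows
conn.executemany("INSERT INTO Staging VALUES (?, ?)",
                 [("APP001", line) for line in raw_lines])

# Distribute: one set-based pass routes vehicle rows to a Vehicle table
conn.execute("CREATE TABLE Vehicle (AppNum TEXT, RawLine TEXT)")
conn.execute("""INSERT INTO Vehicle
                SELECT AppNum, RawLine FROM Staging
                WHERE RawLine LIKE 'Veh%'""")
vehicle_count = conn.execute("SELECT COUNT(*) FROM Vehicle").fetchone()[0]
```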

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • All right- if you're hard-core about wanting to use SSIS:

1. You're going to need to read up on file-system tasks and Foreach Loop tasks (so that you can "loop" on each file it finds). Once you do that, you can focus on the data flow task that imports the file.
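The loop-over-files pattern can be mimicked in a few lines of Python (file names and the per-file import step are placeholders, not part of the original post):

```python
import tempfile
from pathlib import Path

def import_one(path):
    # Placeholder for the per-file data-flow task (the flat-file import)
    return path.name

def import_all(folder):
    """Mimic an SSIS Foreach File enumerator: hand each file found to the import step."""
    return [import_one(p) for p in sorted(Path(folder).glob("*.txt"))]

# Demo with two hypothetical application files in a scratch folder
demo = Path(tempfile.mkdtemp())
(demo / "app001.txt").write_text("Veh1_Make, Chevy, 0\n")
(demo / "app002.txt").write_text("Veh1_Make, Nissan, 0\n")
imported = import_all(demo)
```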

    2. The data flow I come up with looks something like this:

- 1 flat file input. Make it delimited, with a delimiter you're NOT using (say, pipe), creating something that imports each row as a single column

    - split the app number off (using a conditional split) from the other rows

- for the app number branch, use SUBSTRING and FINDSTRING to pull out the app number and the code after it (whatever that is)

    - for the everything else branch: use the string functions to split each row into 5 fields: table, rowID, fieldname, value, code

    Example

    Veh1_Make, Chevy, 0 --> VEH....1....Make....chevy....0

    Veh2_Make, Nissan, 0 --> VEH....2....Make....nissan....0

Veh1_Year, 2004, 0 --> VEH....1....Year....2004....0

Veh2_Year, 2003, 0 --> VEH....2....Year....2003....0

- merge task to put the AppNum and its code back in wherever they need to be (AppNum everywhere; code, I don't know).

- add a PIVOT task. Pivot the field/value columns based on table, using the rowIDs to match them up, grouping by AppNum.

    - output to however many tables you end up with.
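The split-then-pivot steps above can be simulated in plain Python (the parsing regex and dictionary layout are my own stand-ins for the derived-column and PIVOT tasks, not SSIS itself):

```python
import re
from collections import defaultdict

lines = [
    "Veh1_Make, Chevy, 0",
    "Veh2_Make, Nissan, 0",
    "Veh1_Year, 2004, 0",
    "Veh2_Year, 2003, 0",
]

# Derived-column step: split each row into table, rowID, fieldname, value, code
split_rows = []
for line in lines:
    name, value, code = [p.strip() for p in line.split(",")]
    m = re.match(r"([A-Za-z]+)(\d+)_(\w+)", name)   # e.g. Veh / 1 / Make
    table, row_id, field = m.group(1).upper(), int(m.group(2)), m.group(3)
    split_rows.append((table, row_id, field, value, code))

# Pivot step: one output row per (table, rowID), fields turned into columns
pivoted = defaultdict(dict)
for table, row_id, field, value, code in split_rows:
    pivoted[(table, row_id)][field] = value
```

Each key of `pivoted` then maps to one denormalized row ready for its destination table.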

    That's going to be a BIG-looking SSIS.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • That's going to be a BIG-looking SSIS

    I figured that much out... I remember how big something like this got in DTS... that's why I was taking the T-SQL road... at least it would be fast 😛

    --Jeff Moden



  • Jeff Moden (3/19/2008)


    That's going to be a BIG-looking SSIS

    I figured that much out... I remember how big something like this got in DTS... that's why I was taking the T-SQL road... at least it would be fast 😛

    Well - don't sell SSIS short yet. It makes for an impressive John Madden-"like" description of a process, but it does have some fairly serious giddy up and go to it. Just because there's a lot of steps, all of which are one or more boxes on the process display, doesn't mean it can't fly through them.

I asked Andy L when he first posted his article about it, and from what I can tell, he's right. It does seem to move through quite a bit faster than ol' DTS. How MUCH faster is a good question - not sure I have a solid handle on how to really "time" the sucker down to the ms. But I've imported 2.5 million rows (denormalized) and renormalized them within 2 minutes. A similar DTS took somewhere around twice that. Not too shabby.

    With something like this process - it's going to be hard to tell, since the true perf test is how fast can it process through a directory with 100,000 of these files in it.


  • I'll review these suggestions and get back with you guys later.

  • Jeff is very right of course.

    Some comments that might help to wrap your head around SSIS and get going:

- From a design perspective, expect to create lots of steps and pieces. Don't try to stuff all operations into one little step.

    - consider creating a "dispatcher step" to route the different records to different edit checks. This'll help modularize your logic.

- Stuff the results into memory variables (you may end up creating a lot); then, when the "key changes", trigger a write or transfer to a recordset. Then you'll be able to logically output to wherever you want.
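The dispatcher idea can be sketched as a simple routing table (the record types and edit-check functions here are invented placeholders, standing in for a Conditional Split feeding separate branches):

```python
# Hypothetical dispatcher: route each record type to its own edit check,
# the way a Conditional Split feeds separate branches.
def check_vehicle(rec):
    return rec["value"] != ""          # placeholder vehicle edit check

def check_driver(rec):
    return rec["value"] != ""          # placeholder driver edit check

DISPATCH = {"VEH": check_vehicle, "DRV": check_driver}

records = [{"table": "VEH", "value": "Chevy"},
           {"table": "DRV", "value": "Smith"}]
results = [DISPATCH[r["table"]](r) for r in records]
```

Adding a new record type then means adding one function and one dictionary entry, which keeps the logic modular.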

Insurance applications typically contain lots of data (been there), so creating all of the individual pieces may seem tedious. However, once done, I expect you'll find it flies, and will be relatively easy to maintain. (And you'll be able to repurpose it easily, if you get it right.)

    Good luck.
