Loading data from Flat File to a database

  • I know it sounds easy, but believe me, it is something really crazy! I have this case where I have to bring data from 4 flat files to a database (where we have 10 tables).

    What I have to do is:

    1. Map the data to the appropriate table in the already created table ( 4 files, 10 tables)

    2. Know what is added, updated or deleted everyday(No specific column given, anything can change)

    3. If there is any error, I have to raise to that particular error (must show them the whole row) and bring that to another table.

    4. Must not proceed until they send back the fixed error

    Anybody have any idea to help me out on this? Please, my dead line is very near!:sick:

    [font="Verdana"]Imagination is more important than knowledge-Albert Einstein[/font]

  • SQL King (2/5/2008)


    I know it sounds easy, but believe me, it is something really crazy! I have this case where I have to bring data from 4 flat files to a database (where we have 10 tables).

    What I have to do is:

    1. Map the data to the appropriate table in the already created table ( 4 files, 10 tables)

    2. Know what is added, updated or deleted everyday(No specific column given, anything can change)

    3. If there is any error, I have to raise to that particular error (must show them the whole row) and bring that to another table.

    4. Must not proceed until they send back the fixed error

    Anybody have any idea to help me out on this? Please, my dead line is very near!:sick:

    Part of how to best do this will depend on your freedom to modify the tables you are working with, or to add aditional tables for auditing/tracking. Adding a column to capture a timestamp when a record is entered into a table is one way of doing this, another would be to have a separate stats or audit table to capture pertinent information about the load on either a row or job basis or both. Error tables/exception queues can also be built into which you can direct rows which error out from your data flow. Coming from 4 different files, you could set up multiple error pathways which then union together and write out to a common error file along with an indicator of which file originated the error.

    Depending on how errors are supposed to be remedied, you could also use the table which stores the error rows to track when the error was passed off to whoever is suposed to take care of it.

    These are all pretty general suggestions, hope some of them are useful.

  • Assuming you easnt to use "native" SSIS functions, you're going to want to look at the lookup tasks, conditional split and data conversion tasks, and how to use the "errors" there to implement your workflow. The "Lookup failure" = new records, and the conditional split should allow you to figure out what needs updating.

    It's a bit heady - but playing with it a little while will make it a lot more clear.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thanks for the support. But how is the comparison supposed to work? Meaning, there is no specific column given to me to compare the two day's data (yesterday, today...and so on) How does SSIS do this. I know it is pretty simple if we have a key column and the changes take place depending on that, but in my case it is very different.

    Also, how to parse error. First of all, how to capture that the certain field is not right? Then send them to a table?

    [font="Verdana"]Imagination is more important than knowledge-Albert Einstein[/font]

  • SQL King (2/5/2008)


    Thanks for the support. But how is the comparison supposed to work? Meaning, there is no specific column given to me to compare the two day's data (yesterday, today...and so on) How does SSIS do this. I know it is pretty simple if we have a key column and the changes take place depending on that, but in my case it is very different.

    Also, how to parse error. First of all, how to capture that the certain field is not right? Then send them to a table?

    One possibility for quick change detection is to download the free Checksum component from Konesans at http://www.konesans.com/checksum.aspx. You can build a checksum from the entire row and check it against previously stored values. It's not theoretically foolproof, and over a wide enough range of values you could possibly get a hash collision, but I've had great success with it.

    For checking field validity, you'll need to encode your business rules somehow. Like Matt suggested, you can do validations via lookup transformations. Create a table of valid values for your various fields, as well as their field type or name, and query against this table for the lookup target set. Route the lookup failures into the error flow rather than allowing the failures to fail the component or package.

  • The big question is - how would you correlate the rows without SSIS? How would you relate data from today to data from yesterday, row-by row? Something has got to have commonality otherwise there's no update possible, and you would have to empty the table out and start over, no?

    ------------------------------------

    Assuming you can get past that conceptual barrier

    ------------------------------------

    See if this pushes you closer. If you set up the lookup task based on some sort of PK (that would also exist in the text file), you can then include the field already in the DB table (which I rename just so I know what it is, putting current in front of the column name). Once you then get to the conditional split, you can then set up comparisons like this:

    current_col1<>col1 or current_col2<>col2

    If that fires, then that means something was updated.

    Otherwise - try giving us some concrete examples, because without it - it's hard to help you.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • WOW guys! Thanks for the great help. I really appreaciate the help. I know that without solid column, it is hard to compare datas, but the way the data is coming in is crazy. They change every single day and the changes are big. I have no idea why they are doing this. My lead just suggested to use a third party tool. I have no idea yet, when and what they are about to use. I will continue as soon as I have some news on hand.

    But as far as the error goes, let me explain a little bit more for better understandings.

    1. I have Flat files coming in everyday. They look fine. But there are possiblities that the data type can be wrong one day! (We do get the same structure everyday) How do I do this logic and please guys, suggest me what else can there be wrong?!#. I am supposed to figure out what else can go wrong before it goes to the table for further processes.

    2. If there is an error, it should take the whole row(even when only one column is wrong), and then send it to another table/file/folder. These errors will will then sent back to the source asking them to fix it and send back the good data so that the earlier process continues.(hopefully nobody got confused)

    Until now, i just want to figure this out, I will be a little bit better if I know how to do this first!

    And Matt & Smith.......Thanks a million for earlier help.

    [font="Verdana"]Imagination is more important than knowledge-Albert Einstein[/font]

  • Can someone please help me on this one still?

    [font="Verdana"]Imagination is more important than knowledge-Albert Einstein[/font]

  • Is there any way to dummy up a little sample data? It's REALLY hard to help you with no concrete basis to start from: the conversation is at such an abstract level, it's not going to be useful.

    What would 1-2 records look like in the incoming file? what do the records they match up to look like?

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Okay, let me put it this way, probably I was making everyone confused as I myself was one!

    Day 1: I will be getting in source data from flat file to my DB (easy)

    Day 2: I have to push Day1's DB to another point(by doing a copy of a DB and call it Previous Day DB from then on) Now, some of the data may OR may NOT change on day 2, how do I compare Day1's data with Day 2 data?

    Note: One small change will affect a lot of table. Meaning if one column of the parent table changes, it has to change ALL the child table that has link to that parent table.

    In my case, I have many of these parent tables and each of them at least have 5 child tables.

    We have not yet receieved any data, (that sucks) as i am supposed to create the logic first. I think I will be only getting the real data next week sometime. I know its weird. But these people are pushing me!!!!!!

    [font="Verdana"]Imagination is more important than knowledge-Albert Einstein[/font]

  • Hi,

    I am doing same and new to SSIS. Can any one provide sample package how to loop all flat files and Tables.

    Thanks & Regards,

    Kishore

  • SQL King (2/7/2008)


    Now, some of the data may OR may NOT change on day 2, how do I compare Day1's data with Day 2 data?

    Like Matt suggested, it's not possible to know... you haven't posted the Create Table statement, nor the PK, nor any of the data, nor the expected defined record format. You need to post those in order for us to help 'cause, right now, we're just guessing what your data and tables look like. Without those things, all we can say is things like "try to identify something unique in the rows and use standard insert/update technology to migrate the daily changes."...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply