SQL 2005 Database - Design a Mass Load Process

  • Hi All,

    The process requires reading files that is being dumped into a certain directory called the "Pending Directory", the "unknown module" must be able to read from the pending directory from time to time if there are other files being dumped.

    The file that is being dumped into the pending directory is called a "Batch File" which is a text file with a comma separated data inside it, format inside a batch file is similar to this:

    FORMAT:

    BATCH_ID,TABLE_ID,ROWID,MODE,COLUMN_ENTRY,COLUMN_ENTRY,COLUMN_ENTRY,.....

    ----------------------------------------------------------------------------------

    BATCH_ID = There is one(1) batch_id per batch file, and there are many tables per batch file

    TABLE_ID = The ID of the table on where to add, modify, or delete the data.

    ROWID = Typically the row identification per data

    MODE = This is the action that will take place for a certain data, its either Add(A),Modify(M), or Delete(D).

    COLUMN_ENTRY = certain data for a specific column in a table.

    e.g.

    1,22,1,A,Column 1 Entry,Column 2 Entry,....

    1,22,2,D,Column 1 Entry,Column 2 Entry,....

    1,22,3,M,Column 1 Entry,Column 2 Entry,....

    1,22,4,M,Column 1 Entry,Column 2 Entry,....

    1,22,5,A,Column 1 Entry,Column 2 Entry,....

    1,25,1,A,Column 1 Entry,Column 2 Entry,....

    1,25,2,A,Column 1 Entry,Column 2 Entry,....

    1,25,3,M,Column 1 Entry,Column 2 Entry,....

    1,25,4,M,Column 1 Entry,Column 2 Entry,....

    1,25,5,M,Column 1 Entry,Column 2 Entry,....

    I need to design a process with the following requirements:

    1.) A module or something that will timely read Batch files from the Pending directory

    2.) Convert the data into an XML format and validate it using XML schema

    3.) Log if there are erroneous data based from the validation above.

    3.) Insert the data into a Maintenance Database

    4.) After inserting the data, check the constraint, log if error or success.

    5.) All success data shall be inserted to Final Database.

    Right now, I am looking into the following services and technology (SSIS,DTS, and Bulk Insert) to see if it fits with the following requirements, I just don't know if I am on the right direction. If someone can tell me a concrete strategy or design on how to deal with this kind of requirements, then that would be great.

    FYI. I am using SQL Server 2005 Express Edition.

    Thank so much for your time.

  • Well, the first question is are you willing to get SSIS? You mentioned using SQL Server Express, well SSIS is available starting at the Standard version (Basic Transforms). See http://www.microsoft.com/sqlserver/2005/en/us/compare-features.aspx for a comparison of 2005 features.

    The second question is how much data are we talking about?

    How big are the files in MB?

    How big is the destination database?

    What's estimated frequency of receiving the files?

    At the very least, know that you can create a listener on the directory so you can automatically process files as they arrive.

  • This is more of a business process that is data intensive rather than a data process. Are you sure you've go the right tool?

    Having said that, it's not a far stretch to build something in SQL Server...

    You could use BCP to load the data and then use FOR XML to spit it back out. While the data is in the database you could perform any validation routines and flag/move non-conforming records to another table. You could use a SQL Job to poll the directory for the file. You would need to enable xp_cmdshell to interact with the file system and make sure the service account has access to the source/destination folders.

    --Paul Hunter

  • Um, very important question: does every line in these text files have the same number of fields(commas)?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Bradley Deem,

    Well, as I go fighting for the license of this sql edition needed for SSIS to our management, I think that is not possible. We're just using express edition.

    How big are the files in MB?

    - It can grow depending on the data being sent from the terminal.

    How big is the destination database?

    - As big as it can get depending on the server specification

    What's estimated frequency of receiving the files?

    - Depending on the transaction from the terminal.

    Paul Hunter

    Do you have any source on where I can read on or test on what you're suggesting?

    RBarryYoung

    Um, very important question: does every line in these text files have the same number of fields(commas)?

    The answer is no, it has different number of fields and commas since it will inserted to a different tables with their respective table codes.

    If someone can help me layout a detailed implementation plan for this one then that would great.

    Thank you so much for the help.

  • dwither_07 (1/13/2009)


    RBarryYoung

    Um, very important question: does every line in these text files have the same number of fields(commas)?

    The answer is no, it has different number of fields and commas since it will inserted to a different tables with their respective table codes.

    Then I do not believe that any of the foregoing approaches will work. AFAIK, the only practical way to do this involves using client code (vb or c#) to sort out the record types first, and then using some means to load/insert the separate types, separately.

    Anyone who knows differently please say so.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • dwither_07 (1/13/2009)


    Bradley Deem,

    ...

    Paul Hunter

    Do you have any source on where I can read on or test on what you're suggesting?

    BCP is a standard import/export utility that comes with SQL Server however' I'm not sure if it's part or Express or if it can be download as a separate utility. BOL/MSDN has good documentation on the utility {check under BCP Utility}. But the basic layout is:

    BCP [server.]db_name.owner.object [in/out/queryout] src_or_tgt_file switches... and it's executed can be executed using xp_cmdshell.

    Hope that helps.

    --Paul Hunter

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply