SSIS - excel file manipulation

  • Hello everybody,

    I have a task where I need to process roughly 60000 excel spreadsheets and bring them into a SQL Server 2014 database. All excel files have the same format and same number of identical columns. I have set up an SSIS package that is using Foreach Loop Container to look into a folder and process these files one at a time and load them to a table. The mappings are straight-forward, no problems there.

    I am attaching a sample spreadsheet with two tabs - current structure and desired structure.

    Basically what I need to do is to repeat the first 7 columns based on the number of lines in a transaction.

    The number of lines is variable per patient.

    Thanks for your input,

    Petr

  • The only way I can think to do this is by using a script transform in the data flow.

    The logic in the dataflow needs to capture the values from those 7 columns each time the change and for each row of data, output the last values captured.

    It's reasonably simple logic, but I can't provide more specific details without having access to SSIS.



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • vecerda (7/10/2015)


    Hello everybody,

    I have a task where I need to process roughly 60000 excel spreadsheets and bring them into a SQL Server 2014 database. All excel files have the same format and same number of identical columns. I have set up an SSIS package that is using Foreach Loop Container to look into a folder and process these files one at a time and load them to a table. The mappings are straight-forward, no problems there.

    I am attaching a sample spreadsheet with two tabs - current structure and desired structure.

    Basically what I need to do is to repeat the first 7 columns based on the number of lines in a transaction.

    The number of lines is variable per patient.

    Thanks for your input,

    Petr

    If this data is brought into a staging table, it could probably be updated using a query that uses the LAG function to pull data from the most recent record with the same patient identifier, and if the Line Number column is guaranteed to not skip any values, this could be quite simple. LAG() allows a parameter to specify how many records to go back for the specified field. That number just needs to be the current Line Number value minus 1, unless the current value is 1. You'll need a CASE statement for that last part.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I have actually ended up doing it exactly that way. I kept trying to use Alvin's method and could not get it to work so I brought them into a staging table and used LAG function.

    Took me 8 hours of struggling with Alvin's method to abandon it for a LAG solution,

    Petr

  • vecerda (7/13/2015)


    I have actually ended up doing it exactly that way. I kept trying to use Alvin's method and could not get it to work so I brought them into a staging table and used LAG function.

    Took me 8 hours of struggling with Alvin's method to abandon it for a LAG solution,

    Petr

    The staging table method is definitely easier if you're not used to Script Transforms in dataflows.



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • Hello,

    You just unite those files into single file using GSplit tool then configure that single file to SSIS...

    Thanks & Regards,

    shyamsunder

  • sirishyam (7/20/2015)


    Hello,

    You just unite those files into single file using GSplit tool then configure that single file to SSIS...

    Thanks & Regards,

    shyamsunder

    First, will GSplit correctly combine Excel files?

    Using GSplit, if it worked would be adding unnecessary 3rd party software.

    Adding logic to combine the files is adding unnecessary complexity.

    You would lose the information about which file the data came from.

    Last but not least, it's jut not the right way to do things.



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply