data load performance

  • Hello,

    using a combination of ssis and C#, we are placing file data into sql server tables.

    The files contain around 70 million of rows...

    We can not use bulk insert as there are checks being done on every column in every row.

    What steps can we take to improve the speed of the inserts?

    Thank you

  • arkiboys (8/17/2016)


    Hello,

    using a combination of ssis and C#, we are placing file data into sql server tables.

    The files contain around 70 million of rows...

    We can not use bulk insert as there are checks being done on every column in every row.

    What steps can we take to improve the speed of the inserts?

    Thank you

    A couple of things come to mind....

    break the file up into multiple files and run them in parallel.

    Parallel Proccessing Link[/url]

    Utilize this SSIS component for faster data loads...

    Balanced Data Distributor

    2014 Version of the Balanced Data Distributor

    2014 Version BDD

  • If you tell us more about the structure of your SSIS package: what components you are using etc. (even a few screen shots would probably do it), we might be able to provide some more-specific advice.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • SSIS does a good job of manipulating rows and columns within buffers as it moves through the pipeline, but sometimes loading the data into target table as fast as possible and with with minimal logging is the most important goal. David Poole recently wrote an article here about leveraging bash commands and regular expressions for performing data validation and manipulation on text files. http://www.sqlservercentral.com/articles/ETL/136941/

    Consider pre-processing your file using Execute Process tasks, and then bulk load.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (8/17/2016)


    SSIS does a good job of manipulating rows and columns within buffers as it moves through the pipeline, but sometimes loading the data into target table as fast as possible and with with minimal logging is the most important goal. David Poole recently wrote an article here about leveraging bash commands and regular expressions for performing data validation and manipulation on text files. http://www.sqlservercentral.com/articles/ETL/136941/

    Consider pre-processing your file using Execute Process tasks, and then bulk load.

    As far as I know, 'Windows Bash' is available only on Windows 10 – and not on any Windows Server platform – so it's not ready for non-Linux production ETL code, IMO.

    But there are alternative pre-processing methods, of course.

    What I dislike most about the pre-processing method is when it comes down to data lineage: for example, tracing a value in a DW table back to its source. This source can no longer be said to be in a database; you'd have to go back to an archived version of the original file to be certain. And when that file is 50GB in size, it's not just a case of firing up Notepad to check.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • arkiboys (8/17/2016)


    Hello,

    using a combination of ssis and C#, we are placing file data into sql server tables.

    The files contain around 70 million of rows...

    We can not use bulk insert as there are checks being done on every column in every row.

    why not?

    I have several processes doing row/column validation on date with similar volumns and I do use the bulk load options to load the data.

    processing is always done prior to the insert of the data onto the table so really can't see why you state that you can't use bulk load. unless you are talking about using BCP.

    As for your SSIS and C# - is it SSIS And C# or SSIS with C# Scripts within the package?

    With regards to SSIS the following are things that will delay your process

    - derived columns -- one way to speed them up is to split these and have a max of 10 columns per derived object. even if it means having 20 derived objects for same flow (the 10 and 20 can vary depending on other factors)

    - c# scripts -- same applies

    - lookups -- these can be quite slow - one option many times is to load the data into a temp/intermediary sql server table and do the join in SQL - and yes the overhead of loading the data is in many cases insignificant vs the cost of the lookup

    And tune the dataflow buffers/rows per buffer according to your data. in most cases I've dealt with the defaults were not the best option.

    And get the fastload option set on the destination..deal with the problems through the error processing

  • arkiboys (8/17/2016)


    Hello,

    using a combination of ssis and C#, we are placing file data into sql server tables.

    The files contain around 70 million of rows...

    We can not use bulk insert as there are checks being done on every column in every row.

    What steps can we take to improve the speed of the inserts?

    Thank you

    It's simple. Import the data to a staging table using BULK INSERT. Once there, you can use SQL Server for one of the things it was meant to do. It'll keep you from reinventing the wheel by rewriting parts of T-SQL that already exist. It'll also keep you from having to go through another layer, which may be a performance improvement in itself.

    As for parallel loads, unless you can guarantee that you're writing to truly different spindles, parallel loads will only cause additional head chatter and slow the whole ball of wax down.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply