SSIS Grinds my PC to a crawl when importing 70k rows

  • Hi All,

    This may be a bit of a stupid question, but I have a package I'm building to pull a bunch of data in from a multitude of files and then do some other bits of work on that data. The first section I'm having issues with is an import from two .csv files totalling about 70k records, with 15-odd fields in each file (the layouts are identical).

    Basically, in my package I have two flat file source objects, two sorts (to allow a merge), and a merge, then it's all piped into a destination object pointing at a table in my DB.

    When I run this package it starts ticking up the number of rows, and before long my machine is all but unusable because it's so slow. Looking at my system resources, it seems to eat all my page file, but the application itself seems to use no memory at all. I'm assuming this is because SSIS is loading all this data into my local memory or something.

    So my four questions are

    1.) Is my assumption about loading it all into memory correct?

    2.) Why does it do this?

    3.) Can I stop it doing this?

    3a.) If not, can I make it use my server's memory instead or something?

    4.) Is there a quicker way of loading a bunch of files, all identical in format into a single table, in a way I can automate?

    Thanks!

    Dave

  • david.morrison-1035652 (11/4/2009)


    So my four questions are

    1.) Is my assumption about loading it all into memory correct?

    2.) Why does it do this?

    3.) Can I stop it doing this?

    3a.) If not, can I make it use my server's memory instead or something?

    4.) Is there a quicker way of loading a bunch of files, all identical in format into a single table, in a way I can automate?

    1. Yes, it will use memory until it gets an internal memory pressure warning, and then it will start using temp files.

    2. Insufficient memory.

    3. Not without more memory. But come to think of it, you might be able to tell it to deal with smaller batches of data: in the properties of the Data Flow itself, try changing DefaultBufferMaxRows from 10000 to 5000 or even 1000.

    3a. If you run the package ON the server, then you use its memory.

    4. A quicker way? Probably, but not likely without $$$. SSIS is a great way to do this.

    CEWII

  • 1.) Is my assumption about loading it all into memory correct?

    2.) Why does it do this?

    3.) Can I stop it doing this?

    3a.) If not, can I make it use my server's memory instead or something?

    4.) Is there a quicker way of loading a bunch of files, all identical in format into a single table, in a way I can automate?

    SSIS does all its work in memory; that's what makes it so fast, but sorts are memory hogs. I agree, change the commit size / number of rows; this will clear rows from memory once they are written to disk.

    No, you can't stop this behavior.

    Unless I'm missing something, you can easily automate that.

    Set up an SSIS job that uses a For Each container configured with the Foreach File Enumerator.

    This can be configured to loop through all the files in a directory.

    First, rename one of your files to something like InProcess.txt, then set up a file connection to this file. Set up your columns and map them to your destination (in a data flow task).

    Once that is working, set up your For Each loop to rename each file to InProcess.txt (or whatever), then execute the data flow, then rename the file back, delete it, or rename it and move it into a processed-files folder.

    Set up the job on the server, and make sure the service or proxy account has access to the files.

    Create a job to execute the package on a schedule.

    hth,

    tom

  • Heh... SSIS fast? Guess I'll never know. 😉

    This is a simple task in T-SQL using BULK INSERT, and even if you forget to limit the batch size (they actually recommend against limiting it), it'll automatically spool to tempdb. I don't know how long your current process is taking, but I've imported 5.1 million 20-column-wide rows in 60 seconds flat with BULK INSERT... and that's on a single-CPU desktop machine that's almost 8 years old.
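
    If you want to try that route, here's a minimal T-SQL sketch of the BULK INSERT approach. The table, column names, and file paths are hypothetical placeholders, and the options assume a comma-delimited file with a header row; adjust them to match your real layout.

        -- Hypothetical staging table matching the CSV layout (only a few columns shown).
        CREATE TABLE dbo.ImportStaging
        (
            Col1 VARCHAR(50),
            Col2 VARCHAR(50),
            Col3 VARCHAR(50)
        );

        -- Load each identically formatted file straight into the table.
        -- FIRSTROW = 2 skips a header row; TABLOCK helps the load run minimally logged.
        -- BATCHSIZE is optional -- as noted above, you can leave it off and let it spool to tempdb.
        BULK INSERT dbo.ImportStaging
        FROM 'C:\Imports\File1.csv'   -- hypothetical path
        WITH
        (
            FIELDTERMINATOR = ',',
            ROWTERMINATOR   = '\n',
            FIRSTROW        = 2,
            TABLOCK
        );

        BULK INSERT dbo.ImportStaging
        FROM 'C:\Imports\File2.csv'   -- hypothetical path
        WITH
        (
            FIELDTERMINATOR = ',',
            ROWTERMINATOR   = '\n',
            FIRSTROW        = 2,
            TABLOCK
        );

    Repeat the BULK INSERT statement (or wrap it in a loop over your file names) for however many identically formatted files you need to load.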

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If your table has indexes on it, modify the package to drop the indexes, import the data, then recreate the indexes.
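
    A minimal sketch of that pattern, using a hypothetical table and index name:

        -- Hypothetical names; substitute your real table and index.
        DROP INDEX IX_Staging_Col1 ON dbo.ImportStaging;

        -- ... run the import (data flow / BULK INSERT) here ...

        CREATE NONCLUSTERED INDEX IX_Staging_Col1
            ON dbo.ImportStaging (Col1);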

  • If you use Standard Edition, this will lock the table until the index build is finished, and it will probably take longer.
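
    On Enterprise Edition you could avoid the long table lock by building the index online; a hedged sketch, reusing the hypothetical names above (the option is not available in Standard Edition):

        CREATE NONCLUSTERED INDEX IX_Staging_Col1
            ON dbo.ImportStaging (Col1)
            WITH (ONLINE = ON);  -- Enterprise Edition only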

  • Hi All, just wanted to close this off by saying I solved this by separating the file imports into several different data flow tasks within the same package. This gets around SSIS trying to load all the huge files into memory at once. It also meant I removed the merge and sort operations, which were the main culprits for the grinding to a halt!

    Cheers

    Dave

  • David,

    Thanks for letting us know your resolution.
