November 4, 2009 at 6:42 am
Hi All,
This may be a bit of a stupid question, but I have a package I'm building to pull a bunch of data in from a multitude of files and then do some other bits of work on that data. The first section I'm having issues with is an import from two .csv files, totalling about 70k records with 15-odd fields in each file (the layouts are identical).
Basically, in my package I have two flat file source objects, two sorts (to allow a merge), a merge, then it's all piped into a destination object pointing at a table in my DB.
When I run this package it starts ticking up the number of rows, and after not too long my machine is all but unusable because it's so slow. Looking at my system resources it seems to eat all my page file, but the application itself seems to use no memory at all. I'm assuming this is because SSIS is loading all this data into my local memory or something.
So my four questions are:
1.) Is my assumption about loading it all into memory correct?
2.) Why does it do this?
3.) Can I stop it doing this?
3a.) If not, can I make it use my server's memory instead or something?
4.) Is there a quicker way of loading a bunch of files, all identical in format, into a single table in a way I can automate?
Thanks!
Dave
November 4, 2009 at 8:03 am
david.morrison-1035652 (11/4/2009)
So my four questions are
1.) Is my assumption about loading it all into memory correct?
2.) Why does it do this?
3.) can I stop it doing this?
3a.) If not can I make it use my servers' memory instead or something?
4.) Is there a quicker way of loading a bunch of files, all identical in format into a single table, in a way I can automate?
1. Yes, it will use memory until it gets an internal memory pressure warning, then it will start using temp files.
2. Insufficient memory.
3. Not without more memory. But come to think of it, you might be able to tell it to deal with smaller batches of data. In the properties for the Data Flow itself, you might change DefaultBufferMaxRows from 10000 to 5000 or even 1000 (a run-time override sketch is below).
3a. If you run the package ON the server then you use its memory.
4. A quicker way? Probably, but not likely without $$$. SSIS is a great way to do this.
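To expand on point 3: if you run the package with dtexec, you can also override that buffer setting at run time with the /SET option instead of editing the saved package. A rough sketch, assuming the package lives at C:\Packages\ImportFiles.dtsx and the data flow task is named DFTImport (both names are hypothetical, and the exact property path is my assumption, so check it against your package):

REM Override DefaultBufferMaxRows for this run only, without touching the saved package
dtexec /FILE "C:\Packages\ImportFiles.dtsx" /SET \Package\DFTImport.Properties[DefaultBufferMaxRows];5000

Smaller buffers mean more round trips, but each buffer takes far less memory.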
CEWII
November 5, 2009 at 11:05 am
1.) Is my assumption about loading it all into memory correct?
2.) Why does it do this?
3.) can I stop it doing this?
3a.) If not can I make it use my servers' memory instead or something?
4.) Is there a quicker way of loading a bunch of files, all identical in format into a single table, in a way I can automate?
SSIS does all its work in memory; that's what makes it so fast, but sorts are memory hogs. I agree with changing the commit size / number of rows; this will clear rows from memory once they are written to disk.
No, you can't stop this behavior.
Unless I'm missing something, you can easily automate that.
Set up an SSIS package that uses a For Each container configured with the Foreach File Enumerator.
This can be configured to loop through all the files in a directory.
First rename one of your files to something like InProcess.txt, then set up a file connection to this file. Set up your columns and map them to your destination (in a data flow task).
Once that is working, set up your For Each loop to rename each file to InProcess.txt (or whatever), then execute the data flow, then rename the file back, or delete it, or rename it and move it into a processed-files folder.
Set up the job on the server, and make sure the service or proxy account has access to the files.
Create a job to execute the package on a schedule (a scripted version of those last two steps is sketched below).
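A rough sketch of that SQL Agent job, assuming the package is saved to C:\Packages\ImportFiles.dtsx and run through a CmdExec step; the job, step, schedule, and file names are all made up, so adjust to taste:

USE msdb;
EXEC dbo.sp_add_job         @job_name = N'Import CSV Files';
EXEC dbo.sp_add_jobstep     @job_name = N'Import CSV Files',
                            @step_name = N'Run SSIS package',
                            @subsystem = N'CmdExec',
                            @command   = N'dtexec /FILE "C:\Packages\ImportFiles.dtsx"';
EXEC dbo.sp_add_jobschedule @job_name = N'Import CSV Files',
                            @name = N'Nightly 1am',
                            @freq_type = 4,              -- daily
                            @freq_interval = 1,          -- every 1 day
                            @active_start_time = 10000;  -- 01:00:00
EXEC dbo.sp_add_jobserver   @job_name = N'Import CSV Files';  -- target the local server

The service/proxy account point above still applies: whatever account the step runs under needs read access to the folder holding the files.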
hth,
tom
November 5, 2009 at 9:26 pm
Heh... SSIS fast? Guess I'll never know. 😉
This is a simple task in T-SQL using BULK INSERT, and even if you forget to limit the batch size (they actually recommend against limiting it), it'll automatically spool to tempdb. I don't know how long your current process is taking, but I've imported 5.1 million 20-column-wide rows in 60 seconds flat with BULK INSERT... and that's on a single-CPU desktop machine that's almost 8 years old.
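For reference, a minimal sketch of what that looks like, assuming a staging table dbo.ImportStage that matches the CSV layout and a file at C:\Imports\File1.csv (both names are made up):

BULK INSERT dbo.ImportStage
FROM 'C:\Imports\File1.csv'
WITH (
    FIELDTERMINATOR = ',',   -- comma-delimited file
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,     -- skip the header row if the file has one
    TABLOCK                  -- table lock helps qualify for minimal logging
);

Run that once per file (or wrap it in a loop over the file names) and the whole import is a few lines of T-SQL.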
--Jeff Moden
Change is inevitable... Change for the better is not.
November 6, 2009 at 9:51 am
If your table has indexes on it modify the package to drop the indexes, then import the data, then create the indexes.
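A rough sketch of that pattern, assuming a single nonclustered index named IX_ImportStage_Field1 on dbo.ImportStage (names are made up):

-- drop the index before the load
DROP INDEX IX_ImportStage_Field1 ON dbo.ImportStage;

-- ... run the import here (SSIS data flow, BULK INSERT, etc.) ...

-- rebuild it afterwards
CREATE NONCLUSTERED INDEX IX_ImportStage_Field1
    ON dbo.ImportStage (Field1);

Loading into a heap (or a table with only its clustered index) and rebuilding the nonclustered indexes afterwards is usually faster than maintaining them row by row during the load.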
November 16, 2009 at 3:22 pm
If you use Standard Edition, this will lock the table until the index build is finished, and will probably take longer.
December 14, 2009 at 4:35 am
Hi All, just wanted to close this off by saying I solved this by separating the file imports into several different data flow tasks within the same package. This gets round SSIS trying to load all the huge files into memory at once. It also meant I removed the merge and sort operations, which were the main culprits for the grinding to a halt!
Cheers
Dave
December 14, 2009 at 5:56 am
David,
Thanks for letting us know your resolution.