January 20, 2015 at 11:03 am
Hi,
I'm using SSIS 2008 R2 - we have an InnoDB MySQL instance running in AWS. I'm running an SSIS package that pulls some data from there and loads it into a local SQL Server. No transformations, just a straight copy in a data flow task.
To access the data held in MySQL I am using a system DSN with the MySQL ODBC 5.1 Driver, then an ADO.NET source component.
Some of the tables I am pulling across are reasonably big (>50,000,000 records), so obviously I'd like to bring the data across in batches. I've configured the destination to insert in batches and set the default buffer size etc. However, when I run the package it hangs for a long time, and I *think* what's happening is that all the data from the source is being buffered at the client before the data flow task starts performing the batched inserts, rather than the data being retrieved from the source in batches.
Is this normal? The problem is that for a 50,000,000 row table DTExec consumes nearly 15GB of memory to do the insert. Has anyone else encountered this and found a way to make the data flow a bit more scalable?
Thanks,
Bob
January 20, 2015 at 11:09 am
What's the buffer size that you're using? For a source, if the source is slow and you set the buffer to be large, it can take a really long time to fill the buffer, so sometimes you have to set it lower. That's probably the issue.
Edit: If the buffer is set high, like 5k, 10k, 20k rows, etc., I would change that to be much lower, maybe even 1k.
January 20, 2015 at 11:13 am
A few things I can think of or recommend:
1. What is your MaxInsertCommitSize set to?
2. If you are doing a SELECT * from the source, then the source table has to be scanned entirely and it will take a long time. I'd recommend that you see if there is some field (ID, date) you can use to create multiple data flows with more specific queries - see the rough sketch after this list.
3. Also use a query against your source table, as opposed to connecting to the table directly. There is a bit of overhead when the designer needs to check the metadata.
4. How are your destination fields structured? I'd recommend staging your data and using varchar fields in the staging table. Implicit conversions will take some time, and you could run into issues with incompatible data types.
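To illustrate point 2, here's a rough sketch of the kind of range-based source queries I mean - the table name (big_table), the columns, and the id ranges are just placeholders, assuming your source table has some monotonically increasing integer key:

-- each range becomes its own data flow / source query (all names are placeholders)
SELECT col1, col2, col3
FROM big_table
WHERE id >= 1 AND id < 10000000;

SELECT col1, col2, col3
FROM big_table
WHERE id >= 10000000 AND id < 20000000;

-- ...and so on, up to the maximum id

Each query then only pulls a slice of the table, and you can run several of those data flows in parallel if the source can keep up.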
January 21, 2015 at 2:21 am
Hi, thanks for the replies.
The DefaultBufferMaxRows is set to 10,000, DefaultBufferSize is set to 2,000,000, Rows Per Batch on the destination is set to 5,000, and the Maximum Insert Commit Size was set to 5,000 (which I've just noticed was a mistake - I should have left that empty - but I don't think that will affect it).
On the advice about implicit conversions: I have taken care on this phase of the ETL to stage the data exactly 1-for-1 so there is no conversion, but I don't think the issue is at the insert end - I'm fairly confident it's at the source...
The reason I think this is that when I run the package (which I have to do through DTEXEC so it runs as 64-bit), while it's hanging the memory consumed by DTEXEC rises to over 15GB but no rows are being inserted into SQL, which I'm checking by doing row counts on the destination table with NOLOCK.
I am curious about the point made by the first gentleman about setting the buffer to 1k - are you referring to the DefaultBufferSize property in the data flow, and simply calculating a size in bytes based on an average row length, or is there a property hidden somewhere that I don't know about? :O)
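For what it's worth, my rough understanding of how the buffer sizing works (correct me if I've got this wrong, and the 200-byte row size below is just an illustrative guess) is:

rows per buffer ≈ min(DefaultBufferMaxRows, DefaultBufferSize / estimated row size)

So with DefaultBufferSize = 2,000,000 bytes and an estimated row size of 200 bytes, that's 2,000,000 / 200 = 10,000 rows, which happens to land right on the DefaultBufferMaxRows cap of 10,000 anyway.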
Thanks,
Bob
January 21, 2015 at 7:05 am
Hi Bob
How's it going?
I had a quick Google for you. There seems to be a connection setting which forces the ODBC driver to use a forward-only cursor, which looks like it might fix your problem:
http://forums.mysql.com/read.php?37,206764,241250#msg-241250
http://dev.mysql.com/doc/connector-odbc/en/connector-odbc-configuration-connection-parameters.html
Have a look at "prefetch", "FORWARD_CURSOR", "NO_CACHE" and Table 5.3.
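I haven't tried it myself, but from those pages it looks like you can either tick the options in the DSN dialog or pass them straight in the connection string, something along these lines (the server/database/credentials are placeholders, the PREFETCH value is just a guess, and it's worth double-checking the exact parameter names against that second link):

DRIVER={MySQL ODBC 5.1 Driver};SERVER=your-aws-host;DATABASE=your_db;UID=your_user;PWD=your_password;NO_CACHE=1;FORWARD_CURSOR=1;PREFETCH=10000;

As I read it, NO_CACHE stops the driver buffering the whole result set on the client, FORWARD_CURSOR forces a forward-only cursor, and PREFETCH should make it pull rows back in chunks of roughly that size.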
January 21, 2015 at 7:28 am
Hey Sam!
Yea it's going great mate. How are you?
Thanks for that - I did do some googling myself but couldn't find anything. I don't often post in forums; it's only when I get really stuck ;OD.
I've just run a test with the "Don't cache results of forward only cursors" option checked on the ODBC driver, and that's done the trick. Good to know my suspicions were correct that it was buffering the data at the client.
Cheers,
Bob
January 21, 2015 at 8:03 am
Glad it worked!
TBH all I searched for was "MySQL ODBC 5.1 Driver buffer" (minus the quotes). This thread was number 3 in the results; a sure sign it's not that common a problem 😛
Not going too bad - still contracting of course 🙂
January 21, 2015 at 8:06 am
Ahh right, I was thinking my next step would be to look for the documentation for that ODBC driver. Obviously you contractors are better at finding stuff on Google than the rest of us!! Perhaps that's the difference, eh!! :O)