SSIS using MySQL sources and the query finishing early.

  • Hi all,

    I have SSIS (SQL 2005) packages that use MySQL 5.1.6 as a source. I have a couple of packages that behave like I describe below.

    One of the packages has a source that joins a few tables together and return a recordset of about 3 million records (growing about 500,000 every 6 months).

    The main table in that query has 113 million rows, and the main column it joins on is indexed.

    Occasionally, the process stops before it's got to the end. It doesn't crash, and the full SSIS package runs and returns success, but it just doesn't return all rows (personally, i'd rather it crashed!!).

    The number it stops on varies. A few months ago, it stopped at 488000. Another instance 1260. This weeks is bizarre as there have been 2 instances, 3 days apart and both stopped at 91690 rows.

    Initially, I thought it might be a lack of index maintenance on the MySQL side, however, the statistics were re-run in between the 2 SSIS runs that stopped at 91690 rows.

    I'm now wondering whether it might be some kind of memory related issue on the MS SQL side, a problem with other queries running on the tables on the MySQL side, or maybe inserts happening on the MySQL side during the process of copying the 3 million rows.

    Has anybody seen this behavior before (on any version of MS SQL), and if you've resolved it, is there anything specific I should be looking for.

    Any help would be much appreciated!

    Kind regards,

    Phil

  • Hi Phil

    Please check max server memory settings in Server Configuration Options first,

    it looks that the SQL Server consumes all available RAM, leaving no room for SSIS.

    Buffer Sizing - it needs tunning in the next step.

    Br.

    Mike

  • Hi Mike,

    Thanks for the info, that's great. I think i'll have a bit of a google about that subject and see what I can change. Also, due to some interesting firewall configurations, the SSIS packages run on a different server to the destination SQL database.

    I'll let you know how I get on!

    Kind regards,

    Phil

  • Hi

    Let me clarify something, both source and destnation servers are on the same or different network?

    Br.

    Mike

  • Hi Mike,

    Same building, same server room - possibly not regarded as on the same network (my networking skills aren't exactly sys admin level);

    The source database is a linux server with MySQL on it on one domain i.e. mysql.restricted.local

    Then there is a windows machine with MS SQL on it that runs the SSIS packages, also on the same domain as the source i.e. ssis.restricted.local

    Then there is a firewall.

    There there is the destination database, a windows machine with MS SQL on it that is on a different domain i.e. mssql.internetfacing.local

    A couple of extra bits of info:

    My query that returns 3 million rows includes a function that queries a 3 tables that returns between 1 and 3 rows, but only outputs to id from the last row.

    While the 3 million recordset based on a query on about 6 tables stops short of it's total quite often, there is another table that is imported without joining any other tables that imports fine, and is 60 million rows.

    Any help would be much appreciated.

    Kind regards,

    Phil

  • Hi again

    I only wanted to be sure that problem is not related to the limitations set on the network devices (rate-limit, traffic shaping, etc.).

    We also can exclude the firewall issue from list, so if there's no huge problem, temporary change the destination to the flat file and check if it works.

    Regards

    Mike

  • I found out what the problem was with this. It is to do with the MySQL indexes and how it maintains their cardinality.

    MySQL defaults to taking 8 random pages from the table and checking the values on those pages. Due to the nature of how we generate the data (batches of about 20,000 records with the same value for the indexed column), it periodically finds whole pages of the same value and comes to the conclusion that there are 18 unique values in a 100 million row table. There are actually about 50,000 unique values. Another time it'll find the correct number of values and set the cardinality correctly.

    Apparently the fix is to increase the number of pages it analyses.

    Thanks for your help though Mike, your suggestions were much appreciated.

    Kind regards,

    Phil

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply