ETL from MySql

  • Hi,
    I am writing ETL routine (stored proc based) to fetch data from mysql and insert in mssql database. I am using open query/linked server to fetch data in temp table and after some processing insert to mssql tables.
    we have  few big tables, like one of our table on mysql takes 1million + rows from application daily. I suspect that millions of rows will slow down process if I will use temp table . I will use batch processing.
    but my question is about best practice or effective process. Can you please suggest what could be efficient way to process millions of rows from mysql ?

    Thanks

  • thbaig - Tuesday, November 28, 2017 10:48 AM

    Hi,
    I am writing ETL routine (stored proc based) to fetch data from mysql and insert in mssql database. I am using open query/linked server to fetch data in temp table and after some processing insert to mssql tables.
    we have  few big tables, like one of our table on mysql takes 1million + rows from application daily. I suspect that millions of rows will slow down process if I will use temp table . I will use batch processing.
    but my question is about best practice or effective process. Can you please suggest what could be efficient way to process millions of rows from mysql ?

    Thanks

    Simply put do not use a linked server to do this it's not the temp table that will slow down the process, extract the data on the mysql side to a flat file, transfer the file to someplace SQL Server has access to and load the file from there.

  • ZZartin - Tuesday, November 28, 2017 10:55 AM

    thbaig - Tuesday, November 28, 2017 10:48 AM

    Hi,
    I am writing ETL routine (stored proc based) to fetch data from mysql and insert in mssql database. I am using open query/linked server to fetch data in temp table and after some processing insert to mssql tables.
    we have  few big tables, like one of our table on mysql takes 1million + rows from application daily. I suspect that millions of rows will slow down process if I will use temp table . I will use batch processing.
    but my question is about best practice or effective process. Can you please suggest what could be efficient way to process millions of rows from mysql ?

    Thanks

    Simply put do not use a linked server to do this it's not the temp table that will slow down the process, extract the data on the mysql side to a flat file, transfer the file to someplace SQL Server has access to and load the file from there.

    Thank you. but i have to run job periodically to  load and update data on mssql, as mysql will remain primary db for applicaiton

  • thbaig - Tuesday, November 28, 2017 11:00 AM

    Thank you. but i have to run job periodically to  load and update data on mssql, as mysql will remain primary db for applicaiton

    OK, but what's the issue with the proposed solution?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • I would use SSIS to load the data into your staging table. SSIS will handle chunking out the data so you don't have to move 1 million row in one transaction. If you want to use a linked server chunk out the data yourself in a loop.

  • Phil Parkin - Tuesday, November 28, 2017 11:39 AM

    thbaig - Tuesday, November 28, 2017 11:00 AM

    Thank you. but i have to run job periodically to  load and update data on mssql, as mysql will remain primary db for applicaiton

    OK, but what's the issue with the proposed solution?

    dependency. i will need to manage a routine at mysql to extract data then copy to network, copy to mssql and then load. also as this will be recursive job, i will need to  do whole process for e very execution that don;t seems optimal

  • thbaig - Tuesday, November 28, 2017 12:38 PM

    Phil Parkin - Tuesday, November 28, 2017 11:39 AM

    thbaig - Tuesday, November 28, 2017 11:00 AM

    Thank you. but i have to run job periodically to  load and update data on mssql, as mysql will remain primary db for applicaiton

    OK, but what's the issue with the proposed solution?

    dependency. i will need to manage a routine at mysql to extract data then copy to network, copy to mssql and then load. also as this will be recursive job, i will need to  do whole process for e very execution that don;t seems optimal

    SSIS can do everything listed, but it's over-complicating things. Joe's idea is better. Don't use a linked server.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply