incremental etl with non-mssql

  • the shop here has oracle and other SQL product that the SQL08 is pulling data as a warehouse.

    i can't use replication because of various network issues. the only sure way, I can do is openquery.

    at this point, I am pulling the entire table about 5 GB each every day on a T1 line (slow). it is because any row could have been modified.

    is there a step-by-step guilde on doing incremental loading from oracle to pull only the rows that got changed?

    i can't use UNION ALL to find difference because the source side doesn't have much space to keep 2 copies of table.

    thanks.

  • light_wt (5/12/2011)


    is there a step-by-step guilde on doing incremental loading from oracle to pull only the rows that got changed?

    Best approach is to solve this on the source database.

    Either a timestamp column is updated each time a row gets updated/inserted or a trigger writes a change_log table - in this case all you need is the PK. If your base table also accepts delete statements change_log table may be your best solution.

    Once source system has the capability to tell you which rows have changed you can move only those rows to your staging table on DWH side and process them accordingly.

    Hope this helps.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply