Robust data load process

  • I am seeking further information on strategies/best practices for building a robust control process to coordinate multiple data load processes.

    To explain that rather long intro sentence...

    Layer A.

    This layer holds a number of read-only databases that are populated by periodic log shipping or nightly extracts.

    Layer B.

    As the data warehouse is in its infancy and not all dimension requirements are known, we have an intermediary database that stores history from the various source databases within layer A. This is populated via an overnight SSIS package.

    Layer C.

    Load to the data warehouse, again via SSIS.

    What we want to design is a means to ensure that when any of the layers fails to populate, the other layers' processes recognise this and do not run.

    For example, if the history db fails to load for any reason, it is imperative that the databases within layer A are not updated, as history data would be lost.

    I can see that there are multiple ways to achieve this (control table, enable/disable agent jobs etc.) but I wonder if there is an industry standard approach.

    Hope this makes sense and any assistance will be welcome.

  • I can't say I know of any industry standard on that. I can just tell you what seems to work for me.

    I'm not a big fan of jobs being able to enable or disable other jobs, so I instead structure my processes around a control table (such as a JobRunStatus or JobBatch table that tracks what happened).
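
    Just as an illustration - not any kind of standard - the sort of control table I have in mind is a minimal sketch along these lines (all table and column names are made up for the example):

        CREATE TABLE dbo.JobBatch (
            BatchID     INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
            LayerName   VARCHAR(20) NOT NULL,                     -- e.g. 'A->B' or 'B->C'
            BatchStatus VARCHAR(20) NOT NULL DEFAULT ('Running'), -- 'Running', 'Complete', 'Failed'
            StartedAt   DATETIME    NOT NULL DEFAULT (GETDATE()),
            CompletedAt DATETIME    NULL
        );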

    If I understand your "layers" - I'd use layer A as "everything not yet processed", B as "the raw version of everything processed", and C as "the processed version of everything processed".

    I'd usually tag all detail records at my A level with whatever JobBatch processes them. I would then run the process that takes them to B and on through to C, stopping frequently along the way to make sure nothing failed. If something fails - roll it back, removing the effects from C and B, usually accompanied by a big old "something broke" message to me or my team. If nothing fails - I can then clean out whatever is in level A for that batch, mark the batch complete, and start the process over if there is another uncommitted batch. Something like the sketch below is the shape of it.
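
    Roughly, the batch bookkeeping around that flow could look like this - again just a sketch, with dbo.StagingDetail standing in for whatever your layer A detail table actually is:

        DECLARE @BatchID INT;

        -- open a new batch and remember its id
        INSERT INTO dbo.JobBatch (LayerName) VALUES ('A->C');
        SET @BatchID = SCOPE_IDENTITY();

        -- tag everything at level A that no batch has claimed yet
        UPDATE dbo.StagingDetail
        SET    BatchID = @BatchID
        WHERE  BatchID IS NULL;

        -- ... run the A->B and B->C loads for @BatchID here ...

        -- on success: close the batch and clean out level A
        UPDATE dbo.JobBatch
        SET    BatchStatus = 'Complete',
               CompletedAt = GETDATE()
        WHERE  BatchID = @BatchID;

        DELETE dbo.StagingDetail WHERE BatchID = @BatchID;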

    Now - assuming these things don't take a HUGE amount of time - you might simply wrap all of that into a single SQL transaction that gets rolled back on the failure of any step, or you design multiple, smaller transactions that let you verify everything is right up to that point. One transaction ensures that the rollback process is automated; otherwise, make sure you have some way to roll back your own version of the process. A single-transaction version might look like the sketch below.
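
    Something like this, say - the three-part table names are placeholders, @BatchID is the batch opened in the sketch above, and note that THROW needs SQL Server 2012 or later (use RAISERROR on older versions):

        BEGIN TRY
            BEGIN TRANSACTION;

            -- A -> B: copy the tagged batch into the history database
            INSERT INTO HistoryDB.dbo.HistoryDetail (BatchID, SourceKey, Amount)
            SELECT BatchID, SourceKey, Amount
            FROM   StagingDB.dbo.StagingDetail
            WHERE  BatchID = @BatchID;

            -- B -> C: load the warehouse from history
            INSERT INTO WarehouseDB.dbo.FactDetail (BatchID, SourceKey, Amount)
            SELECT BatchID, SourceKey, Amount
            FROM   HistoryDB.dbo.HistoryDetail
            WHERE  BatchID = @BatchID;

            COMMIT TRANSACTION;
        END TRY
        BEGIN CATCH
            IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;

            -- leave a record of the failure for the next run to see
            UPDATE dbo.JobBatch
            SET    BatchStatus = 'Failed'
            WHERE  BatchID = @BatchID;

            THROW;  -- fail loudly so the agent job / SSIS package reports it
        END CATCH;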

    Now - you'd have to determine whether future runs of your process require the older ones to complete FIRST, but either way - you should be able to control that through preliminary steps in SSIS, like the guard query below.
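
    For that preliminary step, an Execute SQL Task at the front of the package could run something like this (illustrative again) - if an earlier batch is still running or failed, the task raises an error and the package stops before layer A is touched, which also covers your "don't update A if the history load failed" requirement:

        IF EXISTS (SELECT 1
                   FROM   dbo.JobBatch
                   WHERE  BatchStatus IN ('Running', 'Failed'))
        BEGIN
            RAISERROR('Previous batch incomplete - aborting this load.', 16, 1);
        END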

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part... unless you're my manager... or a director and above... or a really loud-spoken end-user... All right - what was my emergency again?

  • Thanks for your post Matt. Useful info that I will take on board.

    Ta
