Data Warehouse Refresh Problem

  • Hi All,

We're having an issue reloading data into our data warehouse DB.

    While data is being loaded into the large tables, users are unable to query them. (obviously)

    What they want is for the existing data to be "frozen" before a load occurs, so that they can continue running queries while the load is in progress.

    After the full load finishes, the up-to-date data becomes available in place of the frozen data.

We have attempted to achieve this by mirroring the data warehouse DB to another server and taking a database snapshot of the DW on that server. Users are given access to this snapshot for querying. The snapshot is taken after the full load on the data warehouse, so it stays static each day until the next full load completes and the snapshot is refreshed.

Unfortunately, when taking the new snapshot, if the current snapshot is still active, it takes a very long time to sync (6 hours). If we drop the snapshot and recreate it from scratch, it completes faster (30 minutes), but then we still have those 30 minutes during which the frozen data is unavailable to query.

Are there any options that would make the frozen data constantly available, yet refreshed daily?

Could you use two copies of each table and use synonyms to switch back and forth between them every day?

The users query the data via the synonyms and have no need to know what is going on in the background tables.

    The process would be

    a) Users are querying table set A via synonyms

    b) ETL process loads table set B

    c) ETL process switches synonyms to point to table set B

    This does, of course, rely on your ETL process being able to accommodate the scenario.
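In SQL Server the switch in step (c) can be done as a single transaction; there is no ALTER SYNONYM, so the usual pattern is to drop and recreate the synonym. A minimal sketch, using hypothetical table and synonym names:

```sql
-- Sketch only: dbo.FactSales is the user-facing synonym;
-- dbo.FactSales_A / dbo.FactSales_B are the two table copies (hypothetical names).
-- Drop and recreate inside a transaction to repoint users from
-- table set A to the freshly loaded table set B.
BEGIN TRANSACTION;
    DROP SYNONYM dbo.FactSales;
    CREATE SYNONYM dbo.FactSales FOR dbo.FactSales_B;
COMMIT TRANSACTION;
```

The next day's load would do the reverse, pointing the synonym back at table set A.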


    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Great Idea!

    Thanks Phil.

This would work if we were loading a full set of data each day, but unfortunately it's an incremental load, meaning 24 hours of data is inserted into the table on each load.

    Using the above method would mean that the tables are missing alternate days of data.

    I'm wondering if there's some modification of the above that might work for our scenario?

Cheers,

    Lloyd.

  • Rin Sitah wrote:

Great Idea! Thanks Phil. This would work if we were loading a full set of data each day, but unfortunately it's an incremental load, meaning 24 hours of data is inserted into the table on each load. Using the above method would mean that the tables are missing alternate days of data. I'm wondering if there's some modification of the above that might work for our scenario? Cheers, Lloyd.

    Understood, though it's not clear why the process doesn't just insert "all new rows" regardless of how much time has elapsed since the previous load.

    Is "24 hr of data" being presented in some sort of extract ... a file, perhaps? Meaning you cannot query the source directly?

    Even if that is the case, all that would be required is that two files get loaded, rather than one, as part of a load. As the tables being loaded are not being used during the load, an increase in the overall load time is perhaps tolerable?


  • Or maybe this?

    a) Users are querying table set A via synonyms

    b) ETL process loads table set B with data from file1

    c) ETL process switches synonyms to point to table set B

d) When all users are off the system (e.g., 11:00 pm), an ETL process loads data from file1 to table set A (or from table set B to table set A)

    At this point, both sets of tables are in sync, ready for the next day's action.
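Step (d) could be as simple as copying across whatever table set A is missing. A rough sketch, with hypothetical table and column names, assuming table set B now holds the complete data:

```sql
-- Sketch only, hypothetical names. After hours, bring table set A
-- up to date from table set B so both sets match before the next load.
INSERT INTO dbo.FactSales_A (SaleID, SaleDate, Amount)
SELECT b.SaleID, b.SaleDate, b.Amount
FROM   dbo.FactSales_B AS b
WHERE  NOT EXISTS (SELECT 1
                   FROM   dbo.FactSales_A AS a
                   WHERE  a.SaleID = b.SaleID);
```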


  • Rin Sitah wrote:

Great Idea! Thanks Phil. This would work if we were loading a full set of data each day, but unfortunately it's an incremental load, meaning 24 hours of data is inserted into the table on each load. Using the above method would mean that the tables are missing alternate days of data. I'm wondering if there's some modification of the above that might work for our scenario? Cheers, Lloyd.

You could use the approach outlined by Phil if you changed the extract process to use a sliding window: instead of pulling the last 24 hours, pull the last 72 hours. I am assuming that you are performing some type of MERGE now; if so, each load process will catch up, because the new file contains overlapping data.
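Something along these lines, with hypothetical table and column names. The overlapping window means rows loaded while the other table set was offline get picked up on the next pass, and the MERGE keeps the overlap from creating duplicates:

```sql
-- Sketch only, hypothetical names. Pull an overlapping 72-hour window
-- from staging so each table set catches up on rows it missed;
-- MERGE updates existing rows instead of duplicating them.
MERGE dbo.FactSales_B AS tgt
USING (
    SELECT SaleID, SaleDate, Amount
    FROM   staging.Sales
    WHERE  SaleDate >= DATEADD(HOUR, -72, SYSDATETIME())
) AS src
    ON tgt.SaleID = src.SaleID
WHEN MATCHED THEN
    UPDATE SET tgt.SaleDate = src.SaleDate,
               tgt.Amount   = src.Amount
WHEN NOT MATCHED THEN
    INSERT (SaleID, SaleDate, Amount)
    VALUES (src.SaleID, src.SaleDate, src.Amount);
```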

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs
