Using a "RunID" to push data to prod DW but not enable for reporting until later?

  • I was wondering how people deal with pushing data to production but not yet enabling it for "public consumption" until later?

    Say I wanted to load data on a Monday, but I had business users who wanted to review it and they would sign off by EOD Monday so Tuesday morning it is "enabled" for the rest of the organization to work with.

    I could push the data to UAT DW or something, but then I would have to maintain a separate environment.

    Does anyone else have a scenario like this and if so what are the options to handle it?

    What if a load goes horribly wrong to prod and I wanted to "fall back" back to the last good load?

  • If you don’t want to manage another environment, you could set up a presentation layer in the database itself. This layer is basically a set of views that expose selective data based on user/role. It usually requires some database redesign as well, which most businesses won’t accept. It has many advantages in the long run, but it’s complex to set up and manage.

    Setting up another environment is comparatively easy.
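    A minimal sketch of the view idea, combined with the OP's RunID (all table, view, and column names here are illustrative assumptions, not from an actual schema):

    ```sql
    -- Approval flags live in their own table, keyed by the load's RunID.
    CREATE TABLE dbo.LoadApproval
    (
        RunID      int      NOT NULL PRIMARY KEY,
        IsApproved bit      NOT NULL DEFAULT (0),
        ApprovedAt datetime NULL
    );
    GO
    -- Everyone except the reviewers queries the view, which only shows
    -- rows belonging to approved loads. Reviewers query the base table.
    CREATE VIEW dbo.vFactSales
    AS
    SELECT f.*
    FROM dbo.FactSales AS f
    JOIN dbo.LoadApproval AS a
        ON a.RunID = f.RunID
    WHERE a.IsApproved = 1;
    GO
    ```

    "Enabling" Monday's load for the rest of the organization is then a one-row update to `dbo.LoadApproval` on Tuesday morning, with no data movement.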

  • You should have a "staging" environment. This is exactly what it is for. You have Dev, QA, Staging, Prod. I would advise strongly against using your production environment for anything other than production. That would just introduce a lot of challenges and complexity that isn't necessary.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • + 1 for Staging.

  • How crazy is using a snapshot? The data diff will only be a few gigs per load.

    Start a snapshot, run the load, then let users poke around for an hour and lastly commit? (SQL 2008 R1, not R2)

  • Are you talking about opening a transaction and keeping it open on your production server?

    Do you have a different code base for these users so they can view this data?

    What are you really trying to accomplish here?


  • Maxer (5/1/2012)


    How crazy is using a snapshot? The data diff will only be a few gigs per load.

    Start a snapshot, run the load, then let users poke around for an hour and lastly commit? (SQL 2008 R1, not R2)

    Typing on my phone, I have the above wrong:

    1. Start snapshot

    2. Insert/Update data on warehouse with new day's data (around 1GB to 3GB)

    3. Users start checking new load/reports

    4a. Users sign off, everything went well. Drop snapshot

    4b. Users find an error: revert the database to before the load using the snapshot (the database goes offline while this happens, but it's faster than restoring a copy of the DB, and this worst-case-scenario downtime is acceptable)
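    The steps above can be sketched in T-SQL (database names and the snapshot file path are illustrative; database snapshots require Enterprise Edition on SQL 2008):

    ```sql
    -- 1. Create a snapshot just before the load (one entry per data file,
    --    NAME must match the source database's logical file name):
    CREATE DATABASE DW_PreLoad ON
        ( NAME = DW_Data, FILENAME = 'D:\Snapshots\DW_PreLoad.ss' )
    AS SNAPSHOT OF DW;

    -- 2./3. Run the ETL load against DW, then let the users review.

    -- 4a. Users sign off: just drop the snapshot.
    DROP DATABASE DW_PreLoad;

    -- 4b. Users find an error: revert. DW goes offline briefly, any other
    --     snapshots of DW must be dropped first, and the revert breaks the
    --     log backup chain, so take a full backup afterwards.
    RESTORE DATABASE DW FROM DATABASE_SNAPSHOT = 'DW_PreLoad';
    ```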

  • Sean Lange (5/1/2012)


    Are you talking about opening a transaction and keeping it open on your production server?

    Do you have a different code base for these users so they can view this data?

    What are you really trying to accomplish here?

    Goal is to be able to fallback if the load goes wrong.

  • Personally, I like the Kimball suggestion of having a surrogate key on your fact table. If you log the surrogate keys included in each ETL batch, this gives you a simple method of backing out a particular ETL run. I also like the Audit Dimension pattern: http://www.kimballgroup.com/html/designtipsPDF/DesignTips2001/KimballDT26AddingAnAudit.pdf This gives you a simple, effective method of highlighting the quality of your data to the users.
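    For insert-only facts, the batch-key backout might look something like this (table and column names are assumptions for illustration):

    ```sql
    -- Each fact row records the ETL batch that inserted it:
    ALTER TABLE dbo.FactSales ADD ETLBatchID int NOT NULL DEFAULT (0);

    -- One audit row per run, as in the Kimball audit pattern:
    CREATE TABLE dbo.ETLBatch
    (
        ETLBatchID int IDENTITY(1,1) PRIMARY KEY,
        LoadedAt   datetime    NOT NULL DEFAULT (GETDATE()),
        Status     varchar(20) NOT NULL  -- 'Loaded', 'Approved', 'BackedOut'
    );

    -- Backing out run 42 is then a single delete plus a status change:
    DELETE FROM dbo.FactSales WHERE ETLBatchID = 42;
    UPDATE dbo.ETLBatch SET Status = 'BackedOut' WHERE ETLBatchID = 42;
    ```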

  • Thanks for the link!

    I understand backing out a run for inserts only, but if we also updated records... we couldn't back those out (without an audit log of some sort capturing the old values, etc...)

  • Another interesting link: http://www.kimballgroup.com/html/designtipsPDF/DesignTips2005/DTKU74Compliance-EnabledDataWarehouses.pdf

    without an audit log of some sort giving the old values, etc...

    Yep, and there are, of course, a number of different ways to approach it.

    e.g. Change updates into inserts plus logical deletes. Keep a log / backup / snapshot of the prior state. Use a fully type 2 fact table.
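    The insert/logical-delete variant might look like this (column names such as `IsCurrent` and `ExpiredByBatchID` are assumptions, not from the thread):

    ```sql
    -- Instead of updating in place, expire the old row...
    UPDATE dbo.FactSales
    SET    IsCurrent = 0, ExpiredByBatchID = @BatchID
    WHERE  SalesKey = @SalesKey AND IsCurrent = 1;

    -- ...and insert the new version, tagged with the current batch:
    INSERT INTO dbo.FactSales (SalesKey, Amount, IsCurrent, ETLBatchID)
    VALUES (@SalesKey, @NewAmount, 1, @BatchID);

    -- Backing out @BatchID: delete its inserts, un-expire what it expired.
    DELETE FROM dbo.FactSales WHERE ETLBatchID = @BatchID;
    UPDATE dbo.FactSales
    SET    IsCurrent = 1, ExpiredByBatchID = NULL
    WHERE  ExpiredByBatchID = @BatchID;
    ```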

    You might look into partition swapping as an option.
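    Partition swapping, assuming the fact table is partitioned by load date and a matching staging table exists on the same filegroup (all names illustrative), could be sketched as:

    ```sql
    -- Load and verify the new day's data in the staging table, then swap
    -- it into the fact table as a metadata-only operation:
    ALTER TABLE dbo.FactSales_Staging
        SWITCH TO dbo.FactSales PARTITION $PARTITION.pfLoadDate('2012-05-01');

    -- Falling back: switch the bad partition straight back out again.
    ALTER TABLE dbo.FactSales
        SWITCH PARTITION $PARTITION.pfLoadDate('2012-05-01')
        TO dbo.FactSales_Staging;
    ```

    Note that partition switching requires identical schemas, matching indexes, and a check constraint on the staging table that fits the target partition's range.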

    We are starting to talk about substantial design changes though. It will all depend on your environment and constraints.

    A separate UAT environment is one possibility for vetting changes before loading production. Or you could have the users peruse the staging data before completing the ETL, as others have mentioned (but this may not make sense where the ETL transformations are too complex).

    Some people go as far as to maintain a second production version of their DWH that contains all fully audited changes to everything.

    Clear as mud? 😉

  • I have an idea for a different approach. It would take a little effort to set up, but way easier than a parallel environment.

    -Set up some kind of user acceptance table in your database.

    -Set up a report for the authorizing user(s) to look at, with a report action link on it. Access to this report must be restricted to just the user(s) that have approval authority.

    -The report action link jumps to an extremely simple report with another link on it that says something like "Are you sure you want this data in production?", which is another report action link. Access to this report must be restricted to just the user(s) that have approval authority.

    -The next report actually has an update statement as part of the query that updates the user acceptance table to approved or yes or true or whatever works for you. The report can simply say something like "Thank you, the data has been moved to production." Access to this report must be restricted to just the user(s) that have approval authority.

    -The public consumption report starts with a query against the acceptance table to check which data set is the appropriate one to return to the report. My initial thought is that the main report query would have some kind of internal parameter that could be used to determine which slice of data to use, by date or some other easily accessible data point.

    If for some reason the authorizing user made a two-step mistake, undoing it would start with setting the user acceptance table back to false, then fixing your data and either resetting the user acceptance table to true or asking the authorizing user to re-approve the data.
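    A minimal sketch of the acceptance table and the two queries behind it (all names here are illustrative assumptions):

    ```sql
    CREATE TABLE dbo.UserAcceptance
    (
        LoadDate   date     NOT NULL PRIMARY KEY,
        IsApproved bit      NOT NULL DEFAULT (0),
        ApprovedBy sysname  NULL,
        ApprovedAt datetime NULL
    );

    -- The hidden "Are you sure?" report runs something like:
    UPDATE dbo.UserAcceptance
    SET    IsApproved = 1, ApprovedBy = SUSER_SNAME(), ApprovedAt = GETDATE()
    WHERE  LoadDate = @LoadDate;

    -- The public consumption report only returns approved slices:
    SELECT f.*
    FROM   dbo.FactSales AS f
    JOIN   dbo.UserAcceptance AS a ON a.LoadDate = f.LoadDate
    WHERE  a.IsApproved = 1;
    ```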

  • Daniel Bowlin (5/3/2012)


    -The next report actually has an update statement as part of the query that updates the user acceptance table to approved or yes or true or whatever works for you. The report can simply say something like "Thank you, the data has been moved to production." Access to this report must be restricted to just the user(s) that have approval authority.

    Or, have this step of the process fire off an ETL task (SSIS, BusObj, whatever) that moves the staging data into production. If the okey-dokey folk never get around to approving the data, the production data isn't changed, and the possibility of requiring a backout is much lower. Then, the staging environment is purged and reloaded with "today's" data.

  • Daniel Bowlin (5/3/2012)


    I have an idea for a different approach. It would take a little effort to set up, but way easier than a parallel environment.

    This is an interesting approval approach, but I'm not sure how it solves the original problem, that of having production ready data available/visible to a user for approval and yet unavailable/invisible in the actual production environment.

    I don't mean to be overly critical, I'm just interested in how you intend it to work.

    Are you providing the data-in-need-of-approval to the user via a report?

    It certainly sounds viable, but the OP will have to provide input to determine if it would work in their environment.

  • I basically gave up on the RunID approach. I spoke with the business users, and just signing off in a test environment is good enough. We'll use snapshots to allow for a rollback on prod in case anything really weird happens.

    I looked at RunID, but it became too complex to maintain two versions of "the truth" in prod. We could have done separate databases and just switched connections, but it wasn't worth the hassle.
