Staging area best practice

  • Hi all,

    I have a quick question about data flow process best practice as follows:

    I am pulling data from various data source using SQL code (SP & SSIS PKG) to populate a DW

    (Dimension and Fact tables).

    The way it is done is as follows:

    I created a schema called "source" to hold incoming data from various data sources as is, another schema called "Stage" to hold the massaged data (after applying business rules) then a third schema called "DW" to hold dimension and fact tables in a final format for reporting.

    So having 3 schemas in one database? is this the right thing to do or should I have a 3 separate database completely?

    Thanks for your advise.

  • I would, at least, put the DW schema on its own database. That way you can backup just that data. I'm not sure how long do you keep the data on the other schemas, but I'm certain that you don't need to retain all history.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • oneteabag (9/22/2016)


    Hi all,

    I have a quick question about data flow process best practice as follows:

    I am pulling data from various data source using SQL code (SP & SSIS PKG) to populate a DW

    (Dimension and Fact tables).

    The way it is done is as follows:

    I created a schema called "source" to hold incoming data from various data sources as is, another schema called "Stage" to hold the massaged data (after applying business rules) then a third schema called "DW" to hold dimension and fact tables in a final format for reporting.

    So having 3 schemas in one database? is this the right thing to do or should I have a 3 separate database completely?

    Thanks for your advise.

    If it were me, I'd create three databases. Then, if you should need to scale-up the solution at some stage in the future, you have an easy path (putting the DBs on separate instances).

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • So if I elect to create 3 different databases. The first DB which is holding source data, should I create different schemas inside this database referencing the different data sources?

  • oneteabag (9/22/2016)


    So if I elect to create 3 different databases. The first DB which is holding source data, should I create different schemas inside this database referencing the different data sources?

    I'm not sure that there is a best practice for this, but I would do that, yes.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • So having 3 schemas in one database? is this the right thing to do or should I have a 3 separate database completely?

    It depends.

    I have seen this done both ways - landing, staging and DW tables all in one DB in different schemas and in different databases. There are pros/cons to each with respect to how easy it is to back stuff up, assign permissions, lock stuff down, performance, etc. I personally like having all this stuff in one DB with different schemas - maintaining 1 DB is easier than three: less files to maintain, less permissions to assign, easier to backup, etc. That's my personal preference.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • I am more into having 3 different databases, I think it is a lot cleaner and easier to organize and maintain. I have 300+ table in one DB right now.

  • Having 3 different databases has other advantages than those already mentioned. The BIG ones are that your "IMPORT" and "STAGING" databases likely don't need to be backed up because all of the data there can easily be recreated from import files and the data is, at best, transient in nature to begin with. That also means that both databases can be set to the SIMPLE recovery model and that means that you can also take advantage of high performance/low resource usage Minimal Logging methods especially when Trace Flag 610 is enabled.

    Please see the following white paper for more on Minimal Logging and the use of Trace Flag 610. The rub here is that I have no clue if any of that works in SSIS because I normally do all of this type of work in T-SQL.

    https://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply