Reusing a Data Warehouse?

  • Hi all,

    We have two internal "companies" where I work, both using the same DataFlex-based OLTP but with the data stored in what are essentially two copies of the same database (i.e. one set of the same tables for each company). There are some differences between the structures when it gets moved to the staging area (the OLTP has a few "one true lookup tables" allowing customisation) but they are basically the same with a few minor quirks.

    I know from experience that when treating these two companies differently then I've ended up creating two virtually identical SSIS projects which have then had to be maintained individually. I make my fair share of mistakes, some of which have been to apply changes to one company's SSIS project and not to the other, and this is what's driving me to find a solution to the problem - especially as I have the opportunity to rewrite my first attempt at a DW properly.

    What I'm considering is to have the structure of the data warehouses for the two companies exactly the same (the differences are negligible as far as the bigger picture is concerned) and that the fields exclusive to either company will appear in both DWs. I'd hide the redundant fields from the end-users, and would look to use the same SSIS ETL package for both companies with the connections set when the package fires up to point at the correct source of data and staging area etc. This really appeals to the over-zealous "maintainability-nut" in me but the idea of redundancy makes me feel pretty queasy.

    Would one SSIS project for the common-ground between the two internal companies and separate projects to handle their differences between them be a better approach, with the separate packages executed by the common-ground parent depending on the company?

    These are only ideas, and as I've not committed myself to any approach as yet I'd really appreciate any views or opinions on it you all might have. Even a reply of "You really haven't thought this through properly have you?" will go down pretty well... 😀

    Cheers,

    RF

    _____________________________________________________________

    MAXIM 106:
    "To know things well, we must know the details; and as they are almost infinite, our knowledge is always superficial and imperfect."
    Francois De La Rochefoucauld (1613-1680)

  • RainbowFfolly (11/11/2010)


    I know from experience that when treating these two companies differently then I've ended up creating two virtually identical SSIS projects which have then had to be maintained individually. I make my fair share of mistakes, some of which have been to apply changes to one company's SSIS project and not to the other, and this is what's driving me to find a solution to the problem - especially as I have the opportunity to rewrite my first attempt at a DW properly.

    What I'm considering is to have the structure of the data warehouses for the two companies exactly the same (the differences are negligible as far as the bigger picture is concerned) and that the fields exclusive to either company will appear in both DWs. I'd hide the redundant fields from the end-users, and would look to use the same SSIS ETL package for both companies with the connections set when the package fires up to point at the correct source of data and staging area etc. This really appeals to the over-zealous "maintainability-nut" in me but the idea of redundancy makes me feel pretty queasy.

    This is a pretty interesting approach to solve the issue but the universe will act against such commendable effort. The 2nd Law of Thermodynamics states that over time things tend to disorganize a.k.a. entropy increases over time - therefore sooner or later those two sister structures will get different... and there is nothing you can do to prevent that from happening.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • I would try to avoid having 2 sets of logic. I would try to only have one package, or one set of packages. I normally try to use conditional logic inside the package(s) to deal with differences.



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • Thanks guys!

    I've got to admit that it's normally internal action within the company that affects my design considerations, but knowing that the Universe itself could end up conspiring against me is starting to make me panic and get seriously paranoid! :w00t: 😀

    The two sister structures becoming more and more different that Paul (with another great analogy) mentions is a real concern; although one I can have control over in some respects, in others it's always going to be completely out of my hands. The quality of the data from the OLTP is a major issue, and feeding back cleansed data into it from the data warehouse will help to standardise the sources somewhat (and give me and the DW more credibility in decisions), but as Paul says, they will over time still divert more and more whatever I try to do. We may be moving to a SQL 2005 based OLTP from the same provider at some point in the future, and although this could help in standardising the two companies if implemented, I don't want to make major design decisions on "what ifs" based on "what ifs" (if you understand what I mean).

    Alvin's suggestion is one I was "sort of" thinking about that he's helped make a lot clearer to me and would remove the redundancy in the structures of the data warehouses I was considering. It also avoids the serious risk of a "one-size-fits-all" DW and ETL ultimately fitting neither company. Handling some Data Flow tasks within the package using conditional logic would be fine as the differences in some areas are relatively minor and static and wouldn't over-complicate things. Where I can foresee there is a serious possibility in an area differing between the two companies in the future, I'd probably want to take it out as a separate package or Control Flow task (if possible) and treat it as a "component" that would reduce the testing and maintainability overheads and help other staff understand that they're dealing with a "major difference" and act accordingly.

    I've not had to deal with executing a package that may do one or more things based on a variable such as company. Is there a generally accepted best practice for executing a package like this?

    Cheers and thanks for helping me think,

    RF

    _____________________________________________________________

    MAXIM 106:
    "To know things well, we must know the details; and as they are almost infinite, our knowledge is always superficial and imperfect."
    Francois De La Rochefoucauld (1613-1680)

  • I think you're on the right track. It's just a matter of working out the details.

    Keep in mind that with SSIS you can use Configurations to setup package parameters at run time, i.e., tailor the package for DB1 vs DB2.



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply