How to build a robust DWH? Need some advice

  • Hello,

We are building a new DWH with SQL Server 2008 R2. I have two physical servers for the job: one has 8 GB of RAM and one quad-core processor, and the second has 16 GB of RAM and two quad-core processors. The storage system is IBM's XIV. The network speed is 1 Gbit/s. The estimated DWH size after one year will be around 1.3 TB (without indexes).

We will have three different areas: source, staging, and DWH. Some of the processing will be done in the staging area (joins to the dimension tables and other calculations), and some processing will be done in the DWH area: inserts, queries, and reading data for the OLAP cube.

In order to get the best performance possible, I thought of connecting the two servers using a linked server, and using the 8 GB server for the source and staging areas and the 16 GB server for the DWH area. But after reading some posts about linked servers, I've come to think that maybe it's not such a good idea (due to network issues, copying large amounts of data between the two servers, etc.), so maybe the better solution is to put all the areas on one server, in different databases?
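    For context, the cross-server variant would mean loads like the following (server, database, and table names here are purely hypothetical, for illustration). Every four-part name goes through the linked server, so all staged rows cross the 1 Gbit/s link on each load:

    ```sql
    -- Hypothetical example: pulling staged rows from the 8 GB server
    -- (registered on the DWH server as linked server [STAGING_SRV])
    -- into a fact table on the 16 GB DWH server.
    INSERT INTO dbo.FactSales (DateKey, ProductKey, Quantity, Amount)
    SELECT s.DateKey, s.ProductKey, s.Quantity, s.Amount
    FROM [STAGING_SRV].StagingDB.dbo.SalesStaged AS s;
    ```

    This is the kind of statement where the linked-server concerns (network bandwidth, distributed-query overhead) would show up.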

What is your opinion on this issue?

    Thanks, Dave.

  • My preference is to go with dimensional modeling.

In a data warehouse environment, design is the key. In a small database, no matter how badly it was designed, you can always use brute force to get performance; in a data warehouse, you will only get performance if the design is sound.

Please do not take it the wrong way, but you may want to consider hiring a data warehouse architect with a proven record of timely, quality delivery. That will save a lot of money and sleepless nights.

    Hope this helps.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a non-production test environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
Dimensional modeling can be achieved with both of the implementation methods I described; the question is which method will perform faster.

  • dave_le (5/13/2010)


Dimensional modeling can be achieved with both of the implementation methods I described; the question is which method will perform faster.

Let me simplify it a little and say that in a data warehouse you have three kinds of tables...

    1) Staging Layer - STG - staging area

    2) Core Layer - FACT/DIM - datawarehouse core

    3) Delivery Layer - Cubes, Aggregation tables, whatever.

On the same note, there are three kinds of processes...

1) ETL - Moving data from operational databases to STG, then loading it into FACT/DIM

    2) Creation/refresh of delivery layer objects like building cubes, aggregation tables.

3) Querying.

    In general my personal rules are...

- no operational database sits on the data warehouse host.

- STG/FACT/DIM/etc. sit in a single database.
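
    As a rough sketch of what that single-database rule might look like in practice (schema and table names here are made up, not from the thread), the layers can be separated by schema rather than by server:

    ```sql
    -- Hypothetical layout: one database, one schema per layer.
    CREATE SCHEMA stg;   -- staging layer: raw loads from source systems
    GO
    CREATE SCHEMA dw;    -- core layer: FACT/DIM tables
    GO
    CREATE TABLE stg.SalesRaw (
        SaleDate    datetime    NOT NULL,
        ProductCode varchar(20) NOT NULL,
        Quantity    int         NOT NULL,
        Amount      money       NOT NULL
    );
    CREATE TABLE dw.DimProduct (
        ProductKey  int IDENTITY PRIMARY KEY,
        ProductCode varchar(20) NOT NULL
    );
    CREATE TABLE dw.FactSales (
        DateKey     int   NOT NULL,
        ProductKey  int   NOT NULL REFERENCES dw.DimProduct (ProductKey),
        Quantity    int   NOT NULL,
        Amount      money NOT NULL
    );
    ```

    With everything in one database, the STG-to-FACT/DIM load is a local `INSERT ... SELECT` rather than a distributed query, which is the main performance argument for this layout.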

  • It really depends on a lot of things:

1) What disk hardware you have

    2) What kinds of bandwidth you have between the two servers

3) How much processing is involved in warehousing your data

    4) How much strain is going to be put on your WH

    5) How long it takes to process your data versus user expectation of WH performance

    6) etc

I think you're just going to have to test it and see. I generally wouldn't pre-stage on a separate machine unless I needed to for performance reasons. If you try to do it all on one machine and it's too pegged to suit the needs of your users, then you'll want to offload that processing to another machine (after optimizing the processing as much as possible).

    edit:

    I'm not sure if your 8GB machine is a regular (OLTP/ODS) db environment or an extra 'pre-staging' machine for your WH environment. I would definitely keep the OLAP separate from your OLTP/ODS as Paul stated.

└> bt



Forum Etiquette: How to post data/code on a forum to get the best help

The 8 GB machine is not the OLTP system; it's a server that will be used for loading source data and staging.

  • dave_le (5/14/2010)


The 8 GB machine is not the OLTP system; it's a server that will be used for loading source data and staging.

I would certainly use a second server to handle ETL processes IF we are talking about a near-real-time DWH. If we are talking about a next-day DWH, the DWH server will most probably be able to handle the daily ETL window. The beauty of next-day DWH ETL is that the ETL window usually falls at a time when users are sleeping, so ETL and querying do not compete for resources.

    Just my personal take on the issue. πŸ˜‰

