May 13, 2010 at 3:18 pm
Hello,
We are building a new DWH with sql server 2008 R2, I have 2 physical servers for the mission, 1 of the server has 8GB of ram and 1 quad core processor and the second one has 16GB of ram and 2 quad core processors, the storage system is IBM's XIV. The network speed is 1GB. The estimated DWH size after 1 year will be around 1.3 TB (without indexes)
We will have 3 different areas: source, staging and DWH. Some of the processing will be done in the staging area (joining to the dimension tables and other calculation) and some processing will be at the DWH area, inserts, queries and reading data for OLAP cube.
In order to get the best performance possible I thought to connect the 2 servers using linked server , and used the 8GB server for source and staging areas and the 16GB server for the DWH area, but after I read some posts about linked server it I come to think that maybe itβs not such a good idea (due to network issues, copying large amount of data between the 2 servers etc...), so maybe the better solution is to put all the areas in 1 server in different database?
What is your opinion in this issue?
Thanks, Dave.
May 13, 2010 at 3:25 pm
My preference is to go with dimensional modeling.
In a datawarehouse environment design is the key, in a small database no matter how bad was it designed you can always use brute force to get performance - in a datawarehouse you will only get performance if the design is sound.
Please do not take it the wrong way but you may want to consider hiring a datawarehouse architect with proven record of timely, quality delivery - that will save a lot of money and sleepless nights.
Hope this helps.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.May 13, 2010 at 3:31 pm
dimensional modeling can be achieved in both of the implementation method i described, the question is which method will perform faster.
May 13, 2010 at 3:42 pm
dave_le (5/13/2010)
dimensional modeling can be achieved in both of the implementation method i described, the question is which method will perform faster.
Let me simplify it a little and say that in a datawarehouse you have three kind of tables...
1) Staging Layer - STG - staging area
2) Core Layer - FACT/DIM - datawarehouse core
3) Delivery Layer - Cubes, Aggregation tables, whatever.
On the same note there are three kind of processes...
1) ETL - Moving data from operational databases to STG then loading into FACT/DIM
2) Creation/refresh of delivery layer objects like building cubes, aggregation tables.
3) Quering.
In general my personal rules are...
- no operational database sits on datawarehouse host.
- STG/FACT/DIM/Etc sit in a single database
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.May 13, 2010 at 3:48 pm
It really depends on a lot of things:
1) What disc hardware you have
2) What kinds of bandwidth you have between the two servers
3) How much processing is involved to WH your data
4) How much strain is going to be put on your WH
5) How long it takes to process your data versus user expectation of WH performance
6) etc
I think you're just going to have to test it and see. I generally wouldn't pre-stage on a separate machine unless I needed to for performance reasons. If you try to do it all on one machine and it's too pegged to suit the needs of your users then you would want to offload that processing to another machine (after optimizing the processing as much as possible.)
edit:
I'm not sure if your 8GB machine is a regular (OLTP/ODS) db environment or an extra 'pre-staging' machine for your WH environment. I would definitely keep the OLAP separate from your OLTP/ODS as Paul stated.
May 14, 2010 at 5:32 am
the 8GB machine is not the OLTP system, it's a server that will be used for loading source data and staging.
May 14, 2010 at 5:42 am
dave_le (5/14/2010)
the 8GB machine is not the OLTP system, it's a server that will be used for loading source data and staging.
I would certainly use a second server to handle ETL processes IF we are talking about a near-real-time DWH, if we are talking about a next-day DWH most probably DWH server will be able to handle daily ETL window. The beauty of next-day DWH ETL is that usually ETL window happens at a time where users are sleeping therefore ETL and quering do not compete for resources.
Just my personal take on the issue. π
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply