SQL Server 2008 - Server Architecture DWH

  • Hi there,

    I have an architectural question:

    For our DWH we have two SQL Servers (Enterprise Edition), and our former admin/architect set things up as follows:

    1. Server: DWH, Reporting, Cubes

    This server hosts the DWH and is used as the reporting front end and cube host (Excel as the front end).

    2. Server: ETL. This server runs all the ETL jobs. Some procedural tables used only in the ETLs are hosted there; everything else is on the DWH.

    The two systems are quite similar

    Both are Dell PowerEdge R900s with four 6-core Xeons and 128 GB RAM.

    The DWH server has 3.5 TB of total DB space on RAID 10

    and an additional 9 TB of backup space on RAID 6.

    The ETL server has 700 GB of DB space on RAID 10

    and 5 TB as an archive for incoming files (access logs, CSV exports from other servers, etc.).

    I would like to know if this is a standard architecture, or whether there is something preferable out there and we should probably reorganise the whole structure.

    For example, you could say we should put the DWH databases on the ETL machine and use the former "DWH" system only for the front ends (Reporting Services and cubes), which would then access the databases on the "ETL" server.

    ...or whatever would be best.

    Cheers,

    Mitch

  • Mitch,

    It seems like a good first crack at it.

    The big resource hogs will be ETL and cube processing sucking up disk and RAM.

    Reporting can be a big load, but if people are using the cubes and you have good design and indexes that can be mitigated.

    It would be wise to keep cube processing and ETL separated.

    I don't have any idea how much data you have, so keep that in mind as you read my advice.

    I only know how much disk you have.

    If you have a small enough amount of data, you could put both on one system, as long as you have enough RAM and fast enough (and separate) disks so as not to create resource constraints.

    Cheers!

    Here are some details about the databases:

    The main DB is the DWH DB,

    with about 130 million users: about 10 tables with non-historized data (so 130 million rows each) and 10 historized tables with between 150 million and 500 million rows (all partitioned, of course; see the sketch below).

    Additionally, we have some databases which hold aggregated data, but their size is not relevant 😉
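    For the historized tables, the partitioning looks roughly like this (a simplified sketch only; the real table, partition key, and boundary values differ):

    ```sql
    -- Simplified illustration of a monthly-partitioned historized table.
    -- All names and boundary values are made up for this example.
    CREATE PARTITION FUNCTION pf_hist_monthly (date)
        AS RANGE RIGHT FOR VALUES ('2011-01-01', '2011-02-01', '2011-03-01');
    GO

    CREATE PARTITION SCHEME ps_hist_monthly
        AS PARTITION pf_hist_monthly ALL TO ([PRIMARY]);
    GO

    CREATE TABLE dbo.user_activity_hist (
        user_id    int    NOT NULL,
        valid_from date   NOT NULL,  -- partition key
        metric     bigint NOT NULL
    ) ON ps_hist_monthly (valid_from);
    ```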

    Our classic ETL has been to take a logfile, process it, and put the information into the DWH.

    Or to take data from the DWH, aggregate it, and put it into another table.

    However, we now run more and more into situations where we have an external logfile and, while processing it, have to add data from the DWH and then aggregate it.

    Processing those logfiles on the DWH would be quite fast, as the additional data is already there. However, we would like to keep the DWH free from unnecessary queries.

    Our solution so far is to process on the ETL server, move the relevant user_ids into a temp table on the DWH, select the additional data using this temp table, put the result into a temp table on the ETL server, and then aggregate the information.

    As we just have direct-attached drives, I guess there is no other way to access the DWH data besides an SSIS data source or linked servers?
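    To make that concrete, here is a simplified sketch of the current flow over a hypothetical linked server named DWHSRV (all database, table, and column names are made up):

    ```sql
    -- 1. Push the user_ids found in the logfile into a staging table on the DWH.
    INSERT INTO DWHSRV.Staging.dbo.etl_user_ids (user_id)
    SELECT DISTINCT user_id
    FROM dbo.logfile_import;

    -- 2. Pull back only the matching DWH attributes into a local work table.
    SELECT u.user_id, u.country, u.segment
    INTO #dwh_user_attrib
    FROM DWHSRV.DWH.dbo.users AS u
    JOIN DWHSRV.Staging.dbo.etl_user_ids AS s
        ON s.user_id = u.user_id;

    -- 3. Aggregate locally on the ETL server.
    SELECT a.country, COUNT(*) AS hits
    FROM dbo.logfile_import AS l
    JOIN #dwh_user_attrib AS a
        ON a.user_id = l.user_id
    GROUP BY a.country;
    ```

    (In practice we could also wrap step 2 in OPENQUERY or do it as an SSIS data flow, so the join runs entirely on the DWH side rather than being pulled through the linked server row by row.)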

  • For a DWH, I would not put Analysis Services (cubes) and the data warehouse on the same server. I'd go with three servers here.



  • The question is: are there any performance issues with what you currently have? Are there any resources left over? You need to analyse how the system currently fulfils its requirements. If it falls down anywhere, or if you can see free resources anywhere, then yes, by all means alter it.

    For example, if you monitor disk space on both systems and find that one server is running out and one server has loads of free space then it makes sense to move some disk over. Same for memory etc.

    Also consider that ETL and cube builds usually happen overnight and ad hoc relational queries happen during the day. So it might make sense to move some resources from your ETL server to your DW server, and then overnight run your ETL on both servers.

    That way during the day you have more capacity on your DW server and during the night you are using both servers for ETL and you have maximum resource utilisation. You don't have resources on the ETL server sitting there idling during the day.

    But really you need to monitor resource usage on your existing system, identify any trends or resource pressure, and reconfigure accordingly. Not only that, you should also estimate what is going to happen 6 months from now with your expected growth in data volume and reconfigure before any issues occur.
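    If you want a lightweight starting point for that, a scheduled snapshot of a few counters from sys.dm_os_performance_counters into a table gives you something to trend against. A rough sketch follows; the table name and counter list are just examples, and keep in mind the /sec counters are cumulative, so you have to diff consecutive snapshots:

    ```sql
    -- Baseline table, populated by a SQL Agent job every few minutes.
    CREATE TABLE dbo.perf_baseline (
        capture_time  datetime      NOT NULL DEFAULT (GETDATE()),
        object_name   nvarchar(128) NOT NULL,
        counter_name  nvarchar(128) NOT NULL,
        instance_name nvarchar(128) NULL,
        cntr_value    bigint        NOT NULL
    );
    GO

    INSERT INTO dbo.perf_baseline (object_name, counter_name, instance_name, cntr_value)
    SELECT RTRIM(object_name), RTRIM(counter_name), RTRIM(instance_name), cntr_value
    FROM sys.dm_os_performance_counters
    WHERE counter_name IN (N'Page life expectancy',
                           N'Batch Requests/sec',
                           N'Buffer cache hit ratio');
    ```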

  • I think what nick posted makes a lot of sense, but I think it is really more geared toward daily management of the system. Before you start making changes for today, figure out what you will need for tomorrow and then revisit this on a regular basis. We do capacity planning for our DWHs twice a year. Every 6 months, we plan for the growth for the coming year. I know it's a cliché, but it's true: failing to plan is planning to fail.

    Definitely, you need to do what nick described, but don't rely on that alone. It may be difficult to determine initially how much disk space, RAM, or CPU you will need. But over time, you should be tracking CPU and memory usage and your data growth, and be able to estimate with confidence how much capacity you will need to add over the following year.
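    As an illustration only (object names are made up), even a simple nightly snapshot of file sizes gives you the growth trend you need for that estimate:

    ```sql
    -- Capture data file sizes once a day for capacity planning.
    CREATE TABLE dbo.db_size_history (
        capture_date  date          NOT NULL DEFAULT (CONVERT(date, GETDATE())),
        database_name sysname       NOT NULL,
        file_name     sysname       NOT NULL,
        size_mb       decimal(18,2) NOT NULL
    );
    GO

    INSERT INTO dbo.db_size_history (database_name, file_name, size_mb)
    SELECT DB_NAME(database_id), name, size * 8.0 / 1024   -- size is stored in 8 KB pages
    FROM sys.master_files
    WHERE type_desc = 'ROWS';   -- data files only; include 'LOG' to track log growth too
    ```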

    I work with a very large DWH at my job. Here is what we do:

    Staging server: Clustered server with lots of CPUs and RAM. We have a staging SQL Server with ETL running almost continuously throughout the day, pulling data incrementally from source systems in small chunks. The data is stored and transformed on this server.

    Primary warehouse: Clustered server with lots of CPUs and RAM. This is the main data warehouse that contains data going back 6 years from our application. ETL processes running on the staging server populate the data on this server.

    Ad hoc warehouse: Stand-alone server with lots of CPUs and RAM. This server has a copy of the data from the primary warehouse limited to 18 months of data and is populated by ETL processes on the staging server. It is dedicated to a few streams of business that need to perform ad hoc analysis of the data.

    Downstream warehouse: Stand-alone server with lots of CPUs and RAM. This server has a copy of the data from the primary warehouse limited to 3 years of data and is populated by ETL processes on the staging server. It is dedicated to streams of business that need to import our data into their own applications and DWHs.

    2 cube servers: Two stand-alone servers with lots of CPUs and RAM, double what the other servers have. The servers are load balanced and the cubes are processed once per day: cube server #1 is removed from NLB, processed, and added back into NLB while cube server #2 is removed; then cube server #2 is processed and added back into NLB. The cube servers are used by our Reporting Services site and a corporate Reporting Services site.

    All in all, the process to update all of the DWHs and the cubes takes about 7 hours and runs overnight. We have an operations team of three located in India so that we are fully staffed in case there are any problems with the nightly processes. Also, the engineering team is in India, so all major work for this occurs during their normal workday, which is the middle of the night here where most of the users are.



  • Thank you guys.

    I will try it out.

    Currently we don't have performance issues at all, as those systems are quite new (6 months) and we still have to migrate a lot of jobs from an older/smaller SQL Server 2005 box where everything was on one machine (4 CPUs, 16 GB RAM, 2 TB DB space).

    So, as we are just at the beginning of the process, I just wanted to make sure that the architectural concept was not total bullshit. But it seems to be OK.

    I will set up some monitoring for those systems.

    Cheers,

    Mitch

  • Hehe, there is a lot of bullshit around, that's for sure 🙂

    It appears sound, but of course with that much resource headroom you won't know if there's an issue until your data volumes start growing.

    It's important to implement some performance monitoring now so you can get a baseline and track usage and growth.

  • Do you have any recommendations for monitoring? Or just Windows Perfmon?

    If Perfmon, any recommendations on the counters I should have a look at?

  • Sorry, I can't, but I know there is a good article on performance monitoring on this very site. Maybe someone else can post it, as I don't have the URL; I just see references to it a lot.
