The easiest "higher" availability option for a production datawarehouse environment?

  • Hi all,

    I am working in a start up, and I am the one and only BI / DW architect here. I however have developed 2-3 OLAP cubes, 8-9 fact tables with 30-40 dimensions, along with a close to real time relational reporting view (SSIS ETL for the relational view every 10 mins on the latest metrics)

    So obviously we are short on staff here, and it's not likely to change. We also only had 1 DW server and 1 OLAP server before, which sucks coz the whole thing goes down when anything is wrong.

    Luckily, we are upgraind hardware now, and we are definitely order 2 servers for everything (2 DW servers, 2 OLAP servers, 2 lightweight real time reporting layer, etc).

    And I am considering various options to make the environment as a whole relatively highly available. Note that even though the OLAP is production, if it goes down for a little, it's not the end of the world.

    So I am pretty much sure I DON"T want to go with active / passive clustering, just because of the maintenance overhead. I am not sure if it's worth it, plus if some data related issues crop up, the whole thing is still unusable.

    One option is mirroring / replication, but I haven't done much about those before, so I am not sure what it gives me, in terms of availability vs ease of maintenance (the objective is that the easier for me to maintain, the better)

    The last option, but relatively attractive to me, is to just set up all the servers independently and in parallel. The OLAP / DW is ONLY updated by ETL scripts, so deployment is a little more work (new scripts has to be deployed twice). But it gives me the flexibility to have 2 parallel environment to play with. If one server goes down, I can easily restore to the other server, while fixing the downed server.

    If they are both up, I have the flexibility to just use one of the 2 servers via simple router setup.

    What are the thoughts? pros and cons??

    thanks for any advise!

  • I would use mirroring. It's easier to maintain for sure. A little more complex to understand, but not what I'd call really complex. It will keep things in sync for you between the relational databases.

    If you want delays between the boxes, like say for an hour or two to catch data errors, then log shipping is a better choice.

    Books online has some good information about how these all work: http://msdn.microsoft.com/en-us/library/ms190202%28v=SQL.100%29.aspx

    Replication is nice, and has some good advantages, but it takes a lot more setup from the DBA (Each table essentially is moved), and it can be harder to maintain troubleshoot if things go wrong.

    Both are relatively self-maintaining for the most part, but things can get stuck at times.

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply