Federated Databases In Data Warehousing

  • I posted this thread originally under Analysis Services.  Thought I might have better luck getting a response with the group that frequents this section (you all being so brilliant!). 

    I am considering a design for a data warehouse that will use distributed partitioned views over federated SQL Server 2000 servers. 

    BOL says "This implementation is designed for high-end OLTP and Web sites with individual SQL statements retrieving minimal data as compared to the decision support, Analysis Services (formerly OLAP Services). "

    Because the ETL process for this DWS will involve extraction from source data in several different time zones a Federated system seemed to me to be an optimal solution. 

    Have any of you had any experience in using distributed partitioned views with Analysis Services with a federation of servers?

    What are the pros and cons?

    Thanks for your insight!

  • Admittedly, I've not done much with Federated Servers but I'm not sure why the relatively simple process of ETL would justify the need even in a world wide setting.  The number of daily hits from the WEB may.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • What column would you be partitioning on ? How have you determined that it's even necessary to partition ?

     

  • I would be partitioning  fact tables the largest of which will have 1 to 3 hundred million rows and will likely grow at a rate of a 100 million rows per year.  The partitions will be done based on Time Zones (we're currently dealing with 3 different time zones).  

    The window of time to perform the extractions differs based primarily on Time Zone.  Therefore, I am trying to avoid contention while performing the ETL from West Coast Systems while East Coast users are querying the system either through applications that use MDX, T-SQL queries from decision support applications that use the Data Warehouse, or Pivot Table reports running against the OLAP server.  

    Some users only need to query data in their time zones, while others need to see the data as a whole.

    A Federated group of servers would allow me the horizontal partitions needed to get the data down to a manageable state for querying, (e.g., The central and western servers would not be used to query data for New York) and provide additional processing capability to perform the very intensive ETL.

     

     

     

     

  • Personaly I had the oportunity to work with a billion row (1000000000) database tables before. Yes Partitioning was used but on the SAME Server!

     Becareful with splitting that into multiple servers and make sure you really have a network backplane that support the sheer amount of Data When quering is going on!

    If you are absolutely positive that no OVERLAP is going to occur you can go for the distributed but if the servers are on different locations I would leave them as they are and consolidate the info into the "master" one with SAN like storage in it

    Just my $0.02

     


    * Noel

  • Thanks, noeld.  The servers will be connected by GB ethernet and will be at the same location.  A SAN will be used for disk storage. 

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply