Replication Architecture Questions

  • Hello everyone,

    I am looking to use SQL Server Replication for a large hosted reporting solution, and was wondering if you could help me decide between a push and a pull subscription method.

    My hosted solution, where all the "replicated" databases and data will be stored, will run one instance of SQL Server 2005, along with my web server running a web application that connects to these databases and lets the clients run reports on their data in near real-time. Ideally these will be on separate machines, probably both quad-processor or better with a minimum of 4 GB RAM. Any bandwidth restrictions will likely come from our customers' end, as the hosted solution's bandwidth is pretty hefty.

    We have several clients running the same application with the same database schema at 30+ sites, with more coming that will probably bring that number up to around 100 in the next 1-2 years.

    What we want to do is create a "mirrored" instance of each database, with less emphasis on real-time, up-to-the-minute synchronization and more emphasis on a stable application at the customer sites, plus a readily accessible reporting solution on this hosted server.

    So the hosted solution will have one database per customer, and the data will need to be replicated on a schedule. Probably once every 2 hours.

    So my question is, given the above factors, is it feasible to push the data from the customer servers without much of a performance hit, in near real-time? (This customer application on their network needs to have maximum uptime and performance, as they use it for daily operations, such as taking payments during regular business hours, data entry, processing, etc.)

    Or should we plan to schedule a "pull" type replication every 2 hours or so from our hosted server, grabbing each customer's data in a serialized fashion so as not to degrade the performance of the replication instance?
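
    (From what I've read so far, the scheduled "pull" option would be set up at our hosted server roughly like the following; all server, database, and publication names here are placeholders, since we haven't attempted this yet.)

    -- Run at the hosted (subscriber) server, in each customer's reporting database.
    -- A matching sp_addsubscription with @subscription_type = N'pull' would also be
    -- needed at the customer's publisher database. (All names are placeholders.)
    EXEC sp_addpullsubscription
        @publisher    = N'CUSTOMER-SQL01',
        @publisher_db = N'CustomerAppDB',
        @publication  = N'CustomerData_Pub';

    -- Distribution Agent for the pull subscription, scheduled every 2 hours.
    EXEC sp_addpullsubscription_agent
        @publisher                 = N'CUSTOMER-SQL01',
        @publisher_db              = N'CustomerAppDB',
        @publication               = N'CustomerData_Pub',
        @frequency_type            = 4,  -- daily schedule
        @frequency_interval        = 1,  -- every day
        @frequency_subday          = 8,  -- hourly sub-interval
        @frequency_subday_interval = 2;  -- every 2 hours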

    Any suggestions, opinions, or ridicule is welcome.

    Thanks in advance!

    Mike

  • Hi Mike,

    If I were you, I'd set up continuous push replication. My reasons:

    Continuous: If you set up a scheduled replication, then from time to time it starts up, hitting your server with the startup itself and then with the burst of accumulated data (just think of a shop with 1,000 customers per day who arrive uniformly distributed, versus another where the daily 1,000 customers arrive in 2 blocks). Running continuously, you'll have less overhead and less extra load on the OLTP databases.

    Push: The data sources are your customers, and your priority #1 is to ensure application availability; replication comes second. With push replication you give the control to your customers, and if they have an extra load on the database, they have the option to pause replication immediately on their own. In addition, it's easier to set up push replication in a subnetted environment, since you only need the single TCP port on which your SQL Server listens. An obvious disadvantage is that the replication workload stays on the OLTP servers, but it's not that big (and otherwise you'd end up with one SQL Server handling 100 pull subscriptions while also serving reporting queries).
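
    Just to make it concrete, the push setup at each customer server would look roughly like this (server, database, and publication names below are only placeholders, so double-check the agent parameters in Books Online before relying on them):

    -- Run at each customer (publisher) server, in the published database.
    -- (All server/database/publication names are placeholders.)
    EXEC sp_addsubscription
        @publication       = N'CustomerData_Pub',
        @subscriber        = N'HOSTED-SQL01',
        @destination_db    = N'Customer01_Reports',
        @subscription_type = N'push',
        @sync_type         = N'automatic';

    -- Distribution Agent for the push subscription; frequency_type 64 keeps it
    -- running continuously (with SQL Server Agent) instead of on a fixed schedule.
    EXEC sp_addpushsubscription_agent
        @publication    = N'CustomerData_Pub',
        @subscriber     = N'HOSTED-SQL01',
        @subscriber_db  = N'Customer01_Reports',
        @frequency_type = 64;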

    Anyway, the best solution would be to set up a dedicated distributor server that could work for all of the customer databases, but that requires high network bandwidth between the customer SQL Servers and the distributor.
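
    If you go that way, the dedicated distributor would be configured roughly like this (server names, password, and share path are placeholders):

    -- On the dedicated distributor server (names and password are placeholders):
    EXEC sp_adddistributor    @distributor = N'DIST-SRV', @password = N'<strong_password>';
    EXEC sp_adddistributiondb @database = N'distribution';
    EXEC sp_adddistpublisher  @publisher         = N'CUSTOMER-SQL01',
                              @distribution_db   = N'distribution',
                              @working_directory = N'\\DIST-SRV\repldata';

    -- On each customer (publisher) server, pointing it at the remote distributor:
    EXEC sp_adddistributor @distributor = N'DIST-SRV', @password = N'<strong_password>';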

    And one small piece of advice: if your reporting server is SQL 2005, set the application up to use the snapshot transaction isolation level, otherwise long-running queries will block replication.
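
    Something along these lines (the database and table names are placeholders):

    -- On the hosted reporting copy of each customer database (placeholder name):
    ALTER DATABASE Customer01_Reports SET ALLOW_SNAPSHOT_ISOLATION ON;

    -- Then the reporting application's sessions can run their long reports under
    -- snapshot isolation, reading row versions instead of taking shared locks that
    -- could block (or be blocked by) the Distribution Agent:
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    SELECT COUNT(*) FROM dbo.Payments;  -- dbo.Payments is just an example table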

  • Thanks for your response Erik!

    A couple of clarifications...

    1-We have decided to go with PUSH replication, but we are still working on a proof of concept; our customers will often be behind firewalls, so we need to be sure we can replicate across the Internet.

    2-RE: dedicated distributor...do you mean we should have each separate customer site push data to a third server, and in turn let the report server subscribe to it?

    The customer servers (upon further thought) will now most likely include separate SQL Servers: 1 for the application + 1 replicated for reporting. Would it be useful to have a third, p2p-replicated (local) SQL Server on each customer site that can utilize high local bandwidth, and then push the data back to our hosted reporting server from there?

    It seems like this would be the best way to avoid performance hits on the application AND local reporting (the customers will still have the ability to run some reports on the local network as well, but these won't be ad hoc).

    Finally, as we've not attempted to set this up yet, does replication require separate instances if run on the same server, or can we replicate to an identical schema under a different database name?

    Thanks again for all your input and assistance! It is MUCH appreciated!

    Cheers!

    Mike

  • Hi Mike,

    1. Firewalls are a very good argument for push replication. In this case you need to open only one port: your SQL Server's, for your customers. First, this way you risk your own network, not the customers'; second, with pull replication you would have to open additional ports (for either FTP or SMB).

    2. IMHO the existing reporting servers would be very good replication sources for you; however, this scenario will make your life harder when the schema changes.

    3. You can use a single server with separate databases for this purpose. Technically, you could use a single database with modified table names as well, but that would be a perversion...
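
    In other words, one hosted instance with one subscription database per customer; only the destination database name changes (the names below are placeholders):

    -- Run at each customer's publisher; only @destination_db differs per customer.
    EXEC sp_addsubscription
        @publication       = N'CustomerData_Pub',
        @subscriber        = N'HOSTED-SQL01',
        @destination_db    = N'Customer02_Reports',  -- Customer01_Reports, Customer02_Reports, ...
        @subscription_type = N'push';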

    Hope it helps,

  • Erik,

    That's all wonderful information. Thanks so much, again, for your time and assistance!

    Mike

