Transaction replication - scaling issue

  • I have a probelm.

    I have 100+ databases that I have a business requirement to provide (continuous) transactional replication on. All reside on the same server.

    When I set them up for replication, everything goes fine for databases 1 through 50. Then when you get to 51 it falls apart for those after 50. (This is SQL Server 2005 - Standard Edition (its the same in Enterpise Edition) - 32 bit edition)

    I have already worked with PSS to find that the real issue (this is occuring on the distributor server) is that the logreader agents are consuming all of their shared desktop heap memory. so, I understand that bit.

    But, is transacitonal replication really locked at 50 realisticly usable publishers (unique databases) for a server? (Yes, I know I can raise the limit of allowed agents in the MSDB, but that just hits the same heap limits).

    Am I doing this wrong? BOL doesnt seem to provide any guidance on this area (how many publications you are limited to).

    Is there an alternative way to provide continuous replication at this scale?

    Or is my only choice (to meet this requirement) to spread the databases out across more servers to stay within the apparent limitation?

    (this is for a real time reporting solution that is very CPU intense on the Database - and must not compete for IO/CPU time on the published databases).

  • I have not done it for that many on the same server but I know people who has done it for over 200+. So you are not anywhere near the limit.

    There are a couple of things though. LogReaders don't have to be run at once you could runone after the next after the next and so on. and you could separate the distributor to its own server


    * Noel

  • If you leave a publication set up as it is out of the box (using the wizard), the log reader is created to run in continuous mode. It never dies off, and never frees up its 'slot' (so to speak).

    Moving the publictions to time sliced (scheduled) jobs, you loose continous replication. Which breaks my requirement for real time data replication.

    Or so that is what I am lead to believe so far.

  • Bruce,

    continuos mode is really not that "continuos" if you look in the profile it is actually polling. I am not sure about your "latency" requeriments which is what really counts. If your latency is not that tight there is nothing wrong in doing it that way.


    * Noel

  • Andy Warren, One of the Site Owners has done it in the past:

    http://www.sqlservercentral.com/articles/Replication/noncontinuousreplication/111/[/url]


    * Noel

  • Thanks Noel,

    It looks like I went down the same road as Andy did 6 years prieviously.

    Andy is correct; the scheduling all of this becomes a bear.

    My scale is actually a lot bigger then I previously represented, and scheduling it means I can have anywhere from 10 to 30 minutes lag (assuming I dont have more then the default 500 transactions to move in a single cycle) depending on scheduling and worker thread availablilty.

    For anyone else who has looke dat this thread and wonders the same questions, I am now going to talk with DataMirror to see if they can scale out to this level.

    Thanks again Noel.

  • Bruce,

    If you can consolidate databases you could reduce the number of agents "specially" LogReaders. If you cannot then either you will need a big powerful machine or use a different solution. I have talked to datamirror people in the past and they seem to know their stuff but I am not really sure if they provide that level of concurrency ( latency ) you are after.

    Good Luck 🙂


    * Noel

  • Hi Bruce,

    Don't give up just yet!

    What is the specification of the distribution server, including memory, disk and network?

    Thanks,

    Phillip Cox

  • had a similar issue where db with around 300 tables would get tons of errors in replication

    we fixed it by installing a new distributor running 64 bit SQL and we got to merge two old distributors on it and everything is running fine

    DL 380 G5's are amazingly fast. right now we are running the distributor with 8GB of RAM and hope to max it out at 32GB by next year

  • Phillip,

    Thanks for the upbeat response.

    We haven't given up as such. But more avenues are closing.

    Since you asked, these particular boxes a DUAL CPU / Quad core - 16 gigs ram - plenty of disk space.

  • We spoke with Data Mirror.

    Yes, they undoubtedly know what they are talking about. And it took a couple of senior engineers later to get to the point where they bowed out determining we are far out of the scale of their product.

    There is a functional limitation in the operating system that if you don't engineer your own way around (if you don't know it exists (heck, I didn't)) it, you will hit it trying to do what we are doing.

  • DL380's are indeed sweet boxes. We have a couple DL580's for something we are doing becuase we needed 4 cpu's for bus bandwidth. I can't wait for the day i can really get quad cores in that kind of chassis.

    (disregarding the number of boxes it ships in 🙂 )

    Due to memory needs (at a cost effective point) we moved up to x64 OS and SQL.

  • HP just came out with the DL 580 G4

    4 quad core CPU's, up to 256GB of RAM and something like 16 drive bays

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply