Lots of replicated tables - how many publications?

  • I have found all kinds of 'nuts and bolts' information about replication, but am having trouble finding good advice on design.

    I am replicating 78 tables from a master database for a scale-out solution.  My dilemma relates to best design practices - I have found only one article about design considerations for publications, and this guy recommended creating multiple publications within a single publisher database.  So, at this point, I have 78 different publications, each with only a single article (an individual table).  Is this good practice?  Will performance be enhanced by the parallelism of individual publications for each table article, or will it be hurt by having so many publications?  Should I try to reduce the number of publications, or do you think it's fine to have one for each table?
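
    For reference, the grouped alternative would look roughly like this in T-SQL - several related tables as articles of one transactional publication instead of one publication each.  All publication, database, and table names here are just placeholders:

        -- Sketch: one publication carrying several related tables as
        -- articles (assumes the database is already enabled for publishing).
        USE MasterDB;
        GO
        EXEC sp_addpublication
            @publication       = N'Pub_CustomerData',
            @repl_freq         = N'continuous',   -- transactional
            @status            = N'active',
            @independent_agent = N'true';         -- its own distribution agent

        EXEC sp_addarticle
            @publication   = N'Pub_CustomerData',
            @article       = N'Customer',
            @source_owner  = N'dbo',
            @source_object = N'Customer';

        EXEC sp_addarticle
            @publication   = N'Pub_CustomerData',
            @article       = N'CustomerAddress',
            @source_owner  = N'dbo',
            @source_object = N'CustomerAddress';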

    Each publication has been set to have its own independent distribution agent - I was thinking that this would keep the replicated data more current in the subscriber databases (I have the polling frequency for the log reader and distribution agents set to one second).  We don't have huge updates from clients, just steady transactions.
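
    If it helps, that one-second polling is a -PollingInterval parameter on the agent job command itself, not a publication property; the job step text can be inspected in msdb (the job name below is a placeholder):

        -- Inspect a distribution agent's job step command; a typical
        -- command ends with ... -PollingInterval 1 -Continuous
        SELECT js.command
        FROM msdb.dbo.sysjobs j
        JOIN msdb.dbo.sysjobsteps js ON js.job_id = j.job_id
        WHERE j.name LIKE N'%Pub_CustomerData%'   -- placeholder job name
          AND js.subsystem = N'Distribution';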

    Any and all advice you could offer about optimizing these 78 table publications would be greatly appreciated.

     I am also in the process of adding more security for my agents and read that I should create local Windows accounts for them to use instead of impersonating the SQL Server Agent account.  Another question that I cannot seem to find an answer for is whether I should create a different account for each of the 78 distribution agents, or whether they can (and should) all share the same Windows login without any problems?
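
     For reference, a per-agent account would be assigned when the push subscription agent is created - something roughly like this (SQL 2005 syntax; login and object names are placeholders):

        -- Sketch: give this publication's distribution agent its own
        -- Windows account instead of the SQL Server Agent service account.
        EXEC sp_addsubscription
            @publication       = N'Pub_CustomerData',
            @subscriber        = N'SUBSRV',
            @destination_db    = N'ScaleOutDB',
            @subscription_type = N'push';

        EXEC sp_addpushsubscription_agent
            @publication   = N'Pub_CustomerData',
            @subscriber    = N'SUBSRV',
            @subscriber_db = N'ScaleOutDB',
            @job_login     = N'DOMAIN\ReplDistAgent',  -- placeholder account
            @job_password  = N'********',
            @subscriber_security_mode = 1;             -- Windows authentication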

     Thanks again for any light you can shed on these design and configuration questions; I really appreciate it.

  • I've not worked with that many tables before, but perhaps you could consider the following:

    1. Group logically related tables together within a publication - if you have related tables, then this may actually be a necessity.

    2. Separate out tables that are one-way - lookup-style tables where no updates will occur.

    I don't know if these suggestions fit your data structure, but it is certainly how I would begin to break it down.
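
    As a starting point, a quick query against the replication system tables in the published database will show how the articles are currently spread across publications:

        -- Review current article-to-publication grouping (run in the
        -- published database; transactional replication system tables).
        SELECT p.name AS publication, a.name AS article
        FROM dbo.syspublications p
        JOIN dbo.sysarticles a ON a.pubid = p.pubid
        ORDER BY p.name, a.name;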


    Regards,

    Steve

    Life without beer is no life at all

    All beer is good, some beers are just better than others

  • We have transactional replication (T-Rep) in our system, with over 300 tables being replicated under one publication. Our database size is around 160 GB, and we have no issues as such. So if it's for performance reasons that you want to create so many publications, I would not recommend it. Having so many distribution agents is also a huge load on the server. Replicating 78 tables is not a big deal for a properly equipped (in terms of hardware and memory) SQL Server.

    ******************
    Dinakar Nethi
    Life is short. Enjoy it.
    ******************

  • Thanks so much for your replies!  The advice I got on creating so many publications said that this would achieve parallel data streams for updates and increase replication throughput (but that definitely comes at a cost).

    I agree with your comment about the high overhead of so many distribution agents - which is why I started searching for advice about best practices.  I guess, like everything, it's a question of balance - but I am not really sure of all the factors and how much each tips the scale.

    I will look at the logical grouping suggestion further - one question I have is whether grouping articles together that are likely to need replication at the same time is good or not.  Will this increase the likelihood of queuing?  Or, if I make sure that related tables are in different publications, would this help maximize parallel streams (while at the same time reducing distribution agent counts)?

    Thanks for the discussion; it's not easy to find information on replication design tuning, and I appreciate the opportunity to bounce these ideas around.

    Cheers, Normajean

  • The grouping idea was purely off the top of my head and not something I've actually tried, but it appears from what Dinakar had to say that you shouldn't need to do this.

    My idea would have been to include groupings in individual publications. It kind of depends on how you handle synchronisations. It was just an idea to split highly transactional data from data that is more static, and probably sync the static data at less frequent intervals than the transactional data.

    But that may not matter: with static data, if nothing has changed, there is no information to sync. Management of a single publication would be easier than multiples, I imagine. And a single publication would guarantee updates reach the subscriber in a more timely fashion.


    Regards,

    Steve

    Life without beer is no life at all

    All beer is good, some beers are just better than others

  • Steve, your grouping idea was a good one... off the top of your head or not.  If you have built your database using a design tool like ER/Win, you may have created your model using "subject areas", where all of one type of data is in one drawing... for instance, "customer data" might be represented by a handful of tables, and when you are replicating customer data, you would probably want to replicate it all at once.  Certainly, if there is referential integrity between the tables, you would want to include them all.

    There was also a question asked about "accounts" to use, and whether he/she should use 78 different accounts... I didn't see anyone respond; hopefully, they figured out ABSOLUTELY NOT!  Use one account.

    Thank-you,
    David Russell
    Any Cloud, Any Database, Oracle since 1982

  • Breaking the tables into a handful or fewer publications is a good idea. I typically break them up by size - really big tables often go into a publication by themselves, and all the little lookup tables and such into a different one. This lets you easily snapshot the little tables if needed, and doing the big tables is no worse than it would have been the other way. If nothing was going to change (unlikely!) I'd say put them all in one publication and leave it at that. I wouldn't sweat the performance implications; focus on security and administrative needs first, then you can tweak the agent performance if needed.
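
    One payoff of that size-based split: the small-table publication can be re-snapshotted on its own without touching the big tables - roughly like this in SQL 2005, with a placeholder publication name:

        -- Regenerate the snapshot for just the lookup-table publication.
        EXEC sp_startpublication_snapshot @publication = N'Pub_LookupTables';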

  • I would only add to Andy's comment about performance that you should definitely try to understand the performance implications prior to any rollout.

    Our experiences have shown that a non-performant solution will cause issues for your clients and affect their perception of the success of the solution.

    Also, performance is not just related to the agent, but to how you set up the filtering scenarios and how the data is "partitioned" for subscribers. It is far more effective to deal with these issues earlier rather than later.
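
    By filtering I mean things like a row filter on an article, so each subscriber only receives its slice of the data - a rough sketch, with made-up table, column, and publication names:

        -- Static row filter: only rows matching the clause replicate.
        EXEC sp_addarticle
            @publication   = N'Pub_CustomerData',
            @article       = N'Orders',
            @source_owner  = N'dbo',
            @source_object = N'Orders',
            @filter_clause = N'RegionCode = ''WEST''';  -- placeholder filter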


    Regards,

    Steve

    Life without beer is no life at all

    All beer is good, some beers are just better than others

  • I would really like to be proactive in terms of performance - and would like to learn more about filtering and partitioning and identify what other kinds of strategies I need to look at.  Do you have any advice on a book that I could use to educate myself?  I know that almost all the information is in BOL - but it always seems like you have to know what you're looking for or you get wound up in a maze.  Often I end up with more questions than I started with, because I don't yet have the big-picture overview and so can get even more disoriented.

  • I don't know of any books that talk about this - not that I've ever looked. We have tended to go by what we understand the documentation to be telling us and what we have experienced so far.

    It really comes down to what the end users of the system require in terms of data as to how you would partition it.

    We have previously tended to use flags within the data to set up our filtering partitions and then changed those flags to get data removed from the subscriber. As we primarily work with merge replication to handheld SQL CE/Mobile devices, space is a concern.

    We found with SQL 2K5 that we could have pre-computed partitions, and I can assure you that these are significantly faster than anything we have previously come across.

    But they come at a cost. We can no longer change the filter flags on the device, as any data used in filtering is sacrosanct at the subscriber.

    So we have had to change our model to make the publisher shift data between partitions. This means that the device application has to be smarter about what data to display and there is potential for data to get "out of partition" which raises conflicts.

    Having said that, the change to pre-computed partitions took us from a non-performant solution on a reasonably grunty server to a solution (at least performance-wise) that causes no problems and barely taxes the server. Not to mention that the speed of the synchronisation process is significantly improved - much to the delight of the users, who can potentially be synchronising every 15 minutes.
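
    Very roughly, the pre-computed partition setup looks like this (merge replication, SQL 2005 syntax; all names here are placeholders):

        -- Merge publication with pre-computed partitions; the
        -- parameterized filter gives each device only its own partition.
        EXEC sp_addmergepublication
            @publication          = N'Pub_FieldData',
            @use_partition_groups = N'true';   -- pre-computed partitions

        EXEC sp_addmergearticle
            @publication         = N'Pub_FieldData',
            @article             = N'Jobs',
            @source_owner        = N'dbo',
            @source_object       = N'Jobs',
            @subset_filterclause = N'DeviceFlag = HOST_NAME()';  -- filter flag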

    Hope that helps you rather than confuses you.


    Regards,

    Steve

    Life without beer is no life at all

    All beer is good, some beers are just better than others
