November 24, 2010 at 2:26 pm
I have to implement transactional replication in my company.
Here is the scenario:
I will first implement transactional replication from Publisher P to Subscriber S1, and once I verify everything is working fine I will add subscriber S2. I am thinking of the following sequence of steps. Can you verify I am going about it right?
1) Create distribution and Publisher at machine 1 (P)
2) Create a subscriber at machine 2 (S1). Do not turn on the Log Reader Agent.
Here I have a question: when exactly does SQL Server start the replication process? Is it the first time the Log Reader Agent is turned on, or the first initialization of the subscriber database? I am confused here.
3) I assume I now have a publisher and a subscriber, but replication is not on (how do I make sure replication has not yet started?)
4) Manually initialize the snapshot and start the sync agent. Keep the Log Reader Agent off, as the publisher database will not be updated during this time.
5) Now the subscriber has the same copy as the Publisher. I start the Log Reader Agent and schedule it so that it turns off when I start the Distribution Agent, to avoid any deadlocks. During this time, the publisher database will not be updated.
6) Once the Distribution Agent is done, start the Log Reader Agent again, and follow this sequence for a few days.
7) Check that replication is working 100% fine using the tablediff utility.
8) Add another subscriber and follow the above procedure.
I have questions such as:
1) When I add the second subscriber, are those commands (adding subscriber S2) also sent to the already existing first subscriber, given that the Log Reader Agent reads all the transactions from the database log? Same question when I reinitialize subscription S2: are those commands also sent to S1?
Mainly, which step ensures that replication is started for the first time? Is it the Log Reader Agent's first start?
Thanks
December 1, 2010 at 6:41 am
OK, several questions above, but I will try to dissect them a little. Just as a side point - why don't you create a test instance (or two) and try out your questions?
Answer to question 2) If you create a subscriber without initialization, the replication process assumes that you have already prepared the subscription; this means it does not need to be re-initialized. The replication process is considered started when you attach a subscription to the publication. The table properties mark it as a replicated table, and any transactions it receives are held in the transaction log until picked up by the Log Reader - regardless of when you turn on the Log Reader.
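You can see these table flags for yourself. A minimal sketch (run in the publication database; column names are from the standard `sys.tables` catalog view):

```sql
-- Check whether tables in the publication database are flagged
-- as published (an article exists) and replicated (a transactional
-- subscription is attached).
SELECT name,
       is_published,   -- 1 once the table is an article in a publication
       is_replicated   -- 1 once a transactional subscription exists
FROM sys.tables
ORDER BY name;
```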
I've done 2 quick tests on this:
Test 1
Created Distribution and Publisher with 1 article.
Stopped logreader
Updated a row in the table
Turned on log reader - nothing pushed to Distribution DB
The table is shown to be NOT REPLICATED even though the article exists.
Test 2
Created Distribution and Publisher with 1 article.
Stopped logreader
Created subscriber without initialization
Update 1 record in table
Turned on log reader - several undistributed commands appear in Dist DB (the procs which are pushed to the subscriber for the article, and the 1 record I updated)
The table is shown to be REPLICATED now that a subscription exists.
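To reproduce the check in Test 2 yourself, you can list the undistributed commands in a readable form. A quick sketch, assuming the default distribution database name of `distribution`:

```sql
-- Show pending (undistributed) replicated commands held in the
-- distribution database, decoded into readable statements.
USE distribution;
EXEC sp_browsereplcmds;
```

Run this before starting the Distribution Agent; the queued schema procs and the one updated row from Test 2 should appear.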
Answer to question 3) Not applicable - replication is assumed once you create a subscriber. Even though you physically have not turned it on, it has to try to keep integrity somehow - otherwise you would be reinitializing all the time. That "somehow" is via the Log Reader as the first point of contact.
Answer to point 5) Confused... The Log Reader runs against the publisher, and the Distribution Agent runs against the subscriber - where would the deadlocks be?
Answer to point 6) The Log Reader and Distribution Agent are not dependent on each other. The Log Reader pushes data into the distribution DB, and the Distribution Agent retrieves it and pushes it on to the subscriber.
Answer to 8) Incorrect. When you add the second subscriber, you should initialize S2 at that point.
I would suggest you change your methodology to something similar to the below:
1. Make sure your publication DB is not in use.
2. Create Distributor and Publication
3. Create subscription 1 (S1) with initialization
4. Push Initial snapshot across to S1
5. Start LogReader and Distribution Agent
6. Test it all works and data is pushing across
7. Bring Publication DB back online to general users
8. On a separate occasion:
9. Make sure your publication DB is not in use.
10. Make sure no transactions waiting to replicate
11. Create subscription 2 (S2)
12. Push Initial snapshot across to S2
13. Start LogReader and Distribution Agent
14. Test it all works and data is pushing across
15. Bring Publication DB back online to general users
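Steps 3-4 above can be sketched in T-SQL. This is only an outline with placeholder names (MyPub, S1, MySubDb); your publication, server, and database names will differ:

```sql
-- Run at the Publisher, in the published database.
-- Adds a push subscription for S1 with automatic (snapshot) initialization.
EXEC sp_addsubscription
    @publication       = N'MyPub',    -- placeholder publication name
    @subscriber        = N'S1',       -- placeholder subscriber server
    @destination_db    = N'MySubDb',  -- placeholder subscription database
    @subscription_type = N'Push',
    @sync_type         = N'automatic'; -- initialize from a snapshot

-- Then start the Snapshot Agent to generate the initial snapshot:
EXEC sp_startpublication_snapshot @publication = N'MyPub';
```

Repeating the same `sp_addsubscription` call for S2 on the later occasion covers steps 11-12.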
Let me know how you get on...
December 1, 2010 at 1:31 pm
Thanks for such a detailed reply.
I agree with your sequence of steps.
For point 5, I was thinking that the Log Reader and Distribution Agent might sometimes deadlock in the distribution database, as the Log Reader will try to insert data into it while the Distribution Agent tries to read from it. They might deadlock on the same record - not sure if that is possible.
Is it good to keep the Log Reader Agent always running? On my publisher server, we run a daily batch process for 7-8 hours. If I run the Log Reader, it may slow the batch down by a few hours, as the batch process is very intense and uses 100% CPU. Apart from the batch process, there are no updates to the publisher database; it remains static for the remaining 16 hours of the 24-hour period.
I was thinking of two cases (the distribution database is also local):
Scenarios:
1) Stop the Log Reader and the Distribution Agent, and run the batch process at 1 AM; it finishes by 7 AM.
2) The next morning, start the Log Reader at 7 AM and stop it at 2 PM; start the Distribution Agent at 2 PM and stop it at 10 PM.
OR
2) The next morning, start both the Log Reader and the Distribution Agent, and stop them at 8 PM or 10 PM.
I don't want any agents to run during the batch process, as it is the most important thing we do. All other stuff is secondary.
The batch should finish cleanly.
Thanks
December 1, 2010 at 1:36 pm
OK, how big is your database? Are you going to be replicating everything or a subsection of the database?
I think you are better off using snapshot replication based on your scenario. If the database runs similarly to a data warehouse, then just add the snapshot run onto the end of your batch processing job.
No point having a log reader 🙂
December 1, 2010 at 1:47 pm
Snapshot is out of scope because the database size is in terabytes, so we have to use transactional replication. There are going to be 3 subscribers in the future, starting with one.
Thanks
December 1, 2010 at 1:53 pm
In that case, your suggestion about scheduling the Log Reader makes sense.
Ensure you consider the following:
1) The size of the transaction log during the batch - it will not truncate even if you take transaction log backups, as there will be transactions waiting to be pushed to distribution.
2) the backlog when you start the log reader after the batch.
3) When the data needs to be available at the subscriber. Time = time_from_log_to_Dist + time_from_dist_to_subs
In answer to your original query about deadlocking - the Log Reader is not concerned with a command once it has been delivered to the distribution database (it will never access it again), hence a deadlock is very, very unlikely, as row lock contention will be practically impossible.
December 1, 2010 at 3:02 pm
Hi, thanks a lot for bringing up those points.
1) The size of the transaction log during the batch - it will not truncate even if you take transaction log backups, as there will be transactions waiting to be pushed to distribution.
We use the simple recovery model in the publisher database, but it should not be a concern, since we will run the Log Reader Agent every day. I think once transactions have been read from the log by the Log Reader Agent, that log space can be reused. I am not sure whether SQL Server truncates the log the same way under simple recovery; I think it just overwrites the previous log space. So that should be fine, since replication-marked log records will be read the same day, and we have around 1 TB for logs on the publisher. What is your take on this?
2) the backlog when you start the log reader after the batch. I think the backlog will be around 20 GB, as that is the amount of data from the everyday batch process. I think it should be fine. What is your opinion?
3) When the data needs to be available at the subscriber. Time = time_from_log_to_Dist + time_from_dist_to_subs
We can wait for a week. Actually, the Distribution Agent will run only weekly, as the other subscribers also have their own batch processes.
I have one or two more concerns:
1) Since the first snapshot will push around 1 TB of data, is it fine to use the snapshot process for initialization? I hope there will not be any timeouts - or do I need to set any parameters (over the defaults) for pushing such a huge amount of data using the snapshot folder technique?
2) Since the Distributor will be storing a whole week's transactions, do I need to set any parameters so that this data is still usable after one week, when it is sent to the subscriber? (I mean, is there any expiration or timeout concept for data in the distributor?)
December 1, 2010 at 3:21 pm
rockingleo (12/1/2010)
1) Since the first snapshot will push around 1 TB of data, is it fine to use the snapshot process for initialization? I hope there will not be any timeouts - or do I need to set any parameters (over the defaults) for pushing such a huge amount of data using the snapshot folder technique?
Time will depend on network and disk speed. I've pushed 4 TB databases around with this method before; it takes a while, but it doesn't time out. When something goes wrong at the end, though, you'd like to hurt someone... XD
2) Since the Distributor will be storing a whole week's transactions, do I need to set any parameters so that this data is still usable after one week, when it is sent to the subscriber? (I mean, is there any expiration or timeout concept for data in the distributor?)
There shouldn't be a timeout, but you need to make sure none of the commands go missing. Make sure there are no file cleanups that might trip you up in the meantime.
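One specific thing worth checking on the retention question: the distributor's maximum distribution retention defaults to 72 hours, after which undelivered commands can be discarded, and the publication's subscription expiration defaults to 336 hours. With a weekly delivery cycle you'd want the distribution retention raised. A hedged sketch, assuming the default `distribution` database name:

```sql
-- Run at the Distributor. Raise the maximum distribution retention
-- to 192 hours (8 days) so queued commands survive a weekly
-- Distribution Agent run.
EXEC sp_changedistributiondb
    @database = N'distribution',
    @property = N'max_distretention',
    @value    = N'192';
```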
Since you're leaving such long gaps between updates of the replication, though, why not just go with log shipping? Less overhead and pain.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA