SQL replication - is there a way to start from current data?

Question

SQL replication - is there a way to start from current data?

cphite

SSCarpal Tunnel

Points: 4278
More actions
July 12, 2023 at 8:55 pm

#4233555

Related to a question I posted a while back...
We have a very large inventory database that we replicate from one server to another. Some of the tables have over 150 million rows. Recently, we ran into an error that caused replication to fail, and it had to be restarted. This took a very, very long time (nearly 20 hours) which meant that our sales processing was basically down for a whole afternoon and part of the next day.
The failure happened because we were trying to add a column to one of the replicated tables. Someone accidentally ran the script on the Destination server - which added the column - and then when they realized their mistake, ran the same script on the Source server. Replication tried to add the new column on Destination - and since the column was already there, it of course died gloriously in a rain of errors.
I believe (but am not certain) that under normal circumstances, if I pause replication on Source, take a backup from Source, and then restore that backup to Destination; that replication will continue as normal. It shouldn't even "know" that anything has happened; it should just notice when a new change takes place on Source, and replicate that over to Destination. As long as I am sure that no changes happened on Source in the meantime, all should be well... I think.
But in the case where it errors out... is there a way to do the same thing? Can I tell it to restart without creating a new snapshot? Our thinking is, it only takes a few hours to do the backup and restore - which still isn't ideal - but it beats the heck out of waiting 20 hours for replication to catch up again.
We would - again - of course need to make sure nobody makes changes in the meantime, but we can do that.
July 14, 2023 at 8:24 am

This was removed by the editor as SPAM

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

CreateIndexNonclustered SSChasing Mays Points: 648 More actions · Answer 1

I am only familiar with transational replication in detail, but in transactional, the updates are tracked in the distribution database, not in the published database. You would still have to re-initialize.

ratbak Ten Centuries Points: 1358 More actions · Answer 2

https://www.sqlshack.com/sql-server-transactional-replication-how-to-reinitialize-a-subscription-using-a-sql-server-database-backup/

Michael L John One Orange Chip Points: 27055 More actions · Answer 3

Initialize from a backup, which is usually far faster than a snapshot.

On an semi-related note. How many publications do you have? Have you considered creating multiple publications?

Such as:

Static tables, things such as lookup tables
Users and permissions
Transaction table(s).
And so forth.

Or, break them into equally sized publications, but beware. You do not want one very "busy" publication and others that do nothing.

By creating multiple publications, you can correct an issue that you described far easier and faster. You can also have multiple subscribers that only need specific data. The operational folks subscribe to the operational data publications, the financial folks subscribe to the financial data publications.

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

bpflanzer SSC-Addicted Points: 448 More actions · Answer 4

If you're confident that your subscriber has the data you need, you can define your subscription using

@sync_type = N'replication support only' instead of @sync_type = N'automatic'.

and run your snapshot (it won't actually snap the data to this subscriber.)

Once replication is flowing, you can change @sync_type back to N'automatic'

"update [mydb].dbo.syssubscriptions Set sync_type = 1 where sync_type = 2"

bpflanzer SSC-Addicted Points: 448 More actions · Answer 5

also @sphite

when replication failed for the reason you status above "Someone accidentally ran the script on the Destination server - which added the column" you should have been able to simply drop the column on the target and restart replication.. it should pick up from there (adding the column on destination, then continuing to replicate your data)

cphite SSCarpal Tunnel Points: 4278 More actions · Answer 6

Appreciate the help... the @sync_type suggestion from bpflanzer seems to work exactly how I'd hoped.

I am bit nervous about the fact that the first time I tried, I setup a new database, new replication, and made it break; then did the backup/restore as before, and tried to recreate the replication using the script - and I got an error that some command couldn't be committed to the log and had to be rolled back. I ended up having to delete the entire publication and start over. However, the second time through everything worked like a charm - I'm not sure what I did differently.

I am also thinking of splitting into multiple publications. The problem is that the vast majority of the data is in one table... so even if I made that it's own, it's still huge. Was wondering - can I have multiple publications for the same table using filters? Of the 150 million rows we have in that table, about 15% represent parts that we actually sell, and the other 85% are parts that we merely maintain for our partners. If I could have a Publication that only replicates the "selling" parts - and a separate Publication for the "not-selling" parts, I don't particularly care how long the latter one takes.

EdVassie SSC Guru Points: 60472 More actions · Answer 7

One of the nice things about replication is that you can turn off just about all errors on a case by case basis. It should therefore be possible to get replication to skip the schema error and allow replication to proceed. Once everything is running OK then remove the exception condition.

If you can, look at moving to Availability Groups. I have worked with both approaches to distributing data and AGs are very much easier to manage (SQL2017 and above). The main downside with AGs is they can require greater bandwidth than some replication scenarios.

Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

cphite SSCarpal Tunnel Points: 4278 More actions · Answer 8

Just in case anyone is curious or has the same problem - the solution provided by bpflanzer works perfectly. I've tested it a couple of times now in our test environment against our huge database, and it's worked reliably now three times in a row.

So, thanks!

One other question... I've noticed that every time I create a new subscription - through the GUI or the script method - the linked server we have on Source that points to Destination gets set to Data Access = False

We use this linked server for various things and so it needs to have data access... is there a way to prevent Replication setup from doing this?