How to stop Multi-row I/U/D from splitting into many single I/U/D during replication?

Question

Post reply

How to stop Multi-row I/U/D from splitting into many single I/U/D during replication?

dbirchok

Valued Member

Points: 54
More actions
May 19, 2008 at 12:40 pm

#187770

Sorry I am not a DBA but I am a fairly knowledgeable developer, so I might not get the exact terms correct.
I am working at a small company that is using transaction replication to make a copy of a transactional database for reporting purposes. Once a day, data from a Planning DB is loaded into the Transaction DB. This is done by calling one stored procedure in the Planning DB. Both DBs are on the same server. The stored procedure figures out what data from several tables it needs to copy into the Transaction DB. The table (let's call it table A) I am concerned with can have 50,000 to 100,000 rows inserted during this process. The new rows replicate to the Report DB in an acceptable amount of time and everyone is currently happy.
Now I was asked to put a trigger on Table A in the Report DB (they do not want triggers slowing down the Transaction DB) that will be used to update a new table (let's call it Table B) in a ReportSupport DB. Table B is sort of like a summary table in that changes in thousands of rows in Table A will cause me to change an amount in one row of Table B. I wrote the trigger to sum up data from multiple rows in the trigger’s insert and delete tables before update Table B. After we put our staging environment we found a problem. I was wrongly under the impression that Multi-row insert in Table A in the Transaction DB would be replication as a Multi-row insert in Table A in the Report DB. While researching this, I found out that the default replication process is to split it up into multiple single-row inserts. This behavior is a problem for us. Instead of summing up let’s say 2,000 rows that were inserted into Table A and inserting one row into Table B with the amount 2,000, the trigger inserts one row into table B with the amount 1 and then makes 1,999 updates to this row until the amount is 2,000. This causes the transaction log to fill up rather quickly. Our 10G Database had its transaction log grow to 30G before we stopped the replication.
I confirmed that our multi-row insert was be split up by placing a simple trigger on table A that just counted the rows in both the insert and delete tables and wrote the information into Count Table I created for debugging. While this debug trigger was in place, I found out that our replication process was replicating Multi-row updates in Table A in the Transaction DB as Multi-row updates in Table A in the Report DB. Someone (nobody here seams to know who) stopped replication from splitting up Multi-row updates.
Now for the last couple of days I have been trying to find out how this was done. The information available online about replication is very simplistic. I have found hints that are many ways to stop the Multi-row I/U/D statements from getting split up, but none of the authors of the basic replication articles take it the next step. The only method I found that could account for what was happening in our databases, was to make a Stored Procedure execution replication. I don’t believe that this is what is happening for my Multi-row updates. The only “Articles” that exist for our “Publication” are table based. Also the “article” properties for our Table A look the same as the “article” properties for a debug table I setup in replication using the replication wizard’s defaults.
Can anyone tell me what to look for in our replication's “Publication” for why the updates are working different than the inserts for our Table A? Does anyone know a good book or web link with the information I need to do this?
Thanks in advance,
Dave

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 1

Yes... your code for the trigger... let's see it...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

dbirchok Valued Member Points: 54 More actions · Answer 2

What do you mean by "Yes", and why do you want to see the trigger. I am on the replication forum asking for information about how to replicate multi-row inserts/updates/deletes without splitting them into single-row inserts/updates/deletes. It is being done for updates on one table setup by people no longer here. So I believe it can be done but we don't know how. A new table I setup in replication myself splits up multi-row updates but I can't see any difference in the replication setup of either table.

Here is the trigger I have put on the replicated table that I am using to test if replication has split up multi-row Insert/Update/Delete statements:

create Trigger [dbo].[trgTable_A]

ON [dbo].[Table_A]

AFTER DELETE, INSERT, UPDATE

AS

begin --trg

declare @When datetime

, @insertedCount bigint

, @deletedCount bigint

, @min-2 int

, @max-2 int

, @Min_Date Datetime

, @Max_Date Datetime

set @When = getDate()

select @insertedCount = count(*)

, @min-2 = Min(Product_Id)

, @max-2 = Max(Product_Id)

, @Min_Date = Min(Last_Mod_Date)

, @Max_Date = Max(Last_Mod_Date)

from inserted

select @deletedCount = count(*)

, @min-2 = isnull(@Min,Min(Product_Id))

, @max-2 = isnull(@Max,Max(Product_Id))

, @Min_Date = isnull(@Min_Date,Min(Last_Mod_Date))

, @Max_Date = isnull(@Max_Date,Max(Last_Mod_Date))

from deleted

if ((@insertedCount = @deletedCount) and (@insertedCount < 100)) return

insert

into Table_A_counts

( [When]

, InsertCount

, DeleteCount

, Min_Product_Id

, Max_Product_Id

, Min_Date

, Max_Date)

values( @When

, @insertedCount

, @deletedCount

, @min-2

, @max-2

, @Min_Date

, @Max_Date)

end --trg

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 3

By "Yes", I meant yes, I know a good reference... it starts with your code... I wanted to see your trigger to see if I could help.

The trigger you posted is only doing checks (or so it seems). Are they using triggers to do the replication? If so, is it possible to post that? If they're using RBAR, it should be easy to fix to keep the "splits" you're talking about from happening...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

dbirchok Valued Member Points: 54 More actions · Answer 4

We are not using triggers for replication. Forget about the triggers. We are using replication functionality provided by SQL Server 2005. Creating a "Publication" with "Articles" on a "Publisher" which feeds data to a "Subscriber" through a "Distributer".

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 5

While researching this, I found out that the default replication process is to split it up into multiple single-row inserts. This behavior is a problem for us. Instead of summing up let’s say 2,000 rows that were inserted into Table A and inserting one row into Table B with the amount 2,000, the trigger inserts one row into table B with the amount 1 and then makes 1,999 updates to this row until the amount is 2,000. This causes the transaction log to fill up rather quickly.

Ok... but I didn't bring up the triggers first... and I didn't know that "replication" actually made triggers (sorry). If it does and they are RBAR like you say, I don't know how to change that behavior other than rewritting the triggers.

When you talk about the "replication process", you are talking about SQL Server Replication, aren't you?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

dbirchok Valued Member Points: 54 More actions · Answer 6

The triggers I talk about are not part of replication. These triggers are put on a replicated table in the subscriber database to do some business logic after replication is done.

Again, multi-row updates are being replicated into my subscriber table as multi-row updates. Multi-row Inserts are being replicated into my subscriber table as multiple single-row inserts. I can't find out why. Does anyone know where I should look?

p-40 Grasshopper Points: 15 More actions · Answer 7

Hi there

Currently I'm also exploring the Replication options within SQL-Server 2005.

Although I don't have the solution to your question, perhaps what I know about the replication process can help you figure it out.

The replication on the Subscriber side actually uses a Stored Procedure that is executed on every record that comes across from the Distributer (in a Push configuration). There are basically 3 stored procedures one for Insert, Update and Delete. (I found out because I actually changed the SP to add additional functionalities.) Unfortunately it doesn't solve my requirement.

My guess it's not possible what you are trying to achieve, because the replication SP's work on single records and not multi recordset.

Anyway hope it helps

Cheers

Dino

Kendal Van Dyke SSCertifiable Points: 5972 More actions · Answer 8

Have you considered creating an indexed view that does the summarization for you? You can replicate the indexed view to your Reporting Server as a table. If a thousand rows of I/U/D activity results in only a single row changing in your indexed view then subsequently only one transaction gets replicated.

Kendal Van Dyke

http://kendalvandyke.blogspot.com/

Kendal Van Dyke
http://kendalvandyke.blogspot.com/[/url]

dbirchok Valued Member Points: 54 More actions · Answer 9

No a view won't help. I gave a simplified version of what the trigger is doing as background. Please concentrate on the replication question.

RBarryYoung SSC Guru Points: 143329 More actions · Answer 10

I believe that this is an option of the published Articles. You will have to change it in each Article's definition, or change the Article defaults and then drop and re-add all of the Articles.

The relevant section in Book OnLine is "Specifying How Changes Are Propagated for Transactional Articles". It is in the SQL Server Replication book, in the following section chain: Implementing Replication > Publishing Data and Database Objects > Article Options for Transactional Replication >

Here is a copy at Technet:http://technet.microsoft.com/en-us/library/ms152489.aspx

and another at MSDN:http://msdn.microsoft.com/en-us/library/ms152489.aspx

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

gabriel-rosario Newbie Points: 8 More actions · Answer 11

the only you have to do is

put in your trigger NOT FOR REPLICATION what means that what trigger do. not going to replicate.

create Trigger [dbo].[trgTable_A]

ON [dbo].[Table_A] NOT FOR REPLICATION

AFTER DELETE, INSERT, UPDATE

AS

begin --trg

declare @When datetime

, @insertedCount bigint

, @deletedCount bigint

, @min-2 int

, @max-2 int

, @Min_Date Datetime

, @Max_Date Datetime

set @When = getDate()

select @insertedCount = count(*)

, @min-2 = Min(Product_Id)

, @max-2 = Max(Product_Id)

, @Min_Date = Min(Last_Mod_Date)

, @Max_Date = Max(Last_Mod_Date)

from inserted

select @deletedCount = count(*)

, @min-2 = isnull(@Min,Min(Product_Id))

, @max-2 = isnull(@Max,Max(Product_Id))

, @Min_Date = isnull(@Min_Date,Min(Last_Mod_Date))

, @Max_Date = isnull(@Max_Date,Max(Last_Mod_Date))

from deleted

if ((@insertedCount = @deletedCount) and (@insertedCount < 100)) return

insert

into Table_A_counts

( [When]

, InsertCount

, DeleteCount

, Min_Product_Id

, Max_Product_Id

, Min_Date

, Max_Date)

values( @When

, @insertedCount

, @deletedCount

, @min-2

, @max-2

, @Min_Date

, @Max_Date)

end --trg

dbirchok Valued Member Points: 54 More actions · Answer 12

In case anyone is trying to do the same thing I have asked about, apparently it can’t be done in SQL server. I haven’t looked at this problem for over two months. From what I can remember about some article I read back then, SQL server will split up multi-row updates when there is a change in primary key. There is some internal technical reason for this. However, I believe that inserts, by definition, have their primary key changed. Therefore, Multi-row inserts are always split up into multiple Single-row inserts.

The last entry posted 6/23/08 is less than useless. Did you even read my post? I WANT the trigger to fire during replication. Just not one row at a time. The entry on 5/27/08 is just pointing to a Microsoft resource I already read. This article came up many times during my internet searches. It is information on how to get replication to use I/U/D sql, auto generated stored procedures, or custom stored procedures to replicate the data when a row is inserted, updated, or deleted in the published database. It says you can even skip replicating the data when a row is inserted, updated, or deleted is the published database. I saw nothing about keeping Multi-row inserts/updates/deletes together during replication.

noeld SSC Guru Points: 96590 More actions · Answer 13

dbirchok (5/21/2008)
The triggers I talk about are not part of replication. These triggers are put on a replicated table in the subscriber database to do some business logic after replication is done.
Again, multi-row updates are being replicated into my subscriber table as multi-row updates. Multi-row Inserts are being replicated into my subscriber table as multiple single-row inserts. I can't find out why. Does anyone know where I should look?

Let's go step-by-step:

Multi-row updates/inserts/deletes are replicated as "MULTIPLE" single-row U/I/D respectively, by DESIGN.

-Why? --> Because it checks each row on the replica!! If it sends an update it verifies that the row exists, if it deletes it also check that the row existed if it inserts it will verify that the row is not there.

At the point it detects the "data consistency" failure it Stops and you could check exactly what row is not valid.

You can bypass this behaviour replicating stored procedure execution! There are some trade-off with this option but It will do what you want.

* Noel

dbirchok Valued Member Points: 54 More actions · Answer 14

I did a lot of research about this and your answers make no sense. Let me go “step-by-step” for you.

1.Your reason for doing one row at a time would be equally valid when updating a table in the original publishing database. We all know you can write multi-row I/U/D statements against SQL server tables.

2.When any one of the many single-row inserts fails, the entire multi-row insert fails replication. Any successful single-row inserts before the failure are rolled back. You can’t “verify that the row is not there” because none of the rows are there.

3.And let’s just say that we could keep the successful rows, you still would have trouble identifying the problem row. For example, 5 rows where updated, 2 made it through replication. You don’t know the order the updates happened so you don’t know which of the last 3 caused the problem. Maybe, in this case, you could look at each of three rows manually. However, multiply these counts by 1000, and that option flies out the window.

4.Multi-row Updates can be replicated as Multi-row updates when the primary key isn’t changed. It is currently happening in our replication process (As I stated in my earlier posts).

5.And before you say it, a primary key change does not guarantee that a row you want to update is still in the replicated table.

6.When using stored procedures (auto generated or custom) to do replication, they have a few fixed set of input parameters to choose from. All are based on processing one row at a time. It seams to me, the only chance to get multi-row replication behavior is to use I/U/D statements for replication. (I could be wrong on this one)

Saying "By DESIGN" is a cop-out. I am sure replication is working as Microsoft has it currently designed. And its design is good enough for most replication. I am just pushing the edge of the envelope. Microsoft has changed designs when enough of a need is there.

I sorry that my posts may be a bit terse. I apologize to the many experts on this site that have helped many people solve their problems. I posted this to see if any of you have the in depth knowledge about my topic. Not to get people who are trying to inflate there stats by regurgitating pat answers from Microsoft’s anemic documentation.