Definitely broken; how do I diagnose and fix?

Question

Post reply

Definitely broken; how do I diagnose and fix?

steve smith-401573

SSChampion

Points: 10889
More actions
March 13, 2009 at 7:01 am

#200960

I have a snapshot replication process that I broke (don't ask why) and for several days it did not run. Now it's running again. But the symptom I'm seeing actually occurred before I broke it, so I'm not sure what is relevant here. I am repeating (with slight edits on server/database names) the diagnostics from the Job Agent history for the snapshot replication job. The job starts 2:30 AM local time, and the diagnostic appears 4 hours later. I'm not sure where to look or how to collect a complete picture of what happened, never mind fixing it:
2009-03-13 06:30:09.988 Connecting to Subscriber ' '
2009-03-13 06:30:10.223 Agent message code 9002. The transaction log for database ' ' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases
2009-03-13 06:30:10.223 Category:COMMAND
Source: Failed Command
Number:
Message: if COLUMNPROPERTY( OBJECT_ID('MSreplication_subscriptions'),'distribution_agent','AllowsNull') <> 1 BEGIN alter table MSreplication_subscriptions alter column distribution_agent sysname null end
2009-03-13 06:30:10.223 Category:NULL
Source: Microsoft SQL Native Client
Number: 9002
Message: The transaction log for database ' ' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases
There is more content I can share, but I could really use guidance on how to pinpoint whether the replication, which has been running for 4 hours when this happens, is causing the issue. The physical size of the .mdf file is 950 MB!
Incidentally, the log_reuse_wait_desc column has "active transaction" for the distribution db.
TIA!

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

alen teplitsky SSC-Dedicated Points: 30014 More actions · Answer 1

is the recovery mode of the subscriber full? make sure it's simple

steve smith-401573 SSChampion Points: 10889 More actions · Answer 2

Good point! When I restored the database over from the production server, restoring from a backup of the production database, I failed to change from full to simple. That allowed the log file to grow. That does not explain how it grew 45 GB in 30 minutes, however. Even allowing a 3-1 ratio for transactions, the database simply isn't that huge. Or would that be a normal ratio for a log file having a snapshot replication being loaded transactionally? .mdf size is approx 860 MB. A 45 GB logfile represents a 52:1 ratio of .ldf to .mdf - is that a 'normal' ratio?

alen teplitsky SSC-Dedicated Points: 30014 More actions · Answer 3

i've done bulk inserts which is the same way a snapshot replication does it and the log file can grow even if it's simple recovery model. i think it creates the clustered index as it goes and this is the reason since index ops are always fully logged

steve smith-401573 SSChampion Points: 10889 More actions · Answer 4

SQL Noob (3/13/2009)
i've done bulk inserts which is the same way a snapshot replication does it and the log file can grow even if it's simple recovery model. i think it creates the clustered index as it goes and this is the reason since index ops are always fully logged

When you say 'fully logged', assuming that the recovery model is full, not simple, yes? In a simple model, even the index will not be logged?

alen teplitsky SSC-Dedicated Points: 30014 More actions · Answer 5

pretty sure i read in BOL that index rebuilds, create index, etc are fully logged even if the recovery model is simple

Roy Ernest SSC-Dedicated Points: 38780 More actions · Answer 6

Log files will grow when you snapshot is being done, But to this extent? I gig DB with a 45 gig Transaction log growth is certainly out of the ordinary.

Check in the Database what is the percentage growth set up fro transaction log.

-Roy

steve smith-401573 SSChampion Points: 10889 More actions · Answer 7

Roy, to answer your question: 10%, with a restriction of 2,097,152 MB. Since the database is not housed on a 2+ TB drive, that limit is meaningless, yes?

For anyone:

I have two publications, both copying their content under a common unc. One is updating, the other is not. I believe that my snapshot replication is copying from the wrong directory under the unc. How can I verify that? If it IS the wrong directory, how can I fix it?

TIA!

Roy Ernest SSC-Dedicated Points: 38780 More actions · Answer 8

Yea, that is a max someone set up. I am not sure for what. Usually the default is Unrestricted growth if I am not mistaken.

This is not production right? If it is production, please dont do what I am suggesting.. 🙂

Hopefully this is the Test environment. Now what you could do is to put the DB back in FULL mode, take a transaction Log back up. This would clear out the transaction log. Put it back to simple mode and then try the Snapshot again.

-Roy

steve smith-401573 SSChampion Points: 10889 More actions · Answer 9

1. I inherited this beast, so documentation was sparse at best. I fill in as I go, but I still miss items.

2. It is production, so I can't do as you suggest. However, since it's a reports database, not a 'pure' production database, I have a little leeway, especially in an effort to prepare the content for the customer to run their reports.

3. We have a minimal test environment, that does NOT extend to the reports database arena.

noeld SSC Guru Points: 96590 More actions · Answer 10

steve smith (3/13/2009)
Roy, to answer your question: 10%, with a restriction of 2,097,152 MB. Since the database is not housed on a 2+ TB drive, that limit is meaningless, yes?
For anyone:
I have two publications, both copying their content under a common unc. One is updating, the other is not. I believe that my snapshot replication is copying from the wrong directory under the unc. How can I verify that? If it IS the wrong directory, how can I fix it?
TIA!

Right click on the publication and select "Properties" then under the "Snapshot" page verify your destination.

* Noel

steve smith-401573 SSChampion Points: 10889 More actions · Answer 11

Noel -

Thanks. I can verify. The issue is fixing if it's wrong. In general, as a 'newbie' I find it easier to stop existing replication and start over, rather than determining what's broken. Especially once I've created such a convoluted trail of repeated attempts at repair - by creating a brand new publication and then subscribing to the new publication I was able to replace the production process. Now I just have to 'clean-up' all the clutter - broken jobs, broken pubs, etc....

How can I ensure that the distribution database (and other contents) are properly cleaned up? In SS2005 replication is a 'young', relatively fragile process easily broken. Presumably in SS2008 some of the issues have been resolved or improved, so that maintaining replication has become easier. But, in the meantime, for a small DB (.8 GB) replication is easier to redo than repair.

noeld SSC Guru Points: 96590 More actions · Answer 12

steve smith (3/16/2009)
Noel -
Thanks. I can verify. The issue is fixing if it's wrong. In general, as a 'newbie' I find it easier to stop existing replication and start over, rather than determining what's broken. Especially once I've created such a convoluted trail of repeated attempts at repair - by creating a brand new publication and then subscribing to the new publication I was able to replace the production process. Now I just have to 'clean-up' all the clutter - broken jobs, broken pubs, etc....
How can I ensure that the distribution database (and other contents) are properly cleaned up? In SS2005 replication is a 'young', relatively fragile process easily broken. Presumably in SS2008 some of the issues have been resolved or improved, so that maintaining replication has become easier. But, in the meantime, for a small DB (.8 GB) replication is easier to redo than repair.

Steve, the UI lets you fix it. Totally agree with you, for small DBs it is easier to redo than repair.

The Problem that replication really impose is "planning".

If you plan to do a lot of replication related configurations you should:

- Standardize file/Folder locations and CALS

- Save scripts to drop/recreate replication

- Create scripts to manage replication

- Use standard names/schemas etc on your DBs

- Use standard Accounts across servers

* Noel

steve smith-401573 SSChampion Points: 10889 More actions · Answer 13

noeld (3/16/2009)

Steve, the UI lets you fix it. Totally agree with you, for small DBs it is easier to redo than repair.
The Problem that replication really impose is "planning".
If you plan to do a lot of replication related configurations you should:
- Standardize file/Folder locations and CALS
- Save scripts to drop/recreate replication
- Create scripts to manage replication
- Use standard names/schemas etc on your DBs
- Use standard Accounts across servers

Noel, it may be true that the UI lets you fix it. But I'm not certain of which piece is broken and needs to be fixed. When I do what I think is appropriate in the UI to fix the process, it stays broken. Consequently, it seems easier to create anew rather than repair. Funny thing is, whenever I touch (and change thereby) a replication, it no longer seems to behave appropriately - I'll start a new thread with the description of the latest.

thecodeguys Ten Centuries Points: 1076 More actions · Answer 14

The Problem that replication really impose is "planning".
If you plan to do a lot of replication related configurations you should:
- Standardize file/Folder locations and CALS
- Save scripts to drop/recreate replication
- Create scripts to manage replication
- Use standard names/schemas etc on your DBs
- Use standard Accounts across servers

I Can not agree more. Script everything while it is stable. Nothing is better than being able to drop the publication and recreate it quickly with a script.