Definitely broken; how do I diagnose and fix?

  • I have a snapshot replication process that I broke (don't ask why) and for several days it did not run. Now it's running again. But the symptom I'm seeing actually occurred before I broke it, so I'm not sure what is relevant here. I am repeating (with slight edits on server/database names) the diagnostics from the Job Agent history for the snapshot replication job. The job starts 2:30 AM local time, and the diagnostic appears 4 hours later. I'm not sure where to look or how to collect a complete picture of what happened, never mind fixing it:

    2009-03-13 06:30:09.988 Connecting to Subscriber ' '

    2009-03-13 06:30:10.223 Agent message code 9002. The transaction log for database ' ' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases

    2009-03-13 06:30:10.223 Category:COMMAND

    Source: Failed Command

    Number:

    Message: if COLUMNPROPERTY( OBJECT_ID('MSreplication_subscriptions'),'distribution_agent','AllowsNull') <> 1 BEGIN alter table MSreplication_subscriptions alter column distribution_agent sysname null end

    2009-03-13 06:30:10.223 Category:NULL

    Source: Microsoft SQL Native Client

    Number: 9002

    Message: The transaction log for database ' ' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases

    There is more content I can share, but I could really use guidance on how to pinpoint whether the replication, which has been running for 4 hours when this happens, is causing the issue. The physical size of the .mdf file is 950 MB!

    Incidentally, the log_reuse_wait_desc column has "active transaction" for the distribution db.

    TIA!

  • is the recovery mode of the subscriber full? make sure it's simple

  • Good point! When I restored the database over from the production server, restoring from a backup of the production database, I failed to change from full to simple. That allowed the log file to grow. That does not explain how it grew 45 GB in 30 minutes, however. Even allowing a 3-1 ratio for transactions, the database simply isn't that huge. Or would that be a normal ratio for a log file having a snapshot replication being loaded transactionally? .mdf size is approx 860 MB. A 45 GB logfile represents a 52:1 ratio of .ldf to .mdf - is that a 'normal' ratio?

  • i've done bulk inserts which is the same way a snapshot replication does it and the log file can grow even if it's simple recovery model. i think it creates the clustered index as it goes and this is the reason since index ops are always fully logged

  • SQL Noob (3/13/2009)


    i've done bulk inserts which is the same way a snapshot replication does it and the log file can grow even if it's simple recovery model. i think it creates the clustered index as it goes and this is the reason since index ops are always fully logged

    When you say 'fully logged', assuming that the recovery model is full, not simple, yes? In a simple model, even the index will not be logged?

  • pretty sure i read in BOL that index rebuilds, create index, etc are fully logged even if the recovery model is simple

  • Log files will grow when you snapshot is being done, But to this extent? I gig DB with a 45 gig Transaction log growth is certainly out of the ordinary.

    Check in the Database what is the percentage growth set up fro transaction log.

    -Roy

  • Roy, to answer your question: 10%, with a restriction of 2,097,152 MB. Since the database is not housed on a 2+ TB drive, that limit is meaningless, yes?

    For anyone:

    I have two publications, both copying their content under a common unc. One is updating, the other is not. I believe that my snapshot replication is copying from the wrong directory under the unc. How can I verify that? If it IS the wrong directory, how can I fix it?

    TIA!

  • Yea, that is a max someone set up. I am not sure for what. Usually the default is Unrestricted growth if I am not mistaken.

    This is not production right? If it is production, please dont do what I am suggesting.. 🙂

    Hopefully this is the Test environment. Now what you could do is to put the DB back in FULL mode, take a transaction Log back up. This would clear out the transaction log. Put it back to simple mode and then try the Snapshot again.

    -Roy

  • 1. I inherited this beast, so documentation was sparse at best. I fill in as I go, but I still miss items.

    2. It is production, so I can't do as you suggest. However, since it's a reports database, not a 'pure' production database, I have a little leeway, especially in an effort to prepare the content for the customer to run their reports.

    3. We have a minimal test environment, that does NOT extend to the reports database arena.

  • steve smith (3/13/2009)


    Roy, to answer your question: 10%, with a restriction of 2,097,152 MB. Since the database is not housed on a 2+ TB drive, that limit is meaningless, yes?

    For anyone:

    I have two publications, both copying their content under a common unc. One is updating, the other is not. I believe that my snapshot replication is copying from the wrong directory under the unc. How can I verify that? If it IS the wrong directory, how can I fix it?

    TIA!

    Right click on the publication and select "Properties" then under the "Snapshot" page verify your destination.


    * Noel

  • Noel -

    Thanks. I can verify. The issue is fixing if it's wrong. In general, as a 'newbie' I find it easier to stop existing replication and start over, rather than determining what's broken. Especially once I've created such a convoluted trail of repeated attempts at repair - by creating a brand new publication and then subscribing to the new publication I was able to replace the production process. Now I just have to 'clean-up' all the clutter - broken jobs, broken pubs, etc....

    How can I ensure that the distribution database (and other contents) are properly cleaned up? In SS2005 replication is a 'young', relatively fragile process easily broken. Presumably in SS2008 some of the issues have been resolved or improved, so that maintaining replication has become easier. But, in the meantime, for a small DB (.8 GB) replication is easier to redo than repair.

  • steve smith (3/16/2009)


    Noel -

    Thanks. I can verify. The issue is fixing if it's wrong. In general, as a 'newbie' I find it easier to stop existing replication and start over, rather than determining what's broken. Especially once I've created such a convoluted trail of repeated attempts at repair - by creating a brand new publication and then subscribing to the new publication I was able to replace the production process. Now I just have to 'clean-up' all the clutter - broken jobs, broken pubs, etc....

    How can I ensure that the distribution database (and other contents) are properly cleaned up? In SS2005 replication is a 'young', relatively fragile process easily broken. Presumably in SS2008 some of the issues have been resolved or improved, so that maintaining replication has become easier. But, in the meantime, for a small DB (.8 GB) replication is easier to redo than repair.

    Steve, the UI lets you fix it. Totally agree with you, for small DBs it is easier to redo than repair.

    The Problem that replication really impose is "planning".

    If you plan to do a lot of replication related configurations you should:

    - Standardize file/Folder locations and CALS

    - Save scripts to drop/recreate replication

    - Create scripts to manage replication

    - Use standard names/schemas etc on your DBs

    - Use standard Accounts across servers


    * Noel

  • noeld (3/16/2009)


    Steve, the UI lets you fix it. Totally agree with you, for small DBs it is easier to redo than repair.

    The Problem that replication really impose is "planning".

    If you plan to do a lot of replication related configurations you should:

    - Standardize file/Folder locations and CALS

    - Save scripts to drop/recreate replication

    - Create scripts to manage replication

    - Use standard names/schemas etc on your DBs

    - Use standard Accounts across servers

    Noel, it may be true that the UI lets you fix it. But I'm not certain of which piece is broken and needs to be fixed. When I do what I think is appropriate in the UI to fix the process, it stays broken. Consequently, it seems easier to create anew rather than repair. Funny thing is, whenever I touch (and change thereby) a replication, it no longer seems to behave appropriately - I'll start a new thread with the description of the latest.

  • The Problem that replication really impose is "planning".

    If you plan to do a lot of replication related configurations you should:

    - Standardize file/Folder locations and CALS

    - Save scripts to drop/recreate replication

    - Create scripts to manage replication

    - Use standard names/schemas etc on your DBs

    - Use standard Accounts across servers

    I Can not agree more. Script everything while it is stable. Nothing is better than being able to drop the publication and recreate it quickly with a script.

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply