Automatically restarting replication - how?

  • Hi,

    We have a basic replication topology. 2 servers with one replicating everything to the other using continuous transactional replication and SQL Server 2000 running under Windows Server 2000. Everything works fine for weeks then replication, on the Publisher, just stops for no good reason. The Publisher is stopped along with the agents - there status is: "closed down successfully". If the servers are automatically closed down and restarted due to a power fail there is a 70% chance that replication will not restart.

    This is at a client site running a turnkey app and I wouldn't trust anyone with Enterprise Manager. The database records about 15,000 transactions in a working day so its reasonably busy. Apart from the user not having an up-to-date backup copy of their production database replication stopping has no other effects.

    We have tried, without success, to identify the cause of this problem and I'm not really looking for help in diagnosing the issue (although any comments would be greatly appreciated). Where I would like some help is how can I automatically monitor the status and restart replication if it has stopped?

    TIA - Peter

  • Replication is started with jobs.

    If you have pull subscriptions, then you should start the jobs at the subscribers, if it is push, then the jobs are at the publisher (or distributor, and don't remember).

    You should check the status of the jobs.

  • Hi,

    Sorry, forgot to mention - it is Push replication.

    The status of the jobs is: "closed down successfully" which seems to indicate its not a failure as such - the shutdown was initiated by 'something'.

    What I want to do is recognise the situation and restart.

    Cheers, Peter

  • I'm also trying to accomplish the same thing.  Due to network interruptions, our replication will stop after n number of retries.  While I'm looking to the cause of the problem and a better solution, I've created this bandaid.

    I created a job to start the replication job every 4 hours.  This isn't a good solution but it works. 2 steps.

    Step 1.

    USE msdb

    declare

    @ERR integer

    EXEC @ERR = sp_start_job @job_name = 'Merge-33'

    if @ERR <> 0

     begin

     print 'JOB ALREADY RUNNING'

     END

    -- END STEP ONE

    --This is what will happens next.  If this step fails because the JOB is already running, I proceed to STEP 2.  On success, I quit reporting success.

    STEP 2

    -- JOB ALREADY RUNNING

    RAISERROR ('JOB ALREADY RUNNING',0, 1 )

    print 'JOB ALREADY RUNNING'

    -- On success, quit reporting success

    -- On failure, quit reporting failure

    -- END JOB

    Anyhow this job has actually saved us a couple of times already, but I really want to use a different approach.  There's "sp_help_job" which works if you run in transactsql, but so far I can't script this and get the results of @execution_status.  If I could then I wouldn't start the job if it was already executing.

    Hope this helps..Steve

  • I have had replication stop on me as well. To ensure that the distributor is running, I created another schedule to run every 5 minutes and enabled it. It does not throw any errors if the job is running, but will start the job if it fails.

    SRB- The code below may be the answer to your issue. However, I prefer the add'l schedule. The proc below has proven to be useful in other areas as well. We use it to ensure our reindexing job has stopped within our specified maintenance window.

    -- CHECK TO SEE IF A JOB IS RUNNING

    /** sample job check code so you can see how the below proc would work **/

    SET NOCOUNT ON

    DECLARE @ISRUNNING CHAR (3)

    , @JOB_NAME VARCHAR (100)

    , @ERROR INT

    , @SUBJECT2 VARCHAR (800)

    SET @JOB_NAME = ''

    SET @SUBJECT2 = @JOB_NAME + ' IS STILL RUNNING.' -- IF YOU WANT TO USE XP_SENDMAIL FOR AN EMAIL NOTIFICATION ABOUT YOUR JOB.

    -- CHECK TO SEE IF THE JOB IS RUNNING

    EXEC dbo.SP_ISJOBRUNNING

    @JOB_NAME

    , @ISRUNNING OUTPUT

    IF ( RTRIM(LTRIM(@ISRUNNING)) ) = 'N'

    BEGIN

    -- START THE JOB BECAUSE IT HAS STOPPED

    EXEC msdb.dbo.SP_START_JOB @JOB_NAME

    -----------------------------------------------------------

    -- THIS IS THE PROC A CO-WORKER OF MINE CREATED TO DETERMINE IF A JOB IS RUNNING. YOU PROVIDE THE JOB NAME AND YOUR EXPECTED PARAMETER.

    -- 1 = RUNNING, 0 = NOT RUNNING

    CREATE PROC sp_IsJobRunning

    (

    @JobName varchar(64),

    @IsRunning char(1) OUTPUT

    )

    AS

    BEGIN

    SET NOCOUNT ON

    -- declare variables

    DECLARE

    @job_id uniqueidentifier,

    @is_sysadmin int,

    @job_owner sysname,

    @execution_status int

    -- initialize variables

    SET @IsRunning = 'N'

    SET @job_id = NULL

    SET @is_sysadmin = 0

    SET @job_owner = NULL

    SET @execution_status = 0

    -- set job id

    SET @job_id = (SELECT job_id FROM msdb..sysjobs WHERE name = @JobName)

    IF (@job_id IS NULL)

    BEGIN

    RAISERROR('Unknown job name', 16, 1)

    RETURN 1

    END

    -- create temp table

    CREATE TABLE #xp_results

    (

    job_id UNIQUEIDENTIFIER NOT NULL,

    last_run_date INT NOT NULL,

    last_run_time INT NOT NULL,

    next_run_date INT NOT NULL,

    next_run_time INT NOT NULL,

    next_run_schedule_id INT NOT NULL,

    requested_to_run INT NOT NULL,

    request_source INT NOT NULL,

    request_source_id sysname NULL,

    running INT NOT NULL,

    current_step INT NOT NULL,

    current_retry_attempt INT NOT NULL,

    job_state INT NOT NULL

    )

    -- set sysadmin flag

    SELECT @is_sysadmin = ISNULL(IS_SRVROLEMEMBER(N'sysadmin'), 0)

    -- set job owner

    SELECT @job_owner = SUSER_SNAME()

    -- populate #xp_results

    INSERT INTO #xp_results EXECUTE master.dbo.xp_sqlagent_enum_jobs @is_sysadmin, @job_owner

    -- set execution status

    SET @execution_status = (SELECT job_state FROM #xp_results WHERE job_id = @job_id)

    -- set is running flag

    IF (@execution_status = 1)

    BEGIN

    SET @IsRunning = 'Y'

    END

    -- drop #xp_results

    IF EXISTS (SELECT 1 WHERE (OBJECT_ID('tempdb..#xp_results') IS NOT NULL) )

    BEGIN

    DROP TABLE #xp_results

    END

    SET NOCOUNT OFF

    RETURN 0

    END

  • Hi Guys,

    Thanks - these are great generic tools that look like they will fix the problem and I can already see other uses for them.

    Cheers, Peter

  • sstankus1,

    Your technique is exactly what I was looking for, thanks.

    Steve

  • This may be a bit late, but you can check your settings in the Windows Services that have options for restarting after failure. The Start Up Type : Automatic doesn't allways seem to restart the SQLSERVERAGENT. To make it a little more fail safe go to the properties and go to the Recovery tab and get the service to restart on failure. This solved our problems when replication died quietly. I don't know if this interferes with other proccesses that start the replication.

    Cheers

    Fred

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply