January 17, 2005 at 2:26 am
Hi,
We have a basic replication topology. 2 servers with one replicating everything to the other using continuous transactional replication and SQL Server 2000 running under Windows Server 2000. Everything works fine for weeks then replication, on the Publisher, just stops for no good reason. The Publisher is stopped along with the agents - there status is: "closed down successfully". If the servers are automatically closed down and restarted due to a power fail there is a 70% chance that replication will not restart.
This is at a client site running a turnkey app and I wouldn't trust anyone with Enterprise Manager. The database records about 15,000 transactions in a working day so its reasonably busy. Apart from the user not having an up-to-date backup copy of their production database replication stopping has no other effects.
We have tried, without success, to identify the cause of this problem and I'm not really looking for help in diagnosing the issue (although any comments would be greatly appreciated). Where I would like some help is how can I automatically monitor the status and restart replication if it has stopped?
TIA - Peter
January 17, 2005 at 6:28 am
Replication is started with jobs.
If you have pull subscriptions, then you should start the jobs at the subscribers, if it is push, then the jobs are at the publisher (or distributor, and don't remember).
You should check the status of the jobs.
January 17, 2005 at 1:48 pm
Hi,
Sorry, forgot to mention - it is Push replication.
The status of the jobs is: "closed down successfully" which seems to indicate its not a failure as such - the shutdown was initiated by 'something'.
What I want to do is recognise the situation and restart.
Cheers, Peter
January 18, 2005 at 5:16 am
I'm also trying to accomplish the same thing. Due to network interruptions, our replication will stop after n number of retries. While I'm looking to the cause of the problem and a better solution, I've created this bandaid.
I created a job to start the replication job every 4 hours. This isn't a good solution but it works. 2 steps.
Step 1.
USE msdb
declare
@ERR integer
EXEC @ERR = sp_start_job @job_name = 'Merge-33'
if @ERR <> 0
begin
print 'JOB ALREADY RUNNING'
END
-- END STEP ONE
--This is what will happens next. If this step fails because the JOB is already running, I proceed to STEP 2. On success, I quit reporting success.
STEP 2
-- JOB ALREADY RUNNING
RAISERROR ('JOB ALREADY RUNNING',0, 1 )
print 'JOB ALREADY RUNNING'
-- On success, quit reporting success
-- On failure, quit reporting failure
-- END JOB
Anyhow this job has actually saved us a couple of times already, but I really want to use a different approach. There's "sp_help_job" which works if you run in transactsql, but so far I can't script this and get the results of @execution_status. If I could then I wouldn't start the job if it was already executing.
Hope this helps..Steve
January 18, 2005 at 10:29 am
I have had replication stop on me as well. To ensure that the distributor is running, I created another schedule to run every 5 minutes and enabled it. It does not throw any errors if the job is running, but will start the job if it fails.
SRB- The code below may be the answer to your issue. However, I prefer the add'l schedule. The proc below has proven to be useful in other areas as well. We use it to ensure our reindexing job has stopped within our specified maintenance window.
-- CHECK TO SEE IF A JOB IS RUNNING
/** sample job check code so you can see how the below proc would work **/
SET NOCOUNT ON
DECLARE @ISRUNNING CHAR (3)
, @JOB_NAME VARCHAR (100)
, @ERROR INT
, @SUBJECT2 VARCHAR (800)
SET @JOB_NAME = ''
SET @SUBJECT2 = @JOB_NAME + ' IS STILL RUNNING.' -- IF YOU WANT TO USE XP_SENDMAIL FOR AN EMAIL NOTIFICATION ABOUT YOUR JOB.
-- CHECK TO SEE IF THE JOB IS RUNNING
EXEC dbo.SP_ISJOBRUNNING
@JOB_NAME
, @ISRUNNING OUTPUT
IF ( RTRIM(LTRIM(@ISRUNNING)) ) = 'N'
BEGIN
-- START THE JOB BECAUSE IT HAS STOPPED
EXEC msdb.dbo.SP_START_JOB @JOB_NAME
-----------------------------------------------------------
-- THIS IS THE PROC A CO-WORKER OF MINE CREATED TO DETERMINE IF A JOB IS RUNNING. YOU PROVIDE THE JOB NAME AND YOUR EXPECTED PARAMETER.
-- 1 = RUNNING, 0 = NOT RUNNING
CREATE PROC sp_IsJobRunning
(
@JobName varchar(64),
@IsRunning char(1) OUTPUT
)
AS
BEGIN
SET NOCOUNT ON
-- declare variables
DECLARE
@job_id uniqueidentifier,
@is_sysadmin int,
@job_owner sysname,
@execution_status int
-- initialize variables
SET @IsRunning = 'N'
SET @job_id = NULL
SET @is_sysadmin = 0
SET @job_owner = NULL
SET @execution_status = 0
-- set job id
SET @job_id = (SELECT job_id FROM msdb..sysjobs WHERE name = @JobName)
IF (@job_id IS NULL)
BEGIN
RAISERROR('Unknown job name', 16, 1)
RETURN 1
END
-- create temp table
CREATE TABLE #xp_results
(
job_id UNIQUEIDENTIFIER NOT NULL,
last_run_date INT NOT NULL,
last_run_time INT NOT NULL,
next_run_date INT NOT NULL,
next_run_time INT NOT NULL,
next_run_schedule_id INT NOT NULL,
requested_to_run INT NOT NULL,
request_source INT NOT NULL,
request_source_id sysname NULL,
running INT NOT NULL,
current_step INT NOT NULL,
current_retry_attempt INT NOT NULL,
job_state INT NOT NULL
)
-- set sysadmin flag
SELECT @is_sysadmin = ISNULL(IS_SRVROLEMEMBER(N'sysadmin'), 0)
-- set job owner
SELECT @job_owner = SUSER_SNAME()
-- populate #xp_results
INSERT INTO #xp_results EXECUTE master.dbo.xp_sqlagent_enum_jobs @is_sysadmin, @job_owner
-- set execution status
SET @execution_status = (SELECT job_state FROM #xp_results WHERE job_id = @job_id)
-- set is running flag
IF (@execution_status = 1)
BEGIN
SET @IsRunning = 'Y'
END
-- drop #xp_results
IF EXISTS (SELECT 1 WHERE (OBJECT_ID('tempdb..#xp_results') IS NOT NULL) )
BEGIN
DROP TABLE #xp_results
END
SET NOCOUNT OFF
RETURN 0
END
January 18, 2005 at 2:48 pm
Hi Guys,
Thanks - these are great generic tools that look like they will fix the problem and I can already see other uses for them.
Cheers, Peter
January 19, 2005 at 7:19 am
sstankus1,
Your technique is exactly what I was looking for, thanks.
Steve
December 20, 2005 at 5:15 pm
This may be a bit late, but you can check your settings in the Windows Services that have options for restarting after failure. The Start Up Type : Automatic doesn't allways seem to restart the SQLSERVERAGENT. To make it a little more fail safe go to the properties and go to the Recovery tab and get the service to restart on failure. This solved our problems when replication died quietly. I don't know if this interferes with other proccesses that start the replication.
Cheers
Fred
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply