March 13, 2013 at 5:59 am
Hi everyone,
I just have a quick question about finding the cause of a delay I saw with one of my subscriptions today. I noticed that data was not being published to the subscription when I tried accessing a report and began looking for the issue. I did the usual checks with the replication monitor and I checked the connection between both servers, looked at the job history, checked the sync history etc. But I found no errors or warnings.
The only thing I could see was that the sync status said that the subscription was waiting on a reply from the server but thats all the info I could find :ermm:
Any help on where to look to find the cause of this delay would be really helpful, and as a side note should I apply alerts to notify me if there is a delay again?
Thanks in advance for all your help,
Kindest Regards,
Craig
Specs:
Windows Server 2008
SQL Server 2008 R2
March 13, 2013 at 10:07 am
Have you tried inserting a tracer token to verify where the delay is actually occurring?
You can also create custom alerts (in your SQL Server Agent >> Alerts) via the SSMS, and in the Replication Monitor (they kind of go hand in hand).
You can also create your own little "custom-baked" process to monitor the latency between your publisher/distributor/subscribers (I implemented this at our shop at it works quite well). If you're interested I can post those scripts...
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
March 14, 2013 at 9:37 am
MyDoggieJessie (3/13/2013)
Have you tried inserting a tracer token to verify where the delay is actually occurring?You can also create custom alerts (in your SQL Server Agent >> Alerts) via the SSMS, and in the Replication Monitor (they kind of go hand in hand).
You can also create your own little "custom-baked" process to monitor the latency between your publisher/distributor/subscribers (I implemented this at our shop at it works quite well). If you're interested I can post those scripts...
I wouldnt mind seeing those actually as it will save me doing something very similar from scratch!
'Only he who wanders finds new paths'
March 16, 2013 at 10:49 am
There are many alerts you can set up via the replication monitor (I encourage you to set those up as well). To create a custom process to track (this of course isn't full proof, but works well enough)
First you should create a table on your publisher:
CREATE TABLE [dbo].[_ReplicationCheck](
[CurrentTime] [datetime] NOT NULL,
CONSTRAINT [PK__ReplicationCheck] PRIMARY KEY CLUSTERED
(
[CurrentTime] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 92) ON [PRIMARY]
) ON [PRIMARY]
GO
Then add this article to your publication. Verify it has been replicated over to your subscriber.
Then create the SQL Agent job on the Publisher to regularly update the table you created:
USE [msdb]
GO
/****** Object: Job [DAILY - Replication Checker :: Update Current Time] Script Date: 03/16/2013 11:25:58 ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object: JobCategory [REPL-Checkup] Script Date: 03/16/2013 11:25:58 ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'REPL-Checkup' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'REPL-Checkup'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END
DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'DAILY - Replication Checker :: Update Current Time',
@enabled=1,
@notify_level_eventlog=0,
@notify_level_email=2,
@notify_level_netsend=0,
@notify_level_page=2,
@delete_level=0,
@description=N'SJM - 8/23/2012 - Job updates the current time in the _ReplicationCheck tables in all 4 CC databases. A counter-part job them checks the timestamps in the publication tables against the time in the replicated tables. If it''s off by 30 minutes, an alert is sent via text and email to dba_alerts@mycompany.com',
@category_name=N'REPL-Checkup',
@owner_login_name=N'MyUser',
@notify_email_operator_name=N'DBA',
@notify_page_operator_name=N'DBA', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object: Step [Update CurrentTime in _ReplicationCheck tables] Script Date: 03/16/2013 11:25:59 ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Update CurrentTime in _ReplicationCheck tables',
@step_id=1,
@cmdexec_success_code=0,
@on_success_action=1,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'TSQL',
@command=N'UPDATE MyDatabase.dbo.[_ReplicationCheck]
SET CurrentTime = GETDATE()
',
@database_name=N'master',
@flags=4
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Every 30 minutes',
@enabled=1,
@freq_type=4,
@freq_interval=1,
@freq_subday_type=4,
@freq_subday_interval=24,
@freq_relative_interval=0,
@freq_recurrence_factor=0,
@active_start_date=20120823,
@active_end_date=99991231,
@active_start_time=0,
@active_end_time=235959,
@schedule_uid=N'88bb96d2-a2a1-4c1c-aa1a-8bee09b9f25f'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO
And for the last Agent Job (which does the actual monitoring), also created on the Publisher:
USE [msdb]
GO
/****** Object: Job [DAILY - Replication Checker :: Monitor Status] Script Date: 03/16/2013 11:28:07 ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object: JobCategory [REPL-Alert Response] Script Date: 03/16/2013 11:28:07 ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'REPL-Alert Response' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'REPL-Alert Response'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END
DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'DAILY - Replication Checker :: Monitor Status',
@enabled=1,
@notify_level_eventlog=0,
@notify_level_email=0,
@notify_level_netsend=0,
@notify_level_page=2,
@delete_level=0,
@description=N'SJM - 8/23/2012 - Created to monitor the time it takes to replicate a single table from all the CC databases to their respective subscription databases on the subscriber (currently MyServer) - The job does NOT have a notification set to email the dba_alerts team as this is already happening within the job step.',
@category_name=N'REPL-Alert Response',
@owner_login_name=N'MyUser',
@notify_page_operator_name=N'DBA', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object: Step [Monitor Replication Threshold] Script Date: 03/16/2013 11:28:07 ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Monitor Replication Threshold',
@step_id=1,
@cmdexec_success_code=0,
@on_success_action=1,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'TSQL',
@command=N'SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
GO
/* DECLARE VARIS NEEDED */
DECLARE
@CAProdDate datetime,
@CAReplDate datetime,
@sql varchar(max),
@Subject varchar(175),
@Body varchar(max)
/* Assign defaults */
DECLARE
@CALate bit = 0,
@Threshold int = 7200 --> Time in seconds
/* Fetch current time stamps */
SET @CAProdDate = (SELECT CurrentTime FROM MyDatabase.dbo.[_ReplicationCheck])
SET @CAReplDate = (SELECT CurrentTime FROM [MySubscriberServer].MyDatabase.dbo.[_ReplicationCheck])
/* Determine which publications are late */
IF (ABS(DATEDIFF(second, @CAProdDate, @CAReplDate)) > @Threshold)
BEGIN
SET @CALate = 1
END
IF (@CALate = 1)
BEGIN
SET @sql = ''
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
GO
SELECT
a.publisher_db,
a.article,
ds.article_id,
ds.UndelivCmdsInDistDB,
ds.DelivCmdsInDistDB,
CASE WHEN md.[status] = 1 THEN ''''Started''''
WHEN md.[status] = 2 THEN ''''Succeeded''''
WHEN md.[status] = 3 THEN ''''In Progress''''
WHEN md.[status] = 4 THEN ''''Idle''''
WHEN md.[status] = 5 THEN ''''Retrying''''
WHEN md.[status] = 6 THEN ''''Failed''''
ELSE ''''Other''''
END [Status],
CASE WHEN md.warning = 0 THEN ''''OK''''
WHEN md.warning = 1 THEN ''''Expired''''
WHEN md.warning = 2 THEN ''''Latency''''
ELSE ''''OTHER''''
END [Warnings],
md.cur_latency / 60.0 / 60.0 [Latency (min.)]
FROM
[MyDistributionServer].Distribution.dbo.MSdistribution_status ds
JOIN
[MyDistributionServer].Distribution.dbo.MSarticles a
ON a.article_id = ds.article_id
JOIN
[MyDistributionServer].Distribution.dbo.MSreplication_monitordata md
ON md.agent_id = ds.agent_id
WHERE
UndelivCmdsInDistDB > 0
AND a.publisher_db = ''''MyDatabase''''
ORDER BY
a.publisher_db, UndelivCmdsInDistDB DESC''
SET @Subject = ''REPLICATION IS FALLING BEHIND ON '' + @@SERVERNAME
SET @Body = ''The threshold ('' + CAST(@Threshold AS varchar(4))
+ '' [sec.]) which has been allotted to replicate the timestamp in the _ReplicationCheck tables has been exceeded.''
+ CHAR(13) + ''======================================================================================''
+ CHAR(13) + '' REPLICATION HAS FALLEN BEHIND!!!''
+ CHAR(13) + ''======================================================================================''
+ CHAR(13) + CHAR(13)
+ ''Please check the replication monitor for errors/issues or run the QUERY BELOW ON THE DISTRIBUTOR :: THEN CONTACT THE ON CALL DBA IMMEDIATELY AT 000.000.0000''
+ CHAR(13) + CHAR(13) + @sql
EXEC msdb.dbo.sp_send_dbmail @recipients = ''dba_alerts@mycompany.com'', @copy_recipients=''nagiosdb@mycompany.com'', @subject = @Subject, @body = @Body,
@body_format = ''TEXT'', @importance = ''High''
BEGIN TRY
SELECT
1 / 0 -- Force the job to fail so the page will get sent
END TRY
BEGIN CATCH
RAISERROR (''Replication Threshold EXCEEDED; Notifying DBA team'', 16, 1)
END CATCH
END
',
@database_name=N'master',
@flags=4
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Every minute',
@enabled=1,
@freq_type=4,
@freq_interval=1,
@freq_subday_type=4,
@freq_subday_interval=5,
@freq_relative_interval=0,
@freq_recurrence_factor=0,
@active_start_date=20120823,
@active_end_date=99991231,
@active_start_time=13000,
@active_end_time=75959,
@schedule_uid=N'3c2cd1eb-a616-422f-a5cd-e40b79e57629'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO
You can also monitor using tracer tokens. I created a job that automatically inserts them as well (because I want to know if the latency is occurring between the publisher >> distributor or from the distributor to my subscribers)
Script to create a SQL Agent job to insert the tracer tokens (Job also created on the Publisher:
USE [msdb]
GO
/****** Object: Job [DBA - Monitor Replication Latency] Script Date: 03/16/2013 11:43:48 ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object: JobCategory [REPL-Checkup] Script Date: 03/16/2013 11:43:48 ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'REPL-Checkup' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'REPL-Checkup'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END
DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'DAILY - Monitor Replication Latency',
@enabled=1,
@notify_level_eventlog=0,
@notify_level_email=2,
@notify_level_netsend=0,
@notify_level_page=0,
@delete_level=0,
@category_name=N'REPL-Checkup',
@owner_login_name=N'MyUser',
@notify_email_operator_name=N'DBA', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object: Step [Tracer Token - CA] Script Date: 03/16/2013 11:43:48 ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Tracer Token - CA',
@step_id=1,
@cmdexec_success_code=0,
@on_success_action=3,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'TSQL',
@command=N'sp_posttracertoken ''PublicationName''',
@database_name=N'MyDatabase',
@flags=4
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Daily - Tracer Monitoring',
@enabled=1,
@freq_type=4,
@freq_interval=1,
@freq_subday_type=4,
@freq_subday_interval=5,
@freq_relative_interval=0,
@freq_recurrence_factor=0,
@active_start_date=20130228,
@active_end_date=99991231,
@active_start_time=230000,
@active_end_time=60000,
@schedule_uid=N'e1cc826e-0aa1-4eeb-9ef0-ad1729fc0789'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO
You can then monitor the tracer tokens using this TSQL (this code must be run against the Distributor:
USE distribution;
SELECT
c.publication,
publisher_commit,
distributor_commit,
DATEDIFF(ss, publisher_commit, distributor_commit) 'Latency - Pub >> Dis',
subscriber_commit,
DATEDIFF(ss, distributor_commit, subscriber_commit) 'Latency - Dis >> Sub'
FROM
MSTracer_tokens a
INNER JOIN MSTracer_history b
ON tracer_id = parent_tracer_id
INNER JOIN MSPublications c WITH (READUNCOMMITTED)
ON a.publication_id = c.publication_id
WHERE DATEDIFF(ss, publisher_commit, distributor_commit) > 5
OR DATEDIFF(ss, distributor_commit, subscriber_commit) > 5
ORDER BY
a.publisher_commit DESC,
c.publication
You should be able to run these scripts as-is, just change the "MyServer", MyDatabase", and things like that to what works for you - Hope it helps!
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply