Mirroring and SQL Agent Jobs

  • Hey everyone, need some help here.

    We have 2 servers. Server A and Server B. These 2 servers are setup in a automatic failover H.A. mirroring configuration with database "FraggleRockKids". Both servers are completely different machines. They are in the same domain. SQL Authentication is used for mirroring and database failovers.

    Two weeks ago, we had an unplanned failover of the database. Everything appeared to have worked correctly and for a week, everyone was happy. As it currently stands the Principal is on Server A nad the Mirrored database is on Server B.

    Now, since SQL Agent jobs are not mirrored, we have the same agent jobs running on each server and the job does a check to see if the FraggleRockKids database is Online or in a Mirrored State to ensure that we don't get all those lovely errors for the database not being accessable. However, last week we started getting calles from customers complaining of issues that were tied to the SQL Agent Jobs and on the Server A, specifically some of the larger daily automatted processes for generating reports and such were deadlocking or reporting bad data. Come to find out that the SQL Agent Jobs on Server B, which is where the Mirrored database currently resides, appeared to be running their jobs on Server A. Now, I had a very difficult time believing this as I didn't think it was possible. So I created some of my own tests.

    I created a table on the FraggleRockKids Principal database that would store a NVCHAR(1000) and a DATETIME. Basically a message and timestamp. I did the same inside the master database on Server A and B. I then created a job on Server A and a Job on Server B. Both jobs have a default database pointing to the mirrored database in question. The job INSERTS into each of the tables above. One inside the Principal database and one inside the master database. I did not setup any check to see if the database was online, so it should throw an error on Server B where the mirror resides, when it tries to insert into the table. My insert contains a message stating which server I am on (principal/mirror) and a concatination of the @@ServerName.

    What I found is that rather than getting an error on Server B as I was expecting, I actually get 2 records in the table I created inside of the principal database showing the messages from each of the servers. Additionally, the table in the master database on Server A has 2 records, 1 from each server. The table in Server B has 0 records.

    Needless to say, I was a bit shocked. To confirm that this was indeed happening, I setup a profile trace where in I only looked for connectsion to Server A from Server B. Guess what, My code is captured as running on Server A.

    So my first fear is that everything on the server is doing this. This isn't the case. When I create a job that points directly to the master database on Server B, I get a record in the test table as expected.

    So it appears that for some reason, any job that is running on Server B that is pointing directly to the FraggleRockKid database that is in a mirrored state is running successfully on Server A.

    Does anyone have any idea on what is going on here. I was under the impression that SQL Agent jobs can't run cross server and can't be mirrored. However, it very much appears that it is happening here.

    Thoughts, suggestions, requests for additional information, anything?

    Thanks,

    Fraggle

  • Erm, guesses, but are you referencing tables by 4-part names? Or the jobs not targeted at the local server?

    Jobs aren't mirrored as they're stored in the msdb database which can't be mirrored.

  • If you created the job on ServerA and then scripted it out so that you could just run the script against Server B to create the job, this would be the most likely cause of the problem you are describing. When you script a job, the last thing the script does is call sp_add_jobserver to set the target servername for the job execution. If this was explicitly defined as ServerA instead of something generic like (local) the job on ServerB would be pointed at ServerA as a target.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]

  • Gazareth (2/6/2012)


    Erm, guesses, but are you referencing tables by 4-part names? Or the jobs not targeted at the local server?

    Jobs aren't mirrored as they're stored in the msdb database which can't be mirrored.

    No, my tests did not use 4 part naming conventions. The job is pointing to the local server.

  • Jonathan Kehayias (2/6/2012)


    If you created the job on ServerA and then scripted it out so that you could just run the script against Server B to create the job, this would be the most likely cause of the problem you are describing. When you script a job, the last thing the script does is call sp_add_jobserver to set the target servername for the job execution. If this was explicitly defined as ServerA instead of something generic like (local) the job on ServerB would be pointed at ServerA as a target.

    Johnathan, how can I confirm this? I did script it from Server A to Server B for my test. Not sure on the original job scripting. Recreating the script I don't see the sp_add_jobserver proc being called. In checking the option inside of the job for "Target", both jobs are pointing to the local server. Additionally, checked sys.servers and that shows there isn't a linked server setup between the two servers for them to connect with.

    Thanks,

    Fraggle

  • Fraggle-805517 (2/6/2012)


    Jonathan Kehayias (2/6/2012)


    If you created the job on ServerA and then scripted it out so that you could just run the script against Server B to create the job, this would be the most likely cause of the problem you are describing. When you script a job, the last thing the script does is call sp_add_jobserver to set the target servername for the job execution. If this was explicitly defined as ServerA instead of something generic like (local) the job on ServerB would be pointed at ServerA as a target.

    Johnathan, how can I confirm this? I did script it from Server A to Server B for my test. Not sure on the original job scripting. Recreating the script I don't see the sp_add_jobserver proc being called. In checking the option inside of the job for "Target", both jobs are pointing to the local server. Additionally, checked sys.servers and that shows there isn't a linked server setup between the two servers for them to connect with.

    Thanks,

    Fraggle

    Quick update. I found sp_help_jobserver and ran it against my test job. The server_name column that is being populated shows the local server, or the server that the job is running on. So Server A is showing Server A for the target job server. Server B is targeting Server B.

    Thanks,

    Fraggle

  • Post a script that would generate a repro in your environment and I'll see if I can do the same in mine.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]

  • - are you monitoring dbmirroring events ? If not, you should implement that ! (see attachments )

    Your default trace file(s) may also help you to diagnose this issue !

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Jonathan Kehayias (2/6/2012)


    Post a script that would generate a repro in your environment and I'll see if I can do the same in mine.

    Here you go. Strait raw output from Script Drop/Create.

    I do see the sp_add_job now, but it is still pointing to Local

    /****** Object: Job [NathanTest] Script Date: 02/06/2012 13:58:58 ******/

    IF EXISTS (SELECT job_id FROM msdb.dbo.sysjobs_view WHERE name = N'NathanTest')

    EXEC msdb.dbo.sp_delete_job @job_id=N'0044339e-8963-4764-adc1-a68f62b53b33', @delete_unused_schedule=1

    GO

    /****** Object: Job [NathanTest] Script Date: 02/06/2012 13:58:58 ******/

    BEGIN TRANSACTION

    DECLARE @ReturnCode INT

    SELECT @ReturnCode = 0

    /****** Object: JobCategory [[Uncategorized (Local)]]] Script Date: 02/06/2012 13:58:59 ******/

    IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)

    BEGIN

    EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'

    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

    END

    DECLARE @jobId BINARY(16)

    select @jobId = job_id from msdb.dbo.sysjobs where (name = N'NathanTest')

    if (@jobId is NULL)

    BEGIN

    EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'NathanTest',

    @enabled=0,

    @notify_level_eventlog=0,

    @notify_level_email=0,

    @notify_level_netsend=0,

    @notify_level_page=0,

    @delete_level=0,

    @description=N'No description available.',

    @category_name=N'[Uncategorized (Local)]',

    @owner_login_name=N'nathan.heaivilin', @job_id = @jobId OUTPUT

    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

    END

    /****** Object: Step [blah] Script Date: 02/06/2012 13:58:59 ******/

    IF NOT EXISTS (SELECT * FROM msdb.dbo.sysjobsteps WHERE job_id = @jobId and step_id = 1)

    EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'blah',

    @step_id=1,

    @cmdexec_success_code=0,

    @on_success_action=1,

    @on_success_step_id=0,

    @on_fail_action=2,

    @on_fail_step_id=0,

    @retry_attempts=0,

    @retry_interval=0,

    @os_run_priority=0, @subsystem=N'TSQL',

    @command=N'declare @cnt nvarchar(10)

    SELECT @cnt = CONVERT(NVARCHAR(10), count(*))

    FROM sys.databases

    insert into NathanTest(col1, dte)

    SELECT @@serverName + '' - Principal Server. DBCount = '' + @cnt, getdate()

    INSERT INTO master.dbo.NathanTest (msg, dte)

    SELECT @@serverName + '' - Principal Server. DBCount = '' + @cnt, getdate()',

    @database_name=N'FraggleRockKid',

    @flags=0

    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

    EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1

    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

    EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'blah',

    @enabled=1,

    @freq_type=4,

    @freq_interval=1,

    @freq_subday_type=4,

    @freq_subday_interval=5,

    @freq_relative_interval=0,

    @freq_recurrence_factor=0,

    @active_start_date=20120203,

    @active_end_date=99991231,

    @active_start_time=0,

    @active_end_time=235959

    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

    EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'

    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

    COMMIT TRANSACTION

    GOTO EndSave

    QuitWithRollback:

    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION

    EndSave:

    GO

  • ALZDBA (2/6/2012)


    - are you monitoring dbmirroring events ? If not, you should implement that ! (see attachments )

    Your default trace file(s) may also help you to diagnose this issue !

    We have alerts setup that notify us on failovers and do some things inside the system after the failover is completed. In this case, the failover has occured and everything appears to be working as expected with the exception of the jobs.

    Fraggle.

  • This is how we check if the db is available:

    if exists ( select 1

    from sys.database_mirroring

    Where mirroring_role_desc = 'MIRROR'

    and database_id = db_id('FraggleRockKid') )

    begin

    Print 'Mirrored db Status [FraggleRockKid]: ' + convert(varchar(128),databasepropertyex( 'FraggleRockKid','Status'))

    end

    else

    begin

    Start processing here

    end

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Just an update on this. Restart the SQL Agent Service has stopped the issue from occuring. We still don't know why it occured in the first place. When we get some time, we are going to see if we can setup a test and replicate it. If we are able to I will let everyone know....and will probably have some discussions with Microsoft on it.

    Thanks,

    Fraggle

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply