MSDB Deadlock = Server crawl?

  • Hey everyone,

    Last night about 7 p.m. EST our wonderful server decided to started doing the Watermellon Crawl (Country song if you don't know it).

    The only thing that we could find was that the in the MSDB, we were getting a ton of deadlocks that started about an hour before hand. The deadlocks where part occuring b/c someone decided to do a large delete on the backupset.

    Now, I understand the whole deadlocking stuff and can find a ton of information. What I cannot figure out is why this made the entire server crawl. As I understand it, the MSDB database is responsible for Backup histories, Database Mails, SQL Agent Jobs, Replication, and Server Broker that is not contained in the individual database.

    We don't use replication, although we did at one time, and we don't use service broker.

    What I do know is that when we killed the hung SPID that was causing all of the deadlocks in the MSDB database (sp_delete_jobhistory), the server went back to it's normal, "I'm a 8 year old boy who just drank a 6 pack of Mountain Dew!!!!".

    Thoughts/Idea?

    Thanks,

    Fraggle

  • Are you sure you were encountering deadlocks? and not just significant blocking?

    Deleting large swaths of job history can wreak havoc on your server if you have a lot of jobs running and completing concurrently on a regular basis. It is recommended to delete your job history often to avoid having to delete too much at one time, and deleting it during low utilization times.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Yes, I am sure on the deadlocks. We have them written to the error log on a regular basis.

    How much are we talking here. I have a few million records in the backup history. This delete was from dropping a single database.

    Fraggle

  • What you may want to do is chunk it out until you catch up. How many days worth of job history do you need to delete?

    sp_purge_jobhistory accepts a parameter to limit how much would be deleted, @oldest_date. Find out the oldest entry, and then call the proc for 7 days after the oldest entry to limit the impact. Do that until you're caught up to the level you care to keep job history. Then you can schedule the proc to run once per week (what I do on all my instances) to delete anything older than your retention period.

    I manage my job history manually using a job, as described above, so I disable the auto-management provided in SQL Agent:

    USE [msdb]

    GO

    EXEC msdb.dbo.sp_set_sqlagent_properties @jobhistory_max_rows=-1,

    @jobhistory_max_rows_per_job=-1;

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I didn't really need to delete anything. That was an accident by one my co-workers when they dropped a database.

    The question I am trying to figure out here is why the server when to hell in a hand basket because there was a deadlock on the MSDB database.

    Fraggle

  • Unfortunately, we can't see you error logs. Can you post a part of the deadlock errors you see in the logs?

  • I doubt the deadlock is the cause. More likely, the deadlocks are another effect.

    Deleting lots of rows can be a very expensive query. If it parallels and uses lots of CPUs, then that delete alone could slow the server down, especially if it's also hammering the log and growing the log.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Lynn Pettis (5/1/2012)


    Unfortunately, we can't see you error logs. Can you post a part of the deadlock errors you see in the logs?

    Sure, here is one of the chains:

    deadlock-list

    deadlock victim=process6bc4cc5c8

    process-list

    process id=process70b3048 taskpriority=0 logused=308690192 waitresource=PAGE: 4:1:1664936 waittime=218 ownerId=4372116072 transactionname=user_transaction lasttranstarted=2012-04-30T16:57:03.077 XDES=0xe523a6370 lockMode=S schedulerid=7 kpid=2244 status=suspended spid=253 sbid=0 ecid=0 priority=0 transcount=2 lastbatchstarted=2012-04-30T16:56:54.383 lastbatchcompleted=2012-04-30T16:56:54.383 clientapp=Microsoft SQL Server Management Studio hostname=DB10 hostpid=5028 loginname=BDTPROD\russ.peebles isolationlevel=read committed (2) xactid=4372116072 currentdb=4 lockTimeout=10000 clientoption1=538968096 clientoption2=128056

    executionStack

    frame procname=msdb.dbo.sp_delete_database_backuphistory line=65 stmtstart=3824 stmtend=4408 sqlhandle=0x030004007488435bd6aa2100ed9600000100000000000000

    DELETE msdb.dbo.backupmediafamily

    FROM msdb.dbo.backupmediafamily bmf

    WHERE bmf.media_set_id IN (SELECT media_set_id

    FROM @media_set_id)

    AND ((SELECT COUNT(*)

    FROM msdb.dbo.backupset

    WHERE media_set_id = bmf.media_set_id) = 0)

    frame procname=adhoc line=1 sqlhandle=0x0100010071df4510002defb4000000000000000000000000

    EXEC msdb.dbo.sp_delete_database_backuphistory @database_name = N'ActiveTravelNetwork'

    inputbuf

    EXEC msdb.dbo.sp_delete_database_backuphistory @database_name = N'ActiveTravelNetwork'

    process id=process6bc4cc5c8 taskpriority=0 logused=4204 waitresource=OBJECT: 4:2050106344:0 waittime=1375 ownerId=4555064220 transactionname=Backup/Restore History lasttranstarted=2012-04-30T18:56:17.007 XDES=0x83bc9d020 lockMode=IX schedulerid=1 kpid=4552 status=suspended spid=96 sbid=0 ecid=0 priority=0 transcount=2 lastbatchstarted=2012-04-30T18:56:16.790 lastbatchcompleted=2012-04-30T18:56:16.790 clientapp=LiteSpeed for SQL Server {abf67ae5-2bcd-4d39-97cf-b9573cde946b} hostname=DB10 hostpid=5720 loginname=BDTPROD\sqlservice isolationlevel=read committed (2) xactid=4555064220 currentdb=4 lockTimeout=4294967295 clientoption1=673253472 clientoption2=324664

    executionStack

    frame procname=adhoc line=1 sqlhandle=0x020000003ea88913766540781c32a91433ea1331a0f33b95

    (@param0 nvarchar(20), @param1 nvarchar(42), @param2 nvarchar(42), @param3 nvarchar(42), @param4 nvarchar(42), @param5 nvarchar(42), @param6 nvarchar(42), @param7 nvarchar(42), @param8 nvarchar(23), @param9 nvarchar(47), @param10 nvarchar(18), @param11 nvarchar(9), @param12 nvarchar(4), @param13 nvarchar(4), @param14 nvarchar(28), @param15 nvarchar(7), @param16 nvarchar(7), @param17 nvarchar(9), @param18 nvarchar(3), @param19 nvarchar(26), @param20 nvarchar(8), @param21 nvarchar(3), @param22 nvarchar(25), @param23 nvarchar(7), @param24 nvarchar(32), @param25 nvarchar(3), @param26 nvarchar(38), @param27 nvarchar(7), @param28 nvarchar(28), @param29 nvarchar(3), @param30 nvarchar(34))

    declare @backup_set_id int

    declare @media_set_id int

    declare @media_count int

    select @media_set_id = media_set_id from msdb.dbo.backupmediaset where media_uuid = N'{57E33AFD-7A18-4438-B4EC-9088151A69F0}'

    if @media_set_id is null begin

    insert msdb.dbo.backupmediaset (media_uuid, media_family_count, software_name, software_vendor

    frame procname=adhoc line=1 sqlhandle=0x010001006b483212000ae7b9070000000000000000000000

    BACKUP log [SimplyYou] TO VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_0', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_1', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_2', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_3', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_4', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_5', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_6'with BLOCKSIZE=65536, BUFFERCOUNT=70, CHECKSUM, DESCRIPTION=N'Backup log of SimplyYou on 4/30/2012 6:50:14 PM', MAXTRANSFERSIZE=1048576, NAME=N'Backup log of SimplyYou'

    inputbuf

    BACKUP log [SimplyYou] TO VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_0', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_1', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_2', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_3', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_4', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_5', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_6'with BLOCKSIZE=65536, BUFFERCOUNT=70, CHECKSUM, DESCRIPTION=N'Backup log of SimplyYou on 4/30/2012 6:50:14 PM', MAXTRANSFERSIZE=1048576, NAME=N'Backup log of SimplyYou'

    resource-list

    pagelock fileid=1 pageid=1664936 dbid=4 objectname=msdb.dbo.backupset id=lock6f97ed200 mode=IX associatedObjectId=72057594043039744

    owner-list

    owner id=process6bc4cc5c8 mode=IX

    waiter-list

    waiter id=process70b3048 mode=S requestType=wait

    objectlock lockPartition=0 objid=2050106344 subresource=FULL dbid=4 objectname=msdb.dbo.backupfile id=lock7fb8ed300 mode=X associatedObjectId=2050106344

    owner-list

    owner id=process70b3048 mode=X

    waiter-list

    waiter id=process6bc4cc5c8 mode=IX requestType=wait

    2012-04-30 18:56:21.30 spid345 Error: Maintenance plan 'Backups [TL]'.LiteSpeed(R) for SQL Server Version 6.5.1.1018

    Copyright 2011 Quest Software, Inc.

    Msg 62301, Level 16, State 1, Line 2883584: SQL Server has returned a failure message to LiteSpeed for SQL Server which has prevented the operation from succeeding.

    The following message is not a LiteSpeed for SQL Server message. Please refer to SQL Server books online or Microsoft technical support for a solution:

    Processed 85 pages for database 'SimplyYou', file 'jeff_Log' on file 1.

    BACKUP LOG successfully processed 85 pages in 0.167 seconds (4.138 MB/sec).

    Could not insert a backup or restore history/detail record in the msdb database. This may indicate a problem with the msdb database. The backup/restore operation was still successful.

    Transaction (Process ID 96) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

    2012-04-30 18:56:37.69 spid345 Error: Maintenance plan 'Backups [TL]'.LiteSpeed(R) for SQL Server Version 6.5.1.1018

    Copyright 2011 Quest Software, Inc.

    Msg 60901, Level 16, State 1, Line 2883584: Failed to login to SQL Server

  • GilaMonster (5/1/2012)


    I doubt the deadlock is the cause. More likely, the deadlocks are another effect.

    Deleting lots of rows can be a very expensive query. If it parallels and uses lots of CPUs, then that delete alone could slow the server down, especially if it's also hammering the log and growing the log.

    CPU was at 20-30% when I got on the server last night. No disk queue issues (0-1). < 5 ms on disk read/write.

    MSDB database MDF is 13.5 GB. Log is roughly 1.5GB.

    There was a single automattic log growth on approximately 1 hour before we started running into issues on the servers on the MSDB database. No other database on the server (roughly 60 total) experiences any LDF or MDF growth issues.

    Gila, I tend to agree with you on the Deadlock as not being the issue. However, when I killed the SPID that was deleting the backup history, and it finished the rollback, the server came right back up. It doesn't make sense to me.

    Fraggle

  • OK, I see now.

    EXEC msdb.dbo.sp_delete_database_backuphistory @database_name = N'ActiveTravelNetwork'

    Backup history is deleted when a database is dropped. Originally I thought we were talking job history...for backup history it is recommended that you run sp_delete_backuphistory periodically to keep the backup history tables at a manageable size. You can follow the same job history technique I outline above for backup history using sp_delete_backuphistory.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Fraggle-805517 (5/1/2012)


    Gila, I tend to agree with you on the Deadlock as not being the issue. However, when I killed the SPID that was deleting the backup history, and it finished the rollback, the server came right back up. I doesn't make sense to me.

    When you stopped the delete, the problems resolved themselves. That suggests very strongly that the delete was the problem and the deadlocks were problems caused by the delete, as was the slow performance.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • "MSDB database MDF is 13.5 GB" suggests it may need some attention. backup history, job history and database mail logs are common sources of msdb bloat.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • GilaMonster (5/1/2012)


    Fraggle-805517 (5/1/2012)


    Gila, I tend to agree with you on the Deadlock as not being the issue. However, when I killed the SPID that was deleting the backup history, and it finished the rollback, the server came right back up. I doesn't make sense to me.

    When you stopped the delete, the problems resolved themselves. That suggests very strongly that the delete was the problem and the deadlocks were problems caused by the delete, as was the slow performance.

    It does lend to that, and I won't deny it. However, when I have seen stuff like this in the past, I have always seens disk queues rise along with slower read/writes per sec on perfmon. This time, I wasn't getting any of that. That is what is bugging me.

    Fraggle

  • opc.three (5/1/2012)


    "MSDB database MDF is 13.5 GB" suggests it may need some attention. backup history, job history and database mail logs are common sources of msdb bloat.

    General question. Is there a recommended amount of time for keeping these? 6 months? 1 year? etc?

    Fraggle

  • How long do you need them for (audits, restore scripts, etc)?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply