May 1, 2012 at 10:14 am
Hey everyone,
Last night about 7 p.m. EST our wonderful server decided to start doing the Watermelon Crawl (country song, if you don't know it).
The only thing we could find was that in MSDB we were getting a ton of deadlocks, starting about an hour beforehand. The deadlocks were occurring because someone decided to do a large delete on the backupset table.
Now, I understand deadlocking in general and can find a ton of information on it. What I cannot figure out is why this made the entire server crawl. As I understand it, the MSDB database is responsible for backup history, Database Mail, SQL Agent jobs, replication, and any Service Broker objects not contained in the individual databases.
We don't use replication (although we did at one time), and we don't use Service Broker.
What I do know is that when we killed the hung SPID that was causing all of the deadlocks in the MSDB database (sp_delete_jobhistory), the server went back to its normal "I'm an 8-year-old boy who just drank a 6-pack of Mountain Dew!!!!" self.
Thoughts/Ideas?
Thanks,
Fraggle
May 1, 2012 at 12:06 pm
Are you sure you were encountering deadlocks, and not just significant blocking?
Deleting large swaths of job history can wreak havoc on your server if you have a lot of jobs running and completing concurrently on a regular basis. The recommendation is to delete job history often, so you never have to delete too much at one time, and to do it during low-utilization periods.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
May 1, 2012 at 12:19 pm
Yes, I am sure about the deadlocks. We have them written to the error log on a regular basis.
How much are we talking about here? I have a few million records in the backup history, and this delete came from dropping a single database.
Fraggle
May 1, 2012 at 12:33 pm
What you may want to do is chunk it out until you catch up. How many days' worth of job history do you need to delete?
sp_purge_jobhistory accepts a parameter to limit how much is deleted: @oldest_date. Find the oldest entry, then call the proc for 7 days after that oldest entry to limit the impact. Repeat until you're caught up to the point where you want to keep job history. Then you can schedule the proc to run once per week (what I do on all my instances) to delete anything older than your retention period.
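A rough sketch of that chunked purge, assuming a 30-day retention target and a 7-day step (both are placeholders; adjust to your environment):

USE msdb;
GO
-- sysjobhistory stores run_date as an int in YYYYMMDD form.
DECLARE @oldest datetime, @cutoff datetime;
SELECT @oldest = CONVERT(datetime, CONVERT(char(8), MIN(run_date)))
FROM dbo.sysjobhistory;
SET @cutoff = DATEADD(DAY, -30, GETDATE()); -- retention period (placeholder)
-- Purge one week at a time until only the retention window remains.
WHILE @oldest < @cutoff
BEGIN
    SET @oldest = DATEADD(DAY, 7, @oldest);
    IF @oldest > @cutoff SET @oldest = @cutoff;
    EXEC dbo.sp_purge_jobhistory @oldest_date = @oldest;
END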
I manage my job history manually using a job, as described above, so I disable the auto-management provided in SQL Agent:
USE [msdb]
GO
-- Disable SQL Agent's built-in job history limits (sp_set_sqlagent_properties is undocumented).
EXEC msdb.dbo.sp_set_sqlagent_properties
    @jobhistory_max_rows = -1,
    @jobhistory_max_rows_per_job = -1;
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
May 1, 2012 at 1:05 pm
I didn't really need to delete anything. It was an accident by one of my co-workers when they dropped a database.
The question I am trying to figure out here is why the server went to hell in a handbasket because there was a deadlock on the MSDB database.
Fraggle
May 1, 2012 at 1:08 pm
Unfortunately, we can't see your error logs. Can you post part of the deadlock errors you see in the logs?
May 1, 2012 at 1:09 pm
I doubt the deadlock is the cause. More likely, the deadlocks are another effect.
Deleting lots of rows can be a very expensive query. If it goes parallel and uses lots of CPUs, then that delete alone could slow the server down, especially if it's also hammering the log and growing the log.
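For the record, the standard way to keep a huge delete from hammering the log like that is to batch it. A generic sketch (the table and column names here are made up, not the msdb procedure):

DECLARE @batch int = 5000;
WHILE 1 = 1
BEGIN
    -- Delete in small batches so each transaction stays short.
    DELETE TOP (@batch)
    FROM dbo.some_history         -- hypothetical table
    WHERE logged_on < '20120101'; -- hypothetical cutoff
    IF @@ROWCOUNT < @batch BREAK; -- last (partial) batch done
END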
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 1, 2012 at 1:21 pm
Lynn Pettis (5/1/2012)
Unfortunately, we can't see your error logs. Can you post part of the deadlock errors you see in the logs?
Sure, here is one of the chains:
deadlock-list
deadlock victim=process6bc4cc5c8
process-list
process id=process70b3048 taskpriority=0 logused=308690192 waitresource=PAGE: 4:1:1664936 waittime=218 ownerId=4372116072 transactionname=user_transaction lasttranstarted=2012-04-30T16:57:03.077 XDES=0xe523a6370 lockMode=S schedulerid=7 kpid=2244 status=suspended spid=253 sbid=0 ecid=0 priority=0 transcount=2 lastbatchstarted=2012-04-30T16:56:54.383 lastbatchcompleted=2012-04-30T16:56:54.383 clientapp=Microsoft SQL Server Management Studio hostname=DB10 hostpid=5028 loginname=BDTPROD\russ.peebles isolationlevel=read committed (2) xactid=4372116072 currentdb=4 lockTimeout=10000 clientoption1=538968096 clientoption2=128056
executionStack
frame procname=msdb.dbo.sp_delete_database_backuphistory line=65 stmtstart=3824 stmtend=4408 sqlhandle=0x030004007488435bd6aa2100ed9600000100000000000000
DELETE msdb.dbo.backupmediafamily
FROM msdb.dbo.backupmediafamily bmf
WHERE bmf.media_set_id IN (SELECT media_set_id
FROM @media_set_id)
AND ((SELECT COUNT(*)
FROM msdb.dbo.backupset
WHERE media_set_id = bmf.media_set_id) = 0)
frame procname=adhoc line=1 sqlhandle=0x0100010071df4510002defb4000000000000000000000000
EXEC msdb.dbo.sp_delete_database_backuphistory @database_name = N'ActiveTravelNetwork'
inputbuf
EXEC msdb.dbo.sp_delete_database_backuphistory @database_name = N'ActiveTravelNetwork'
process id=process6bc4cc5c8 taskpriority=0 logused=4204 waitresource=OBJECT: 4:2050106344:0 waittime=1375 ownerId=4555064220 transactionname=Backup/Restore History lasttranstarted=2012-04-30T18:56:17.007 XDES=0x83bc9d020 lockMode=IX schedulerid=1 kpid=4552 status=suspended spid=96 sbid=0 ecid=0 priority=0 transcount=2 lastbatchstarted=2012-04-30T18:56:16.790 lastbatchcompleted=2012-04-30T18:56:16.790 clientapp=LiteSpeed for SQL Server {abf67ae5-2bcd-4d39-97cf-b9573cde946b} hostname=DB10 hostpid=5720 loginname=BDTPROD\sqlservice isolationlevel=read committed (2) xactid=4555064220 currentdb=4 lockTimeout=4294967295 clientoption1=673253472 clientoption2=324664
executionStack
frame procname=adhoc line=1 sqlhandle=0x020000003ea88913766540781c32a91433ea1331a0f33b95
(@param0 nvarchar(20), @param1 nvarchar(42), @param2 nvarchar(42), @param3 nvarchar(42), @param4 nvarchar(42), @param5 nvarchar(42), @param6 nvarchar(42), @param7 nvarchar(42), @param8 nvarchar(23), @param9 nvarchar(47), @param10 nvarchar(18), @param11 nvarchar(9), @param12 nvarchar(4), @param13 nvarchar(4), @param14 nvarchar(28), @param15 nvarchar(7), @param16 nvarchar(7), @param17 nvarchar(9), @param18 nvarchar(3), @param19 nvarchar(26), @param20 nvarchar(8), @param21 nvarchar(3), @param22 nvarchar(25), @param23 nvarchar(7), @param24 nvarchar(32), @param25 nvarchar(3), @param26 nvarchar(38), @param27 nvarchar(7), @param28 nvarchar(28), @param29 nvarchar(3), @param30 nvarchar(34))
declare @backup_set_id int
declare @media_set_id int
declare @media_count int
select @media_set_id = media_set_id from msdb.dbo.backupmediaset where media_uuid = N'{57E33AFD-7A18-4438-B4EC-9088151A69F0}'
if @media_set_id is null begin
insert msdb.dbo.backupmediaset (media_uuid, media_family_count, software_name, software_vendor
frame procname=adhoc line=1 sqlhandle=0x010001006b483212000ae7b9070000000000000000000000
BACKUP log [SimplyYou] TO VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_0', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_1', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_2', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_3', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_4', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_5', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_6'with BLOCKSIZE=65536, BUFFERCOUNT=70, CHECKSUM, DESCRIPTION=N'Backup log of SimplyYou on 4/30/2012 6:50:14 PM', MAXTRANSFERSIZE=1048576, NAME=N'Backup log of SimplyYou'
inputbuf
BACKUP log [SimplyYou] TO VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_0', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_1', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_2', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_3', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_4', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_5', VIRTUAL_DEVICE='VDI_ABF67AE5-2BCD-4D39-97CF-B9573CDE946B_6'with BLOCKSIZE=65536, BUFFERCOUNT=70, CHECKSUM, DESCRIPTION=N'Backup log of SimplyYou on 4/30/2012 6:50:14 PM', MAXTRANSFERSIZE=1048576, NAME=N'Backup log of SimplyYou'
resource-list
pagelock fileid=1 pageid=1664936 dbid=4 objectname=msdb.dbo.backupset id=lock6f97ed200 mode=IX associatedObjectId=72057594043039744
owner-list
owner id=process6bc4cc5c8 mode=IX
waiter-list
waiter id=process70b3048 mode=S requestType=wait
objectlock lockPartition=0 objid=2050106344 subresource=FULL dbid=4 objectname=msdb.dbo.backupfile id=lock7fb8ed300 mode=X associatedObjectId=2050106344
owner-list
owner id=process70b3048 mode=X
waiter-list
waiter id=process6bc4cc5c8 mode=IX requestType=wait
2012-04-30 18:56:21.30 spid345 Error: Maintenance plan 'Backups [TL]'.LiteSpeed(R) for SQL Server Version 6.5.1.1018
Copyright 2011 Quest Software, Inc.
Msg 62301, Level 16, State 1, Line 2883584: SQL Server has returned a failure message to LiteSpeed for SQL Server which has prevented the operation from succeeding.
The following message is not a LiteSpeed for SQL Server message. Please refer to SQL Server books online or Microsoft technical support for a solution:
Processed 85 pages for database 'SimplyYou', file 'jeff_Log' on file 1.
BACKUP LOG successfully processed 85 pages in 0.167 seconds (4.138 MB/sec).
Could not insert a backup or restore history/detail record in the msdb database. This may indicate a problem with the msdb database. The backup/restore operation was still successful.
Transaction (Process ID 96) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
2012-04-30 18:56:37.69 spid345 Error: Maintenance plan 'Backups [TL]'.LiteSpeed(R) for SQL Server Version 6.5.1.1018
Copyright 2011 Quest Software, Inc.
Msg 60901, Level 16, State 1, Line 2883584: Failed to login to SQL Server
May 1, 2012 at 1:27 pm
GilaMonster (5/1/2012)
I doubt the deadlock is the cause. More likely, the deadlocks are another effect.
Deleting lots of rows can be a very expensive query. If it goes parallel and uses lots of CPUs, then that delete alone could slow the server down, especially if it's also hammering the log and growing the log.
CPU was at 20-30% when I got on the server last night. No disk queue issues (0-1), and < 5 ms disk reads/writes.
The MSDB MDF is 13.5 GB. The log is roughly 1.5 GB.
There was a single automatic log growth on the MSDB database approximately 1 hour before we started running into issues on the server. No other database on the server (roughly 60 total) experienced any LDF or MDF growth issues.
Gila, I tend to agree with you that the deadlock is not the issue. However, when I killed the SPID that was deleting the backup history and it finished the rollback, the server came right back up. It doesn't make sense to me.
Fraggle
May 1, 2012 at 1:28 pm
OK, I see now.
EXEC msdb.dbo.sp_delete_database_backuphistory @database_name = N'ActiveTravelNetwork'
Backup history is deleted when a database is dropped. Originally I thought we were talking about job history. For backup history, it is recommended that you run sp_delete_backuphistory periodically to keep the backup history tables at a manageable size. You can follow the same technique I outlined above for job history, substituting sp_delete_backuphistory.
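The scheduled step could be as simple as the following (the 90-day retention period is just an example):

USE msdb;
GO
-- Weekly job step: trim backup history older than the retention period.
DECLARE @cutoff datetime;
SET @cutoff = DATEADD(DAY, -90, GETDATE());
EXEC dbo.sp_delete_backuphistory @oldest_date = @cutoff;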
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
May 1, 2012 at 1:36 pm
Fraggle-805517 (5/1/2012)
Gila, I tend to agree with you that the deadlock is not the issue. However, when I killed the SPID that was deleting the backup history and it finished the rollback, the server came right back up. It doesn't make sense to me.
When you stopped the delete, the problems resolved themselves. That suggests very strongly that the delete was the problem and the deadlocks were problems caused by the delete, as was the slow performance.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 1, 2012 at 1:40 pm
"MSDB database MDF is 13.5 GB" suggests it may need some attention. backup history, job history and database mail logs are common sources of msdb bloat.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
May 1, 2012 at 2:27 pm
GilaMonster (5/1/2012)
Fraggle-805517 (5/1/2012)
Gila, I tend to agree with you that the deadlock is not the issue. However, when I killed the SPID that was deleting the backup history and it finished the rollback, the server came right back up. It doesn't make sense to me.
When you stopped the delete, the problems resolved themselves. That suggests very strongly that the delete was the problem and the deadlocks were problems caused by the delete, as was the slow performance.
It does lend itself to that, and I won't deny it. However, when I have seen stuff like this in the past, I have always seen disk queues rise along with slower reads/writes per second in perfmon. This time, I wasn't getting any of that. That is what is bugging me.
Fraggle
May 1, 2012 at 2:28 pm
opc.three (5/1/2012)
"MSDB database MDF is 13.5 GB" suggests it may need some attention. backup history, job history and database mail logs are common sources of msdb bloat.
General question: is there a recommended amount of time for keeping these? 6 months? 1 year?
Fraggle
May 1, 2012 at 2:34 pm
How long do you need them for (audits, restore scripts, etc.)?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability