Write failures on backupdevice for TRN-backups

hplu-243427

SSC Enthusiast

Points: 161
March 14, 2006 at 5:21 am

#113740

Can anyone shed some light on these messages (from the errorlog) about problems performing TRN backups? They occur from time to time and on different databases.
Sequence of messages:
1) BackupMedium::ReportIoError: write failure on backup device 'F:\Backup DB\xxx\xxx_tlog_200603131630.TRN'. Operating system error 2(error not found).
2) Internal I/O request 0x1ECDE858: Op: Write, pBuffer: 0x07890000, Size: 983040, Position: 1689974272, UMS: Internal: 0x103, InternalHigh: 0x0, Offset: 0x64BAF600, OffsetHigh: 0x0, m_buf: 0x07890000, m_len: 983040, m_actualBytes: 0, m_errcode: 2, BackupFile: F:\Backup DB\xxx\xxx_tlog_200603131630.TRN
3) BACKUP failed to complete the command BACKUP LOG [xxx] TO DISK = N'F:\Backup DB\xxx\xxx_tlog_200603131630.TRN' WITH INIT , NOUNLOAD , NOSKIP , STATS = 10, NOFORMAT
Writing to the backup device is clearly the problem, but I'm looking for possible reasons. Available disk space is not the issue, since the volume has 55 GB of free space reported in Windows Explorer. The underlying disk system is a shared storage array (not a SAN), and the volume is also used for most of the data files (several databases).
\hplu
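
For readers trying to read message 2, here is a minimal Python sketch that pulls the numeric fields out of that line and decodes the ones that can be decoded with confidence. It rests on two assumptions that the thread does not confirm: that `Offset`/`OffsetHigh` are the low and high 32 bits of the file offset (the field names resemble the Win32 `OVERLAPPED` structure), and that `m_errcode` is a plain Windows system error code. The names `parse_numeric_fields` and `WIN32_ERRORS` are made up for this illustration.

```python
import re

# The "Internal I/O request" line from message 2, copied from the errorlog.
LINE = (r"Internal I/O request 0x1ECDE858: Op: Write, pBuffer: 0x07890000, "
        r"Size: 983040, Position: 1689974272, UMS: Internal: 0x103, "
        r"InternalHigh: 0x0, Offset: 0x64BAF600, OffsetHigh: 0x0, "
        r"m_buf: 0x07890000, m_len: 983040, m_actualBytes: 0, m_errcode: 2, "
        r"BackupFile: F:\Backup DB\xxx\xxx_tlog_200603131630.TRN")

# A few well-known Windows system error codes; anything else stays numeric.
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND",
    3: "ERROR_PATH_NOT_FOUND",
    5: "ERROR_ACCESS_DENIED",
    112: "ERROR_DISK_FULL",
}


def parse_numeric_fields(line):
    """Return every 'name: number' pair in the line as {name: int}."""
    fields = {}
    for name, value in re.findall(r"(\w+): (0x[0-9A-Fa-f]+|\d+)\b", line):
        is_hex = value.lower().startswith("0x")
        fields[name] = int(value, 16) if is_hex else int(value)
    return fields


fields = parse_numeric_fields(LINE)

# Assumption: OffsetHigh/Offset are the high and low halves of the file offset.
offset = (fields["OffsetHigh"] << 32) | fields["Offset"]

print("request size (KiB):", fields["Size"] // 1024)
print("offset matches Position:", offset == fields["Position"])
print("bytes actually written:", fields["m_actualBytes"])
print("OS error:", WIN32_ERRORS.get(fields["m_errcode"], fields["m_errcode"]))
```

Read this way, the failed request was a 960 KiB write roughly 1.6 GB into the backup file, no bytes were written, and error 2 is the code Windows defines as "The system cannot find the file specified". If those assumptions hold, that would point to the file or its path becoming unavailable partway through the backup rather than to a lack of free space.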

Jose Manuel Fuentes SSC Enthusiast Points: 144 · Answer 1

Check for bottlenecks when writing. Try to avoid having two servers back up at the same time.

hplu-243427 SSC Enthusiast Points: 161 · Answer 2

I should have mentioned this in my initial post, but this is a failover cluster in an active-passive configuration running SQL Server 2000 SP3a on Windows 2000 Server Ent.

Could the same happen if several Agent jobs are executing at the same time? Only one server is running at a time, since this is an active-passive cluster. It could be a matter of contention, if I understand you right. Which performance counter in Performance Monitor would best document that contention is the reason?

\hplu

Jeff Gray SSChampion Points: 10667 · Answer 3

Check your antivirus configuration as well. It is possible that a scan-on-write action is causing problems, so you might need to configure an exclusion for TRN files.

jg

SQLBill SSC Guru Points: 51440 · Answer 4

Could be:

1. antivirus checking the destination file when the backup job starts

2. antispyware running

3. another backup job backing up that file. Are your sysadmins/network admins running a backup job of all the files at that time?

4. Do you have other backups going to that file at the same time? For example, do you do full and log backups to the same backup file?

-SQLBill

hplu-243427 SSC Enthusiast Points: 161 · Answer 5

Thank you for all the input, but none of it is relevant to this installation, since it's a very closed and dedicated environment (at a customer site).

The problem is solved for now, but I expect it to show up again. Without any knowledge of what has been done to the environment, my best theory is either errors on the physical disks (RAID 5 volume) that have been corrected, contention (disk I/O, controller), or some lack of logic in the deletion of old backup files (in the sqlmaint utility). I have Googled this issue a lot and found a couple of similar posts, but all without an explanation and/or solution.

\hplu
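
One way to test the deletion theory is to compare the timestamps embedded in the backup file names with whatever retention window the maintenance plan uses. The sketch below is only an aid for that comparison and says nothing about how sqlmaint itself decides what to delete. It assumes the `_YYYYMMDDHHMM.TRN` suffix seen in the errorlog (`xxx_tlog_200603131630.TRN`); the one-day window and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Timestamp suffix as seen in the thread, e.g. xxx_tlog_200603131630.TRN
STAMP = re.compile(r"_(\d{12})\.TRN$", re.IGNORECASE)


def backup_time(filename):
    """Return the datetime embedded in a backup file name, or None."""
    match = STAMP.search(filename)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M")


def older_than(filenames, now, keep):
    """List the files whose embedded timestamp is earlier than now - keep."""
    cutoff = now - keep
    expired = []
    for name in filenames:
        stamp = backup_time(name)
        if stamp is not None and stamp < cutoff:
            expired.append(name)
    return expired


# Hypothetical directory listing and retention window.
files = ["xxx_tlog_200603111630.TRN", "xxx_tlog_200603131630.TRN", "notes.txt"]
now = datetime(2006, 3, 13, 16, 30)
print(older_than(files, now, keep=timedelta(days=1)))
```

If a file that a retention window should not cover yet is nevertheless missing around the time of a failure, that would support the cleanup theory; if not, the disk or controller theories become more likely.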

sivaprasad SSCertifiable Points: 6183 · Answer 6

Hi,

I am facing a similar problem too. I have googled it but have not found any explanation or solution.

~ Sivaprasad S

Sivaprasad S - [ SIVA ] http://sivasql.blogspot.com/

hplu-243427 SSC Enthusiast Points: 161 · Answer 7

Stay tuned; low-level physical checks are running as this post is written, and they will hopefully shed some light on the mysterious messages. If time permits, chkdsk will also be run against a couple of volumes to verify the filesystem. We suspect the cause to be some kind of corruption. Another update: this system runs with RAID controllers local to each node and shared storage (disk shelf). The firmware for the controllers and the corresponding drivers (IBM ServeRAID) were updated earlier this week, but this didn't solve the problems. We now see some warnings in the event log from ServeRAID Manager reporting bad stripe sets, so we are making progress...

\hplu