March 14, 2006 at 5:21 am
Can anyone shed some light on these messages (from the error log) about problems performing TRN backups? It happens from time to time and on different databases.
Sequence of messages:
1) BackupMedium::ReportIoError: write failure on backup device 'F:\Backup DB\xxx\xxx_tlog_200603131630.TRN'. Operating system error 2(error not found).
2) Internal I/O request 0x1ECDE858: Op: Write, pBuffer: 0x07890000, Size: 983040, Position: 1689974272, UMS: Internal: 0x103, InternalHigh: 0x0, Offset: 0x64BAF600, OffsetHigh: 0x0, m_buf: 0x07890000, m_len: 983040, m_actualBytes: 0, m_errcode: 2, BackupFile: F:\Backup DB\xxx\xxx_tlog_200603131630.TRN
3) BACKUP failed to complete the command BACKUP LOG [xxx] TO DISK = N'F:\Backup DB\xxx\xxx_tlog_200603131630.TRN' WITH INIT , NOUNLOAD , NOSKIP , STATS = 10, NOFORMAT
Writing to the backup device is clearly the problem, but I'm looking for possible reasons. Available disk space is not the issue, since the volume has 55 GB of free space reported in Windows Explorer. The underlying disk system is a shared storage array (not a SAN), and the volume is also used for most of the data files (several databases).
\hplu
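The fields in message 2 can be decoded to see where in the file the write failed. The following is a minimal sketch, not an official parser: the regular expression and the helper name parse_fields are made up for illustration, and it assumes that Offset and OffsetHigh are the low and high 32 bits of the file position (as in a Win32 OVERLAPPED structure).

```python
import re

# Message 2 from the error log, copied from the post above.
LOG_LINE = (
    "Internal I/O request 0x1ECDE858: Op: Write, pBuffer: 0x07890000, "
    "Size: 983040, Position: 1689974272, UMS: Internal: 0x103, "
    "InternalHigh: 0x0, Offset: 0x64BAF600, OffsetHigh: 0x0, "
    "m_buf: 0x07890000, m_len: 983040, m_actualBytes: 0, m_errcode: 2, "
    r"BackupFile: F:\Backup DB\xxx\xxx_tlog_200603131630.TRN"
)


def parse_fields(line):
    """Return every 'name: number' pair in the message as a dict of ints."""
    fields = {}
    for name, value in re.findall(r"(\w+): (0x[0-9A-Fa-f]+|\d+)", line):
        fields[name] = int(value, 0)  # base 0 accepts both decimal and 0x hex
    return fields


fields = parse_fields(LOG_LINE)

# Assumption: Offset/OffsetHigh are the low/high 32 bits of the position.
offset = (fields["OffsetHigh"] << 32) | fields["Offset"]

print("write position :", fields["Position"], "bytes",
      "(%.2f GiB)" % (fields["Position"] / 2**30))
print("offset matches :", offset == fields["Position"])
print("request size   :", fields["m_len"] // 1024, "KiB")
print("bytes written  :", fields["m_actualBytes"])
print("OS error code  :", fields["m_errcode"])
```

Read this way, the failing request was a 960 KiB write about 1.57 GiB into the backup file, and none of it was written. On Windows, system error code 2 is ERROR_FILE_NOT_FOUND, which would be consistent with the file or its path disappearing while the backup was still writing, although the log alone does not prove that.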
March 15, 2006 at 3:56 am
Check for bottlenecks when writing. Try to avoid having two servers back up at exactly the same time.
March 15, 2006 at 4:04 am
Should have mentioned this in my initial post, but this is a failover cluster in an active-passive configuration running SQL2k sp3a on Windows 2000 Server Ent.
Could the same thing happen if several Agent jobs are executing at the same time? Only one server is running at a time, since this is an active-passive cluster. It could be a matter of contention, if I understand you correctly. Which performance counter in Performance Monitor would best document that contention is the cause?
\hplu
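One way to document write contention is to log the PhysicalDisk counters (for example Avg. Disk sec/Write and Avg. Disk Queue Length) to a CSV file around the backup window and look for spikes. The sketch below assumes such a CSV export exists. The sample data, the server name NODE1, the helper name slow_samples and the 0.05 second threshold are all hypothetical and only illustrate the idea.

```python
import csv
import io

# Hypothetical counter log: one timestamp column, then one column per counter.
# Every value below is made up for illustration.
SAMPLE_CSV = r'''"Time","\\NODE1\PhysicalDisk(1 F:)\Avg. Disk sec/Write","\\NODE1\PhysicalDisk(1 F:)\Avg. Disk Queue Length"
"03/13/2006 16:29:30","0.006","0.4"
"03/13/2006 16:29:45","0.008","0.6"
"03/13/2006 16:30:00","0.140","9.5"
"03/13/2006 16:30:15","0.210","14.2"
"03/13/2006 16:30:30","0.007","0.5"
'''


def slow_samples(csv_text, counter_suffix, threshold):
    """Return (timestamp, value) pairs where the named counter exceeds threshold."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Pick the first column whose header ends with the counter name.
    column = next(i for i, name in enumerate(header)
                  if name.endswith(counter_suffix))
    hits = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        value = float(row[column])
        if value > threshold:
            hits.append((row[0], value))
    return hits


# Assumed threshold: flag writes that take longer than 50 ms on average.
hits = slow_samples(SAMPLE_CSV, "Avg. Disk sec/Write", 0.05)
for timestamp, seconds in hits:
    print(timestamp, "avg write latency %.3f s" % seconds)
```

If the flagged timestamps line up with the failed backups, that supports the contention theory. If the counters stay flat during a failure, contention is a less likely explanation.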
March 15, 2006 at 8:41 am
Check your antivirus configuration as well. It is possible that a scan-on-write action is causing problems, so you might need to configure an exclusion for TRN files.
jg
March 15, 2006 at 11:14 am
Could be:
1. antivirus checking the destination file when the backup job starts
2. antispyware running
3. another backup job backing up that file. Are your sysadmins/network admins running a backup job of all the files at that time?
4. Do you have other backups going to that file at the same time? For example, do you do full and log backups to the same backup file?
-SQLBill
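Points 3 and 4 both come down to whether two jobs touch the backup files during the same time window. A rough way to check is to list each job's start and end time and look for overlaps. The sketch below uses made-up job names and times, and the helper name overlapping is hypothetical. The real windows would have to come from the job history.

```python
from datetime import datetime

# Hypothetical job windows as (name, start, end). None of these come from the thread.
JOBS = [
    ("tlog backup", datetime(2006, 3, 13, 16, 30), datetime(2006, 3, 13, 16, 34)),
    ("file-level tape backup", datetime(2006, 3, 13, 16, 0), datetime(2006, 3, 13, 17, 0)),
    ("full backup", datetime(2006, 3, 13, 22, 0), datetime(2006, 3, 13, 23, 0)),
]


def overlapping(jobs):
    """Return pairs of job names whose time windows overlap."""
    ordered = sorted(jobs, key=lambda job: job[1])  # sort by start time
    pairs = []
    for i, (name_a, _start_a, end_a) in enumerate(ordered):
        for name_b, start_b, _end_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # later jobs start even later, so none can overlap
            pairs.append((name_a, name_b))
    return pairs


for first, second in overlapping(JOBS):
    print(first, "overlaps", second)
```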
March 23, 2006 at 8:10 am
Thank you for all the input, but none of it is relevant to this installation, since it's a very closed and dedicated environment (at a customer site).
The problem is solved for now, but I expect it to show up again. Without any knowledge of what has been done to the environment, my best theory is either errors on the physical disks (RAID 5 volume) that have been corrected, contention (disk I/O, controller), or some flaw in the logic that deletes old backup files (in the sqlmaint utility). I have Googled this issue a lot and found a couple of similar posts, but all without an explanation or a solution.
\hplu
March 29, 2006 at 7:31 am
March 29, 2006 at 10:29 am
Stay tuned: low-level physical checks are running as this post is written, and they will hopefully shed some light on the mysterious messages. If time permits, chkdsk will also be run against a couple of volumes to verify the file system. We suspect the cause is some kind of corruption. Another update: this system runs with RAID controllers local to each node and shared storage (a disk shelf). The firmware for the controllers and the corresponding drivers (IBM ServeRAID) were updated earlier this week, but this didn't solve the problem. We now see warnings in the event log from ServeRAID Manager reporting bad stripe sets, so we are making progress...
\hplu