April 12, 2012 at 5:51 am
Hi all,
Over the last few weeks, 3 of our secondary (log-shipped) DBs have been marked 'Suspect', each requiring a drop and restore. I've been advised to check the I/O and try to fault-find.
What practices or native tools exist for SQL Server 2000 (SS2K) to get started on the investigation? BTW, if the initial diagnosis involves creating non-temporary tables or other objects, I would rather avoid it, as even slight changes require raising an RFC.
Also, would you recommend checking I/O on both the primary and secondary servers?
April 12, 2012 at 6:19 am
If the secondary has gone suspect and the primary is fine, then it's the secondary's IO subsystem that's the problem.
Start with the Windows event logs, any RAID logs and any SAN logs. If you can, stop SQL Server on that machine and run SQLIOSim (I wouldn't run it with SQL Server running; it generates too much load).
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
April 12, 2012 at 10:40 am
Hi Gail,
I've done some Perfmon analysis covering the 100 seconds after each log-shipping run (every 15 minutes, starting on the hour). So far I've only captured the logical disk counters (physical disk tomorrow), but the results for the W: drive, to which the logs are copied and from which they are restored, are as follows. I presume the values are milliseconds:
Avg. Disk Bytes/Read:
- Avg = 18,199
- Max = 26,021
Avg. Disk Bytes/Transfer:
- Avg = 40,651
- Max = 65,536
Avg. Disk Bytes/Write:
- Avg = 53,696
- Max = 65,536
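To make the units concrete: the Avg. Disk Bytes/* counters report bytes per I/O operation, not milliseconds. The short Python sketch below (Python is an editorial choice; the figures are the ones posted above for the W: drive) converts them to KB per I/O.

```python
# Sketch: express the Perfmon "Avg. Disk Bytes/*" counters in KB per I/O.
# These counters measure the size of each read/write/transfer in bytes;
# they are not latencies (ms) and not throughput (bytes/sec).

BYTES_PER_KB = 1024

# Figures as posted for the W: drive during the log-shipping window.
counters = {
    "Avg. Disk Bytes/Read": {"avg": 18_199, "max": 26_021},
    "Avg. Disk Bytes/Transfer": {"avg": 40_651, "max": 65_536},
    "Avg. Disk Bytes/Write": {"avg": 53_696, "max": 65_536},
}


def to_kb(num_bytes: int) -> float:
    """Convert a byte count to kilobytes (1 KB = 1,024 bytes)."""
    return num_bytes / BYTES_PER_KB


if __name__ == "__main__":
    for name, stats in counters.items():
        avg_kb = to_kb(stats["avg"])
        max_kb = to_kb(stats["max"])
        print(f"{name}: avg {avg_kb:.1f} KB, max {max_kb:.1f} KB per I/O")
```

The maximum of 65,536 bytes is exactly 64 KB per I/O, which is an I/O size, not a timing figure.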
April 12, 2012 at 11:43 am
Perfmon is not the place to look. You don't have disk performance problems; you have disk stability problems.
And no, the figure for bytes/write is not milliseconds. It's bytes: the average number of bytes written per write operation.
Gail Shaw
April 13, 2012 at 3:24 am
GilaMonster (4/12/2012)
Perfmon is not the place to look. You don't have disk performance problems; you have disk stability problems.
Agreed, but I don't have many immediate avenues of investigation left, so I was reaching. The Application and System event logs showed nothing suspicious around or immediately before the initial failure. The SAN/RAID team isn't in until Monday, and stopping the SQL Server service, even temporarily on the secondary, will require a lot of form-filling. Actually swapping the disk out is not a lengthy procedure, but I need to make a business case for the switch, so I need proof that the disk is unstable.
April 13, 2012 at 3:26 am
Nothing in any of the error logs?
Gail Shaw
April 13, 2012 at 4:12 am
Hunted around but couldn't make much sense of it...
Error: 5180, Severity: 22, State: 1
Could not open FCB for invalid file ID 0 in database 'XXXXXXXXXXXXX'
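A minimal Python sketch for pulling every occurrence of a given error number out of the SQL Server error log, so the hits can be lined up against the restore job history. It assumes, as in the excerpt above, that the message text appears on the line after the "Error: N, Severity: S, State: T" line. The sample log text, timestamps and spid are invented for illustration; in practice you would read the ERRORLOG file from the instance's LOG folder.

```python
import re
from typing import List, Tuple

# Hypothetical ERRORLOG excerpt; timestamps and spid are made up.
SAMPLE_LOG = """\
2012-04-11 06:15:01.12 spid52   Starting up database 'XXXXXXXXXXXXX'.
2012-04-11 06:15:02.34 spid52   Error: 5180, Severity: 22, State: 1
2012-04-11 06:15:02.34 spid52   Could not open FCB for invalid file ID 0 in database 'XXXXXXXXXXXXX'.
"""

ERROR_RE = re.compile(r"Error: (\d+), Severity: (\d+), State: (\d+)")


def find_errors(log_text: str, error_number: int) -> List[Tuple[str, str]]:
    """Return (error line, message line) pairs for one error number."""
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        match = ERROR_RE.search(line)
        if match and int(match.group(1)) == error_number:
            # Assumption: the message text is logged on the following line.
            message = lines[i + 1] if i + 1 < len(lines) else ""
            hits.append((line, message))
    return hits


if __name__ == "__main__":
    for error_line, message in find_errors(SAMPLE_LOG, 5180):
        print(error_line)
        print(message)
```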
April 13, 2012 at 4:14 am
What about the windows event logs?
Gail Shaw
April 13, 2012 at 4:30 am
GilaMonster (4/13/2012)
What about the windows event logs?
Zip. The Application log has filled up with informational messages and doesn't stretch back that far. However, I DID check it on the morning in question (the 11th) and found nothing. The only other critical error was in the System log: a Virtual Disk Service error, about 8 hours before and after the restore job failed:
"Unexpected failure. Error code: 2@0200001D"
April 24, 2012 at 10:47 am
Any further thoughts, anyone?
April 24, 2012 at 10:50 am
Both of the errors you've listed indicate there's some form of disk problem. Maybe contact the SAN vendor (assuming it's a SAN) and get them to check it out.
Gail Shaw
April 25, 2012 at 9:08 am
SQLIOSIM is the tool to use to validate that an IO subsystem will properly handle SQL Server IO-style workloads.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
April 25, 2012 at 9:14 am
http://support.microsoft.com/default.aspx?scid=kb;en-us;815183
The error "Could not open FCB for invalid file ID %d in database '%.*ls'." is known to cause data corruption, thread errors and runtime errors.
Are there different service pack versions on the primary (shipper) and the secondary (shippee)?
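To answer the service pack question, one option is to run SELECT SERVERPROPERTY('ProductVersion'), SERVERPROPERTY('ProductLevel') on each server and compare the results. The Python sketch below compares two ProductVersion strings numerically; the version values in it are hypothetical examples, not taken from this thread.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a version string such as '8.00.2039' into integers: (8, 0, 2039)."""
    return tuple(int(part) for part in version.strip().split("."))


def compare_builds(primary: str, secondary: str) -> str:
    """Report which server is on the newer build, comparing numerically."""
    p, s = parse_version(primary), parse_version(secondary)
    if p == s:
        return "same build"
    return "primary is newer" if p > s else "secondary is newer"


if __name__ == "__main__":
    # Hypothetical values; substitute the ProductVersion returned by each server.
    print(compare_builds("8.00.2039", "8.00.760"))
```

Comparing integer tuples rather than raw strings matters here: as text, "8.00.760" sorts after "8.00.2039" because '7' > '2', which would report the older build as newer.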
MVDBA
April 25, 2012 at 9:16 am
TheSQLGuru (4/25/2012)
SQLIOSIM is the tool to use to validate that an IO subsystem will properly handle SQL Server IO-style workloads.
But SQL Server needs to be stopped while running that. The aim is to validate the IO subsystem, not slaughter it.
Gail Shaw
April 25, 2012 at 9:16 am
By the way, FCB stands for File Control Block, the physical file structure SQL Server uses to read data from and write data to storage.
I've had these errors before when I defragmented a database and log shipping applied the same changes on the target.
MVDBA
Viewing 15 posts - 1 through 15 (of 23 total)