February 11, 2004 at 6:35 am
Is anyone else using, or trying to use, an iSCSI SAN environment to store SQL database and transaction log files?
We're trying to do it where I work but we ran into a database corruption a few days ago and lost some data (wasn't a big deal, not important data) but I'm now weary of using this for any production database.
The worst part is that we have been unable to find a cause for the problem other than "there was an error" and we are waiting to see if it happens again.
-- J.T.
"I may not always know what I'm talking about, and you may not either."
February 11, 2004 at 9:44 pm
Hi there
I believe the iSCSI bit is just noise, not saying though a cabling issue may be your problem. I gather its a sql specific corruption identified by either the instance not coming back/suspect db's or results from a regular checkdb() run?
Is the san itself reporting issues around the caching space? the caches on san's can be huge, and I wonder if this area is an item of concern worth checking..
Cheers
Ck
Chris Kempster
www.chriskempster.com
Author of "SQL Server Backup, Recovery & Troubleshooting"
Author of "SQL Server 2k for the Oracle DBA"
February 12, 2004 at 5:57 am
What we saw was a large number of "Time out occurred while waiting for buffer latch" errors (in the SQL Log) followed by "The logfile is full" messages even though there was 97 GB free and the log was set to auto-expand. Later on we see I/O Error 1117 and SQL error 823 in the logs which points to some sort of disk problem. Being iSCSI, this becomes a networking problem as well. The problem is, we haven't found a cause other than "there was some kind of network problem".
This went on for about 3 days until SQL finally decided it had had enough and marked the DB as suspect. As they data isn't critical I decided to try some of my own troubleshooting at this point so I detached the DB (which gave an error, of course) and then reattached it.
I was told the log data wasn't in sync with the database checkpoint and that it could not be attached. This makes sense with the "logfile is full" messages we received, I guess.
I then renamed the logfile and reattached the database, letting SQL create a new, empty logfile. We lost about 21 GB of data we didn't care about (but would have been catastrophic had it been real data). Checkdb then ran clean.
This has made me leery of using the iSCSI in production...which is why I was asking if anyone else was doing it.
I will look into the caching aspect of the SAN environment with our SAN admin and see if they can make it smaller or turn it off though I fear that would hamper performance.
Thanks for the input!
-- J.T.
"I may not always know what I'm talking about, and you may not either."
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply