August 9, 2011 at 4:07 pm
The Yeah! part is that this is a test db on a test server while testing a new san.
I've watched the whole presentation by Paul on checkdb and yet I've stumbled on a case he didn't talk about and that I can't figure out on my own.
I was developing a script to update a history table. Nothing big, maybe 95K rows, 45 MB.
The db has 30 GB in datafiles with 15 GB free and another 145 GB free on the drives.
Similar figures on tempdb, 100% free space in logs and data with 45 GB free space on the drive.
I'm saying this because the only thing I found on Paul Randal's blog about this error was about lack of space! => http://www.sqlskills.com/BLOGS/PAUL/post/Database-snapshots-when-things-go-wrong.aspx
What I do a start transaction, run 10 steps of a script, leave the tran opened and then keep developing for the final 2-3 steps. Which at that particular point was giving me primary key violations.
I highlight my rollback statement and I get and error message similar to this (sorry but I didn't save the original) :
Msg 926, Level 14, State 1, Line 1
Database 'Test Upgrade Dynamics' cannot be opened. It has been marked SUSPECT by recovery. See the SQL Server errorlog for more information.
Msg 5069, Level 16, State 1, Line 1
ALTER DATABASE statement failed.
Msg 3316, Level 21, State 2, Line 1
During undo of a logged operation in database 'Test Upgrade Dynamics', an error occurred at log record ID (101653:4848:329). The row was not found. Restore the database from a full backup, or repair the database.
Msg 3414, Level 21, State 2, Line 1
An error occurred during recovery, preventing the database 'Test Upgrade Dynamics' (database ID 9) from restarting. Diagnose the recovery errors and fix them, or restore from a known good backup. If errors are not corrected or expected, contact Technical Support.
This is what I have in the sql server logs (nothing more in the Windows log) :
Date8/9/2011 4:56:36 PM
JournalSQL Server (Actuel(le) - 8/9/2011 5:42:00 PM)
Sourcespid54
Message
During undoing of a logged operation in database 'Test Upgrade Dynamics', an error occurred at log record ID (101653:4848:329). Typically, the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup, or repair the database.
Awesome, I'll finally be able to test my "recovery" skills.
ALTER DATABASE [Test Upgrade Dynamics] SET EMERGENCY
DBCC CHECKDB('Test Upgrade Dynamics') WITH NO_INFOMSGS, ALL_ERRORMSGS
And that returns... NOTHING
Ok, then maybe the san or <whatever> caught the error and repaired it.
ALTER DATABASE [Test Upgrade Dynamics] SET ONLINE
Back to the same error message and db goes suspect.
AFAIK checkdb doesn't validate the log (not sure here so please advise).
I have the <brilliant> idea to run this to see if maybe the log is corrupt
91 seconds later I get 2.8M rows back and NO ERROR.
That's when I ran out of ideas and started this thread!
August 9, 2011 at 4:10 pm
Edit.
Link to Paul's amazing presentation : http://channel9.msdn.com/Events/TechEd/NorthAmerica/2008/DAT375
I have good backups and I can erase that db anytime I want without any problems. I'm doing this more for the experience and maybe helping someone else out in the future via google or whatknot.
Also while I have backups of prod DB, that particular db has never been backed up. It's in full recovery.
Good luck! 😉
August 9, 2011 at 4:16 pm
Ninja's_RGR'us (8/9/2011)
Also while I have backups of prod DB, that particular db has never been backed up. It's in full recovery
Which means it's in pseudo-simple recovery (log getting truncated on checkpoint) and hence a tail log backup won't work.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 9, 2011 at 4:18 pm
GilaMonster (8/9/2011)
Ninja's_RGR'us (8/9/2011)
Also while I have backups of prod DB, that particular db has never been backed up. It's in full recoveryWhich means it's in pseudo-simple recovery (log getting truncated on checkpoint) and hence a tail log backup won't work.
Exactly.
This not a a critical db by any means. It was supposed to be online for 1-2 days while testing migration scripts for the ERP. So it was not backed. Maybe another reason to ALWAYS backup, even for this case ;-).
I needed an extra sandbox to play with so I used this one... and you know the rest of the story.
August 9, 2011 at 4:19 pm
Ninja's_RGR'us (8/9/2011)
Awesome, I'll finally be able to test my "recovery" skills.
ALTER DATABASE [Test Upgrade Dynamics] SET EMERGENCY
DBCC CHECKDB('Test Upgrade Dynamics') WITH NO_INFOMSGS, ALL_ERRORMSGS
And that returns... NOTHING
Ok, then maybe the san or <whatever> caught the error and repaired it.
ALTER DATABASE [Test Upgrade Dynamics] SET ONLINE
Back to the same error message and db goes suspect.
Once you've set emergency you need CheckDB with repair_allow_data_loss (or other method of fixing cause of suspect) to go online. You can't just set online. CheckDB doesn't check the log, but there's something wrong somewhere (either in the log or the data file) that recovery is failing due to.
Gut feel here - checkDB repair allow data loss will rebuild the log and then you'll be able to bring the DB online, but may be transactionally inconsistent as the rebuild of the log would have discarded the information on uncommitted transactions and incomplete transactions
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 9, 2011 at 4:22 pm
p.s. you've either got some IO error, a memory error or you've hit a severe SQL bug. See if you can repo on completely different hardware, see if you can repo on same server, different drives (preferably local, not SAN) (first see if you can repo at all)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 9, 2011 at 4:27 pm
Darn. I wish this would have had more options to test.
Anyway I can backup the DB so we can retry a few things? I'm getting errors both in emergency and suspect mode that I can't back up.
August 9, 2011 at 4:29 pm
GilaMonster (8/9/2011)
p.s. you've either got some IO error, a memory error or you've hit a severe SQL bug. See if you can repo on completely different hardware, see if you can repo on same server, different drives (preferably local, not SAN) (first see if you can repo at all)
I can't not try this on san, this is a VM and the "local" drives are all on san.
I'll see if I can repro on our other servers (different san).
Any article your recommend for running the correct tests to figure out the hardware issue? As I said windows log was errm useless in this case.
August 9, 2011 at 4:33 pm
Vendor-specific hardware diagnostics.
I'd also run SQLIOSim and see if it picks up any funnies.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 9, 2011 at 4:33 pm
And in the news, Google stills works :pinch:!
I should start Let Me google that for me (tm). 😀
Took the server offline and copying the files. It's 50 GB so it'll take a while before I can run repair.
August 9, 2011 at 4:41 pm
There are hundreds of cases I didn't talk about in that presentation - but glad you liked it!
Sounds like memory corruption of a log block changing the slot number in a log record.
What are the backup errors you're getting?
How big is the database? (wondering if you can take if offline, zip it up and send it to me to play with)
Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005
August 9, 2011 at 4:46 pm
In other news... Paul Randal is psychic, or can smell a corrupt database from several thousand km away. 🙂
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 9, 2011 at 4:46 pm
Paul Randal (8/9/2011)
There are hundreds of cases I didn't talk about in that presentation - but glad you liked it!Sounds like memory corruption of a log block changing the slot number in a log record.
What are the backup errors you're getting?
How big is the database? (wondering if you can take if offline, zip it up and send it to me to play with)
I loved the presentation. Do you have another one besides the teched?
It says cannot take the backup in emergency/ suspect mode. Jonathan Keyas suggested taking the server offline and manual copy, which I'm running at the moment.
The file sizes are around 40 GB, but 50% empty so I could probably get it down to 2 GB with winrar. Send me a mail with ftp info and I'll send it over later tonight.
August 9, 2011 at 4:47 pm
GilaMonster (8/9/2011)
In other news... Paul Randal is psychic, or can smell a corrupt database from several thousand km away. 🙂
Nah, he said in the teched presentation that he monitors this specific forum (e-mail notifications). So I made sure to post in corruption to get the 2 best experts I know of :-D.
August 9, 2011 at 4:57 pm
There are hundreds of cases I didn't talk about in that presentation
No wonder it took you 5 years between 2k and 2k5 :hehe:.
How many cases do you have to contend for?
Viewing 15 posts - 1 through 15 (of 78 total)
You must be logged in to reply to this topic. Login to reply