August 23, 2013 at 3:38 pm
We lost our SAN today while EMC was configuring Recoverpoint to our DR location. SQL just dropped, OS lost the drives...everything went down. The tech Services guys got the SAN back online and were able to get all of the LUNs back but one...
The DB's that came back up went into recovery with the message:
Error: 824, Severity: 24, State: 2.
SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:29898565; actual 0:0). It occurred during a read of page (1:29898565) in database ID 15 at offset 0x00003906e8a000 in file 'K:\MYDB.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
I tried:
DBCC CheckDB ('MYDB') WITH NO_INFOMSGS, ALL_ERRORMSGS
But immediately get this error:
Msg 7929, Level 16, State 1, Line 1
Check statement aborted. Database contains deferred transactions.
Any "expert" suggestions on getting these DB's back online without losing data (or than a complete restore of Full, Diff, and log shipped transaction logs?
Please help!!!
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
August 23, 2013 at 3:44 pm
I'm assuming that the message "Check statement aborted. Database contains deferred transactions." means that SQL is still either rolling forward/rolling back transactions and I just need to wait?
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
August 23, 2013 at 3:54 pm
Restore from backup.
Deferred transactions - transactions that can't be rolled forward/back because of an offline file/filegroup. I would guess you have one or more files in the offline state (or maybe recovery pending), ones on that LUN that couldn't be recovered.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 23, 2013 at 3:58 pm
So there is no other way? We are already restoring the DB that came from the lost lun
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
August 23, 2013 at 3:59 pm
The databases that are in suspect are on LUNS that came back just fine
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
August 23, 2013 at 4:01 pm
Might be able to hack something out with enough work and fiddling that might allow for repair to throw data away, but restore from backup should be one of the first plans for corruption in most cases, with more creative solutions for when there are no backups.
Do you have recovery_pending or offline files? If so, then there's no choice but restore unless you want to lose everything in those files.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 23, 2013 at 5:40 pm
I reset the status of the database in question and then placed it in emergency mode to run a checkdb with just error-msgs and got back thousands of errors (7MB worth in a text file so I won't post it here).
My boss wanted to try the REPAIR_ALLOW_DATA_LOSS first before restoring from the backup and we tried that...it ran for 25 mins when I was instructed to kill it. It's been rolling back for the past hour.
EXEC sp_resetstatus 'COREAUTH';
ALTER DATABASE COREAUTH SET EMERGENCY
ALTER DATABASE COREAUTH SET SINGLE_USER WITH ROLLBACK IMMEDIATE
DBCC CheckDB ('COREAUTH', REPAIR_ALLOW_DATA_LOSS)
ALTER DATABASE COREAUTH SET MULTI_USER
We're obviously going to go ahead and just restore from the backups, but I can't do anything until this rollback completes...it's taking forever and we don't need to worry about it anymore as we're going to blow away the DB with the recovery process, can i force this to stop somehow by restarting the services or will i still have to wait for the automatic checkdb to complete upon the service start up
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
August 24, 2013 at 3:09 am
MyDoggieJessie (8/23/2013)
My boss wanted to try the REPAIR_ALLOW_DATA_LOSS first before restoring from the backup and we tried that...it ran for 25 mins when I was instructed to kill it.
So he wants to try the last resort, that may fail, that will likely leave your database transactionally inconsistent, that will lose an unknown amount of data before the solution that will recover with an intact database at a known point in time?
Maybe point him at my corruption article, the part that says 'don't just run repair'...
There's a reason why I react 'No!No!No!' to people suggesting that. It should almost never be the first thing you try or the default action. Especially when you have as much damage as you have.
We're obviously going to go ahead and just restore from the backups, but I can't do anything until this rollback completes...it's taking forever and we don't need to worry about it anymore as we're going to blow away the DB with the recovery process, can i force this to stop somehow by restarting the services or will i still have to wait for the automatic checkdb to complete upon the service start up
There's no automatic CheckDB on startup.
If you want to restore, stop SQL, delete the database's files, restart SQL, start your restore.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 24, 2013 at 4:49 am
Yes, that's exactly the route that was chosen (even after explaining the consequences, and the possibilities of how much time/unknown data we could lose in the process).
Turns out that didn't work and I had to use your original advice, restore from our last point-in-time transaction backup. Throughout the night we've recovered all databases so far, and are running thelast CHECKDB - if that clears without errors I will thank my lucky stars and see if we can bring our production environment back up to connect to the databases.
There's no automatic CheckDB on startup.
Whenever I look at the SQL logs after a service restart I see that every database has had a checkdb run against it by a system spid ... looks something like:
Starting up database 'X'.
Recovery is writing a checkpoint in database 'X' (10). This is an informational message only. No user action is required.
CHECKDB for database 'X' finished without errors on 2013-08-18 18:05:19.123 (local time). This is an informational message only; no user action is required
After that the service runs custom scripts (creating trace files, etc), then eventually everything comes up (pending no errors) - how would you best describe this if it's not an automatic CHECKDB?
Thanks for your help Gail, it is sincerely appreciated
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
August 24, 2013 at 5:48 am
MyDoggieJessie (8/24/2013)
Whenever I look at the SQL logs after a service restart I see that every database has had a checkdb run against it by a system spid ...
Nope. How long does a CheckDB take and how long does startup take? That alone should tell you that there's no checkDB running, startup does not take hours....
Starting up database 'X'.
Recovery is writing a checkpoint in database 'X' (10). This is an informational message only. No user action is required.
CHECKDB for database 'X' finished without errors on 2013-08-18 18:05:19.123 (local time). This is an informational message only; no user action is required
Look at the datetime stamps for the log entry and the time it list CheckDB having finished at.
From a database on my server (emphasis mine)
Date2013/08/21 14:57:35
LogSQL Server (Current - 2013/08/24 00:00:00)
Sourcespid20s
Message
CHECKDB for database 'Testing' finished without errors on 2012-08-28 22:10:11.457 (local time).
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 24, 2013 at 6:32 am
Now I think I understand...
I see the datetimestamps are completely different...so it's basically reading it from the last known outcome of the prior CHECKDB that was successful.
I always thought it was bizarre how this "CHECKDB" ran so fast on startup, while the actual one you run can take hours - makes much more sense now - Again, thank-you for your patience! 🙂
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
August 24, 2013 at 7:50 am
MyDoggieJessie (8/24/2013)
so it's basically reading it from the last known outcome of the prior CHECKDB that was successful.
Correct. It's printing out the date and time last known good CheckDB run. In the case of the one I listed, I haven't run a CheckDB on that database in a year (it is a test database)
That info is stored in the database's boot page and printed out whenever SQL brings the DB online (service start, restore, attach, set online)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 26, 2013 at 8:55 pm
Just in case you were curious, all databases were restored successfully without losing any data - it took 14 hours to get them all back and to verify them with CHECKDB's - but in the end, it went smoothly.
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply