Corrupted DB Help Needed

  • OK, after a truly harrowing day yesterday, I was sitting in my lawyer's office for some more harrowing discussion (non-criminal and no, not in trouble), when the call comes in:  The website is down, the database server (remote, about 60 miles away) won't respond.

    He took the website offline (releasing the several hundred connections that were pending on that box - Solaris 8 running FreeTDS to connect to SQL Server 7.0), and could finally get to the server with Terminal Services.  Upon inspection of the error logs, we find a stack dump as well as Error 3314 and 3414.  In checking this site, I found a reference to Error 605 (no help it seems).

    I had him redirect the website traffic to a known, good, server (but not exactly "up to date" with data to be replicated from the source (and now corrupted) database.

    The short of it - it failed while trying to rollback about a million rows being inserted via a DTS copy.  They apparently hard-booted the server during this rollback and according to the stack dump, that's exactly where the logfile went down the tubes (here's the short version if the dump - spaces around some colons inserted intentionally to keep smileys out of it):

    MemoryLoad = 15%          TotalPhys = 2047 MB       AvailPhys = 1725 MB

    TotalPageFile = 4095 MB

    Dump thread - spid = 8, PSS = 0x4127e084, EC = 0x4127e234

    Stack Dump being sent to D:\MSSQL7\log\SQL00001.dmp

    Short Stack Dump

    0x77f94956 Module(ntdll+14956) (ZwGetContextThread+b)

    0x0078c44a Module(sqlservr+38c44a) (utassert_fail+1a0)

    0x005ad5ce Module(sqlservr+1ad5ce) (Page : : DeleteRow+38)

    0x0042eab7 Module(sqlservr+2eab7) (PageRef : : DeleteRows+367)

    0x005b756e Module(sqlservr+1b756e) (XDES::UndoPageOperation+124)

    0x004346b7 Module(sqlservr+346b7) (XDES::RollbackToLsn+229)

    0x005b4fc6 Module(sqlservr+1b4fc6) (RecoveryMgr::UndoPass+1c8)

    0x004c181b Module(sqlservr+c181b) (RecoveryMgr::RecoverDb+369)

    0x004ca910 Module(sqlservr+ca910) (DBTABLE::Startup+630)

    0x004c9eed Module(sqlservr+c9eed) (DBMgr::StartupDB+2c4)

    0x004d07dc Module(sqlservr+d07dc) (StartDBsInParallel+147)

    0x41092cff Module(ums+2cff) (ProcessWorkRequests+102)

    0x41092a6a Module(ums+2a6a) (FiberStartRoutine+ae)

    0x7c57438b Module(KERNEL32+438b) (TlsSetValue+f0)

    0x7c571608 Module(KERNEL32+1608)

    To really mess things up, instead of doing the obvious (DBCC CHECKDB with options), they DETACHED the <expletive> files!

    Now I'm left with two database files (a data file and a log file), which cannot be attached either via sp_attach_db or sp_attach_single_file_db, and there's data in there that I need to get.  The logfile is apparently useless to update this database to a point where I can pull the affected transactions from the db and apply them to the stable one I built last night (and the wee hours of the morning).  The data loss is unacceptable, and I know that I'm going to be taking a bit of heat this afternoon.

    So - if anyone has some options I can try, to get these files reattached (with or without the log - I can whittle through the data and find what I need), I'd really be appreciative.  (No advice on what to do with the perpetrators of this monumental screw-up -- I have that well in hand... they will be mowing my 2 acre back yard with push mowers this weekend (just kidding)...

    Thanks in advance for any help anyone can provide!

     

    -- Joe

  • Gotta ask you:

    (a data file and a log file), which cannot be attached either via sp_attach_db or sp_attach_single_file_db,

    Why not? Did you get error messages when you tried it?

    -SQLBill

  • If you HAVEN'T tried this already:

    Rename the .ldf file (this is important) to something like old_dbname_log.ldf (just add old_  to the name).

    Then try the sp_attach_single_file_db command (or it's SQL7 equivelant).

    If you don't rename or move the .ldf, SQL will still try to attach it even when using the sp_attach_single_file_db command.

    -SQLBill

  • you've already been advised to use sp_attach_db ... (see bol)

    Then perform the checkdb

    Any backup-files ??? (full / diff / log)

    All ddl scripted nicely ?

    Review you backup-schema !

     

     

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Hey SQLBill

    Here's the logs from ERRORLOG and error message (and I also renamed the logfile as you suggested).  "dbname" substituted for the actual database name.  Also, since a database of the same name is already on the server, I was trying to use a suffix of 'bad' on the dbname (see below).

    Command Issued:

    sp_attach_single_file_db 'dbname_bad', 'd:\mssql7\data\dbname_d1.mdf'

    Error Message from MSSQL/QA:

    Server: Msg 945, Level 14, State 2, Line 1

    Database 'dbname_bad' cannot be opened because some of the files could not be activated.

    Server: Msg 1813, Level 16, State 2, Line 1

    Could not open new database 'dbname_bad'. CREATE DATABASE is aborted.

    Logfile entries:

    2004-06-15 15:50:02.99 spid40   Opening file d:\mssql7\data\dbname_d1.mdf.

    2004-06-15 15:50:03.03 spid40   Closing file d:\mssql7\data\dbname_d1.mdf. 2004-06-15 15:50:03.10 spid40   Starting up database 'dbname_bad'.

    2004-06-15 15:50:03.10 spid40   Opening file d:\mssql7\data\dbname_d1.mdf.

    2004-06-15 15:50:03.18 spid40   File d:\mssql7\data\dbname_L1.ldf does not exist - unable to activate.

    2004-06-15 15:50:03.21 spid40   Closing file d:\mssql7\data\dbname_d1.mdf.

    So apparently, it's still looking for that logfile, even though it's name is now "bad_dbname_L1.mdf"

    Same thing if plain "sp_attach_db" used.

    To alzdba:

    As for backups and such, the databases are backed up to disk every day at 3:00 AM, and the backup files are taken off to tape the next morning.  These are retained for two weeks.  Full schemas are kept at various locations so recreating the database is not a problem. 

    Our problem was not recovery of most of the data (all 10G of it) - it was a small subset from the database in question from about 3:00 PM EDT to about 7:00 PM EDT.  This data was never replicated to the various copies we have in other locations.  Recovery of all of the data except that in the timeframe above, was done by DTS magic pulling data from another location over a high-speed link and 2.5 hours later, we were back up again...

    We just need to get into these files from the corrupted db, so we can extract the data from the timeframe where data was lost.  Hope that clarifies things a bit.

    Cheers,

    -- Joe

  • I wonder if a tool like Lumigent Log Explorer would work?  I don't know if it works on a log file that is not attached to a db.

    Steve

  • Create new database with same name, stop SQL Server, replace the MDF file from original database file and restart SQL Server to see whether your database is back in suspect mode. If it does, reset the database to emergency mode and run DBCC Rebuild_Log to recreate the log file. Once database is back to normal, run DBCC checkdb to ensure there is no integrity error, otherwise, you have to BCP the data out.

    Undocumented DBCC Command REBUILD_LOG

    EXEC sp_configure 'allow updates', 1

    RECONFIGURE WITH OVERRIDE

    GO

    BEGIN TRAN

    UPDATE master..sysdatabases

    SET status = status | 32768

    WHERE name = 'MyDatabase'

    IF @@ROWCOUNT = 1

    BEGIN

    COMMIT TRAN

    RAISERROR('emergency mode set', 0, 1)

    END

    ELSE

    BEGIN

    ROLLBACK

    RAISERROR('unable to set emergency mode', 16, 1)

    END

    GO

    EXEC sp_configure 'allow updates', 0

    RECONFIGURE WITH OVERRIDE

    GO

    -- Restart SQL Server at this point.

    DBCC REBUILD_LOG('MyDatabase','C:\MyDatabase.ldf')

    /*Perform physical and logical integrity checks at this point.

    Bcp data out if your integrity checks demonstrate that problems exist.

    */

    ALTER DATABASE MyDatabase SET MULTI_USER

    GO

    -- Set database options and recovery model as desired.

    GO

  • Thanks for everything folks - got the data I needed late last night, and it's been safely merged back into a new production db.

    Just a quick reflection:  I had a hard time selling SQL Server to the Unix guys here, as they thought the DB they were using under the Open Source license was absolutely fantastic -- until it started slowing down to a crawl.  A failure last November (and a 41 day recovery period!) allowed me to slip this in and get it through.

    With this failure - a weird one at that - and only a little over a day to recover, we were able to continue production while recovering, and the amount of data lost was miniscule but fully recoverable.  It goes to show that not only is SQL Server a good DB to use and better than some of the other DBs out there - but the adage "you gets what you pays for" definately applies here

    Cheers to all - and thanks!

    -- Joe

  • You say you got it to work. Great news.

    However, what enabled you to solve the issue? Was it one of our suggestions? Or one of your own?

    Post the solution here or say which solution helped you. That way someone else who may have the same issue and finds this thread will know what solved it for you.

    Thanks,

    -SQLBill

  • I had the same error and the suggestions posted by Allen Cui worked - putting the database into emergency mode, then rebuilding the log, etc.

     

    Cheers

    Jeff

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply