tempdb going inconsistent after 24hrs

Question

Post reply

tempdb going inconsistent after 24hrs

Pavle-192195

SSChasing Mays

Points: 652
More actions
July 29, 2011 at 10:12 am

#242233

SQLS: 2008 R2 Express (10.50.1600.1)
Hello,
I encountered a very strange problem one of my customer's servers: for some reason after some time (approx 20hrs) i start seeing inconsistency errors in tempdb in the log and the application that uses the server crashes. Restarting the server solves the problems for some time but then it comes back again.
Error:
SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x13956d03; actual: 0xf59dd9a8). It occurred during a read of page (1:2928) in database ID 2 at offset 0x000000016e0000 in file 'c:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\tempdb.mdf'. ...
When I try running checkdb on tempdb it detects no errors but I get:
DBCC CHECKDB will not check SQL Server catalog or Service Broker consistency because a database snapshot could not be created or because WITH TABLOCK was specified.
Any ideas as to what could be the reason for this ? Any hints as to where I could look ? I will check the system log because I/O errors could be due to hardware but that doesn't explain why a restart solves the problem and why it comes back.
I saw the DBCC message in other posts but their problem was across all system db's. For me running dbcc on model for example works fine so it looks DBCC screws up only on tempdb.
I also checked free space and I have plenty.
Thanks a lot, this looks very tricky, any help would help 🙂

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 1

Certainly HD or less likely RAM issue.

My guess is that it takes a certain "high" load to get to the broken serctors on the drives.

So ya run your disgnostics.

Bad band aid >

This would be a bad hack but it could work (never tried).

Stop the server. Go to the DB's folder. Rename the tempdb files but leave them there. Restart the server (should create new files from scratch).

Hopefully you only have 1-2 bad sectors and that would keep them contained for now while you run diagnostics.

Pavle-192195 SSChasing Mays Points: 652 More actions · Answer 2

Thanks, I'll check the disk and see what the diagnosis says.

That's for the I/O error but what about the DBCC CHECKDB not creating the snapshot ? Also related to disk ???

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 3

Pavle-192195 (8/1/2011)
Thanks, I'll check the disk and see what the diagnosis says.
That's for the I/O error but what about the DBCC CHECKDB not creating the snapshot ? Also related to disk ???

No checkdb "can't" be run on tempdb (errors out just like it did). There's more to it than that but I wouldn't worry about that if I were you.

Bipsy SSCrazy Points: 2207 More actions · Answer 4

You can troubleshoot it by modiying location of tempdb data and log files to another drive and then check if this problem still persist. If problem got fixed then try to check with SA to run checkdsk utilty for disk(Where tempdb files were existing).if this disk is found Ok then try to diagnose RAM as well.

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 5

1) I would shut down sql server and other processes on the box and run chkdisk.exe (you can use -? to get options) to find possibly bad sectors on the disk. This can take a long time

2) Check to see if all drivers and firmware associated with your disk system and disk(s) are up to date. Those are by far the most likely causes of IO failures.

3) If above 2 things don't get you resolution. Time to replace some IO components!

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 6

TheSQLGuru (8/2/2011)
1) I would shut down sql server and other processes on the box and run chkdisk.exe (you can use -? to get options) to find possibly bad sectors on the disk. This can take a long time
2) Check to see if all drivers and firmware associated with your disk system and disk(s) are up to date. Those are by far the most likely causes of IO failures.
3) If above 2 things don't get you resolution. Time to replace some IO components!

What do you think about my hack to keep the server running while doing most of the checks?

My feeling is that you can run checkdb on the dbs and then checkdisk for the rest. If you can move tempdb on another drive while doing that it should cover you 99.9% without any major downtime.

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 7

Ninja's_RGR'us (8/2/2011)
TheSQLGuru (8/2/2011)
1) I would shut down sql server and other processes on the box and run chkdisk.exe (you can use -? to get options) to find possibly bad sectors on the disk. This can take a long time
2) Check to see if all drivers and firmware associated with your disk system and disk(s) are up to date. Those are by far the most likely causes of IO failures.
3) If above 2 things don't get you resolution. Time to replace some IO components!
What do you think about my hack to keep the server running while doing most of the checks?
My feeling is that you can run checkdb on the dbs and then checkdisk for the rest. If you can move tempdb on another drive while doing that it should cover you 99.9% without any major downtime.

I am betting it won't matter. This is his boot drive so it is likely a poorly configured server, meaning 8MB default size, which grows. I believe he is getting disk corruptions because of volume of tempdb writes and it won't matter to lock out the blocks it currently has - more bad writes will occur. I could be wrong on this though.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 8

TheSQLGuru (8/3/2011)
Ninja's_RGR'us (8/2/2011)
TheSQLGuru (8/2/2011)
1) I would shut down sql server and other processes on the box and run chkdisk.exe (you can use -? to get options) to find possibly bad sectors on the disk. This can take a long time
2) Check to see if all drivers and firmware associated with your disk system and disk(s) are up to date. Those are by far the most likely causes of IO failures.
3) If above 2 things don't get you resolution. Time to replace some IO components!
What do you think about my hack to keep the server running while doing most of the checks?
My feeling is that you can run checkdb on the dbs and then checkdisk for the rest. If you can move tempdb on another drive while doing that it should cover you 99.9% without any major downtime.
I am betting it won't matter. This is his boot drive so it is likely a poorly configured server, meaning 8MB default size, which grows. I believe he is getting disk corruptions because of volume of tempdb writes and it won't matter to lock out the blocks it currently has - more bad writes will occur. I could be wrong on this though.

Agreed, I was just hoping for a 10 min bandaid on a deep cut untill the ambulance got there... 😉