August 10, 2011 at 1:02 pm
Paul Randal (8/10/2011)
http://www.sqlskills.com/T_MCMVideos.asp
Ok, but where's the advanced stuff? :hehe:
Ok so take 3 months off, checked!
Brain surgury, checked!
August 11, 2011 at 11:22 am
Hey guys, I'm starting to think I've hit a bug in MS.
We have not completed the whole machine check up yet but we have not found any issues so far. I went ahead and restored another backup from our latest PROD db on the same server. I ran the same exact query exactly twice and I got the same error message.
I'm currently replicating the test on 2 more servers (perfect vm copies, different san, different physical host). I'd like to know what / how I need to go about this in case I need to post a bug on connect. What do I need to log, save etc.
August 11, 2011 at 11:27 am
In that case you should call Product Support to help you out.
Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005
August 11, 2011 at 11:48 am
Ok, I'll call PSS if we don't find anything wrong with the vm.
Any way I can test the assumption that my prod db logs are not corrupt to start with? CheckDB returns nothing but then again it was the case on the other db.
I'm thinking fill the logs to 100% and roll back the transaction. Is that a valid test?
August 11, 2011 at 3:26 pm
99% sure it's a bug in MS now.
I tried a lot of combos, new vms, new hardware, shrunk the log file, added a new log file, created a new blank db and swapped the log file.
All test failed exactly at the same time, the same way.
Next week I'll try to repro the bug in a new, smaller db with sharable data so you guys can play with it.
Have a great week-end.
August 11, 2011 at 3:36 pm
@@version?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 11, 2011 at 3:41 pm
Microsoft SQL Server 2005 - 9.00.4035.00 (Intel X86) Nov 24 2008 13:01:59 Copyright (c) 1988-2005 Microsoft Corporation Standard Edition on Windows NT 5.2 (Build 3790: Service Pack 2)
AND
Microsoft SQL Server 2005 - 9.00.3042.00 (Intel X86) Feb 9 2007 22:47:07 Copyright (c) 1988-2005 Microsoft Corporation Standard Edition on Windows NT 5.2 (Build 3790: Service Pack 2)
I wanted to try on 2008 & R2 too but I don't have dev or trial installed here.
August 11, 2011 at 3:43 pm
Patch to SP4 first and confirm that the behaviour's unchanged (CSS will almost certainly want you to do that)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 11, 2011 at 3:47 pm
GilaMonster (8/11/2011)
Patch to SP4 first and confirm that the behaviour's unchanged (CSS will almost certainly want you to do that)
Tx, I had forgotten about that one (1 call ever, 6 years ago 😛 .... been lucky on that end).
Am I better off going to the latest CU as well just to cover all bases, don't want to go through 2 approvals for the upgrade?
You guys are interested in playing with a test repro DB? Just let me know and I'll upload either here or on my ftp.
August 11, 2011 at 3:51 pm
GilaMonster (8/11/2011)
Patch to SP4 first and confirm that the behaviour's unchanged (CSS will almost certainly want you to do that)
Am I better off proving this is a bug in 2008 & R2 before calling this in? IIRC 2k5 is out of support unless you have extended support.
Do we have to pay for such a call (would have to approve the budget first)? Or do we need to pay upfront and then refund if this is really a bug and hence a free call?
August 18, 2011 at 11:52 am
Hey guys, finally got around to building a demo db that still fails the exact same way my "prod" db was failing.
This data is 100% obfuscated and only a 3 MB download.
Here are my findings. I've tested both on 2005 SP3 and 2008 R2 Express RTM.
On 2K5 it fails with this error with the DB sent into suspect mode :
Msg 3316, Level 21, State 2, Line 135
During undo of a logged operation in database 'Remi', an error occurred at log record ID (101944:25015:51). The row was not found. Restore the database from a full backup, or repair the database.
Msg 3314, Level 21, State 4, Line 135
During undoing of a logged operation in database 'Remi', an error occurred at log record ID (101944:25015:51). Typically, the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup, or repair the database.
Msg 9001, Level 21, State 1, Line 135
The log for database 'Remi' is not available. Check the event log for related error messages. Resolve any errors and restart the database.
Msg 3314, Level 21, State 5, Line 135
During undoing of a logged operation in database 'Remi', an error occurred at log record ID (101944:22501:1). Typically, the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup, or repair the database.
Msg 0, Level 20, State 0, Line 0
A severe error occurred on the current command. The results, if any, should be discarded.
On the prod version I had the same error, only 45 events instead of 2. I've pruned a lot of data and all the useless columns which is only normal that you get less linkage errors.
When running checkdb repair_allow_data_loss I get this :
Repair: The Clustered index successfully rebuilt for the object "dbo.RPT_Historique_Couts" in database "Remi".
Repair: The Nonclustered index successfully rebuilt for the object "dbo.RPT_Historique_Couts, PK_RPT_Historique_Couts_1" in database "Remi".
Msg 8945, Level 16, State 1, Line 1
Table error: Object ID 1935462815, index ID 1 will be rebuilt.
The error has been repaired.
Msg 2511, Level 16, State 1, Line 1
Table error: Object ID 1935462815, index ID 1, partition ID 72057599794216960, alloc unit ID 72057600371261440 (type In-row data). Keys out of order on page (3:528), slots 16 and 17.
The error has been repaired.
Msg 2511, Level 16, State 1, Line 1
Table error: Object ID 1935462815, index ID 1, partition ID 72057599794216960, alloc unit ID 72057600371261440 (type In-row data). Keys out of order on page (3:528), slots 35 and 36.
The error has been repaired.
Msg 8945, Level 16, State 1, Line 1
Table error: Object ID 1935462815, index ID 2 will be rebuilt.
The error has been repaired.
CHECKDB found 0 allocation errors and 2 consistency errors in table 'RPT_Historique_Couts' (object ID 1935462815).
CHECKDB fixed 0 allocation errors and 2 consistency errors in table 'RPT_Historique_Couts' (object ID 1935462815).
CHECKDB found 0 allocation errors and 2 consistency errors in database 'Remi'.
CHECKDB fixed 0 allocation errors and 2 consistency errors in database 'Remi'.
The good news is that I don't have any dataloss (except all the time offline and previous PIT restore)
On 2K8 R2 I get a completely different error (actually much less useful). But the db still goes into suspect mode
Msg 208, Level 16, State 1, Line 3
Invalid object name 'dbo.RPT_Historique_Couts'.
Run this to see if it brings it back online :
ALTER DATABASE Remi SET EMERGENCY
ALTER DATABASE Remi SET SINGLE_USER
DBCC CHECKDB('Remi', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS, ALL_ERRORMSGS
ALTER DATABASE Remi SET MULTI_USER
This time CHECKDB doesn't recover (says it doesn't). It looks like the code failed in a different way on 2K8 R2 :
Repair: The Clustered index successfully rebuilt for the object "dbo.RPT_Historique_Couts" in database "Remi".
Repair: The Nonclustered index successfully rebuilt for the object "dbo.RPT_Historique_Couts, PK_RPT_Historique_Couts_1" in database "Remi".
Msg 8945, Level 16, State 1, Line 3
Table error: Object ID 1935462815, index ID 1 will be rebuilt.
The error has been repaired.
Msg 2570, Level 16, State 3, Line 3
Page (3:528), slot 16 in object ID 1935462815, index ID 1, partition ID 72057599794216960, alloc unit ID 72057600371261440 (type "In-row data"). Column "DateHistorique_fin" value is out of range for data type "datetime". Update column to a legal value.
The system cannot self repair this error.
Msg 2511, Level 16, State 1, Line 3
Table error: Object ID 1935462815, index ID 1, partition ID 72057599794216960, alloc unit ID 72057600371261440 (type In-row data). Keys out of order on page (3:528), slots 35 and 36.
The error has been repaired.
Msg 2570, Level 16, State 3, Line 3
Page (3:528), slot 35 in object ID 1935462815, index ID 1, partition ID 72057599794216960, alloc unit ID 72057600371261440 (type "In-row data"). Column "DateHistorique_fin" value is out of range for data type "datetime". Update column to a legal value.
The system cannot self repair this error.
Msg 2511, Level 16, State 1, Line 3
Table error: Object ID 1935462815, index ID 1, partition ID 72057599794216960, alloc unit ID 72057600371261440 (type In-row data). Keys out of order on page (3:528), slots 56 and 57.
The error has been repaired.
Msg 2570, Level 16, State 3, Line 3
Page (3:528), slot 56 in object ID 1935462815, index ID 1, partition ID 72057599794216960, alloc unit ID 72057600371261440 (type "In-row data"). Column "DateHistorique_fin" value is out of range for data type "datetime". Update column to a legal value.
The system cannot self repair this error.
Msg 2511, Level 16, State 1, Line 3
Table error: Object ID 1935462815, index ID 1, partition ID 72057599794216960, alloc unit ID 72057600371261440 (type In-row data). Keys out of order on page (3:528), slots 73 and 74.
The error has been repaired.
Msg 2570, Level 16, State 3, Line 3
Page (3:528), slot 73 in object ID 1935462815, index ID 1, partition ID 72057599794216960, alloc unit ID 72057600371261440 (type "In-row data"). Column "DateHistorique_fin" value is out of range for data type "datetime". Update column to a legal value.
The system cannot self repair this error.
Msg 2511, Level 16, State 1, Line 3
Table error: Object ID 1935462815, index ID 1, partition ID 72057599794216960, alloc unit ID 72057600371261440 (type In-row data). Keys out of order on page (3:528), slots 86 and 87.
The error has been repaired.
Msg 2570, Level 16, State 3, Line 3
Page (3:528), slot 86 in object ID 1935462815, index ID 1, partition ID 72057599794216960, alloc unit ID 72057600371261440 (type "In-row data"). Column "DateHistorique_fin" value is out of range for data type "datetime". Update column to a legal value.
The system cannot self repair this error.
Msg 2511, Level 16, State 1, Line 3
Table error: Object ID 1935462815, index ID 1, partition ID 72057599794216960, alloc unit ID 72057600371261440 (type In-row data). Keys out of order on page (3:528), slots 108 and 109.
The error has been repaired.
Msg 2570, Level 16, State 3, Line 3
Page (3:528), slot 108 in object ID 1935462815, index ID 1, partition ID 72057599794216960, alloc unit ID 72057600371261440 (type "In-row data"). Column "DateHistorique_fin" value is out of range for data type "datetime". Update column to a legal value.
The system cannot self repair this error.
Msg 2511, Level 16, State 1, Line 3
Table error: Object ID 1935462815, index ID 1, partition ID 72057599794216960, alloc unit ID 72057600371261440 (type In-row data). Keys out of order on page (3:528), slots 114 and 115.
The error has been repaired.
Msg 2570, Level 16, State 3, Line 3
Page (3:528), slot 114 in object ID 1935462815, index ID 1, partition ID 72057599794216960, alloc unit ID 72057600371261440 (type "In-row data"). Column "DateHistorique_fin" value is out of range for data type "datetime". Update column to a legal value.
The system cannot self repair this error.
Msg 8945, Level 16, State 1, Line 3
Table error: Object ID 1935462815, index ID 2 will be rebuilt.
The error has been repaired.
CHECKDB found 0 allocation errors and 13 consistency errors in table 'RPT_Historique_Couts' (object ID 1935462815).
CHECKDB fixed 0 allocation errors and 6 consistency errors in table 'RPT_Historique_Couts' (object ID 1935462815).
CHECKDB found 0 allocation errors and 13 consistency errors in database 'Remi'.
CHECKDB fixed 0 allocation errors and 6 consistency errors in database 'Remi'.
However if you check the rowcount, it's actually accurate.
Now the REAL fun part is that if you do COMMIT instead of rollback at the end of the script, the operation succeeds and no corruption is found! That in both versions.
Since I'm able to finish my work here my company has decided against opening a case with PSS. But I've been authorized to report this on connect.
I'd love to get as much feedback as possible for the versions I've not tested. I'm most interested in the latest SPs for 2K5+ all the way to Denali.
Please post your results along with the version(s) you tested on.
TIA.
Once I have it I'll open the connect and refference this thread.
August 18, 2011 at 12:03 pm
Cool - if you can email me the setup script I'll play with it this weekend! paul @ sqlskills.com
Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005
August 18, 2011 at 3:08 pm
When snapshot isolation is on, SQL Server makes a mess of the logging when changing the NULLability of the leading key column of a compound clustered index, causing error 3316 in XdesRMReadWrite::RollbackToLsn.
The following script includes notes and reproduces the issue without restoring a database.
USE master
GO
CREATE DATABASE Bang
GO
-- Has to be ON
ALTER DATABASE Bang SET ALLOW_SNAPSHOT_ISOLATION ON
GO
USE Bang
GO
CREATE TABLE dbo.Test
(
col1INTEGER NOT NULL,
col2INTEGER NOT NULL,
col3INTEGER NOT NULL
)
-- col2 is leading key of multi-column clustered index
-- no problem if index is not clustered
-- uniqueness not important
CREATE CLUSTERED INDEX cx ON dbo.Test (col2, col3, col1)
-- Must add a row
INSERT dbo.Test (col1, col2, col3) VALUES (0, 0, 0)
-- Txn required
BEGIN TRANSACTION
-- Add a column - default not important, can be NULL
ALTER TABLE dbo.Test ADD col4 INTEGER NOT NULL DEFAULT 0
GO
-- Update the new column (can be NULL)
UPDATE dbo.Test SET col4 = 0
-- Redefine cx leading key to allow NULL (bug!)
ALTER TABLE dbo.Test
ALTER COLUMN col2 INTEGER NULL
/*
Side note: This statement would now fail (as it should):
ALTER TABLE dbo.Test
ALTER COLUMN col2 INTEGER NOT NULL
Msg 5074, Level 16, State 1, Line 1
The index 'cx' is dependent on column 'col2'.
Msg 4922, Level 16, State 9, Line 1
ALTER TABLE ALTER COLUMN col2 failed because one or more objects access this column.
*/
GO
-- Add a row with NULL in cx leading key column
INSERT dbo.Test
(col1, col2, col3)
VALUES
(0, NULL, 0)
-- Bang!
ROLLBACK
GO
-- Reconnect, then:
-- Clean up
USE master
ALTER DATABASE Bang SET SINGLE_USER WITH ROLLBACK IMMEDIATE
DROP DATABASE Bang
Validated on SQL Server 2008 R2 build 10.50.2772 (latest available at time of writing). This issue is fixed in Denali CTP 3.
August 18, 2011 at 4:58 pm
Paul Randal (8/18/2011)
Cool - if you can email me the setup script I'll play with it this weekend! paul @ sqlskills.com
It's already in the zip file.
August 18, 2011 at 7:17 pm
Only 1 problem @SqlWiki, [DateHistorique_fin] is not the leading column, it's the 2nd of 3 columns being changed from NOT NULL to NULLable.
ALTER TABLE [dbo].[RPT_Historique_Couts] ADD CONSTRAINT [IX_Historique_Couts_NoArticle] UNIQUE CLUSTERED
(
[DateHistorique] ASC,
[DateHistorique_fin] ASC,
[NoArticle] ASC
)
I've also tried your demo and while it errors out on the same error, the DB doesn't go into suspect mode. I give it to you, it's pretty darn close, but it's not the same issue (unless your demo just needs more data to have the same output, I have ±1K pages in my table that gets corrupted).
Do you have a KB link that talks about this?
Downloading denali atm to try my code on it. Maybe they fixed it there too!
P.S. Where do you Work Paul, I see your name associated with SqlCat. With all the hidden knowledge you seem to have it wouldn't surprise me the least if that were true :-D.
Viewing 15 posts - 31 through 45 (of 78 total)
You must be logged in to reply to this topic. Login to reply