January 24, 2011 at 9:08 am
Steve Jones - SSC Editor (1/24/2011)
The disk might receive "1234567" from SQL Server and, for some reason, as it writes the bytes to disk, it writes "1234566". The page checksum could potentially catch this.
That is the one explanation I always read about for this kind of corruption, and it has always puzzled me.
How would the checksum be able to detect an error like this?
How would the checksum be able to "see" that the SQL engine meant 12345, but a bug in the disk driver actually wrote a value of 54321 to a field or row value stored in the file?
I have always had to guess that what is actually being said is that the SQL I/O thread wanted to write '101100' to a page in the data file, but some disk write error caused '000000' to be written to disk.
Then the checksum value for the data on disk would not match the checksum stored in the page header?
January 24, 2011 at 9:37 am
You calculate a checksum across the page. When you read the page, you recalculate the checksum. If it doesn't match the value stored in the page header, then you have an issue.
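Conceptually it's something like this (a toy sketch only, using T-SQL's CHECKSUM() function over a made-up byte string - not the actual algorithm SQL Server uses for page checksums, and the variable names are just for illustration):
-- Toy illustration only: the detection principle, not the real page-checksum algorithm.
DECLARE @PageAsWritten VARBINARY(16) = 0x31323334353637;  -- "1234567" handed to the disk
DECLARE @StoredChecksum INT = CHECKSUM(@PageAsWritten);   -- value kept in the page header at write time
DECLARE @PageAsRead VARBINARY(16) = 0x31323334353636;     -- "1234566" as the faulty disk actually wrote it
SELECT CASE
         WHEN CHECKSUM(@PageAsRead) = @StoredChecksum
           THEN 'Checksums match - page reads clean'
           ELSE 'Checksum mismatch - page is reported as corrupt'
       END AS Result;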
January 24, 2011 at 10:28 am
Steve Jones - SSC Editor (1/24/2011)
You calculate a checksum across the page. When you read the page, you recalculate the checksum. If it doesn't match the value stored in the page header, then you have an issue.
Exactly. The checksum can't tell what the value should be; it's not that complex (it's not like parity in RAID), but it can tell you that the page has been modified after the checksum was calculated. When the checksum is recalculated, it must match what's written in the page header.
As for what causes corruption - something malfunctioning in the IO subsystem: a bad driver, a misbehaving IO filter driver, HBA/switch firmware, the RAID controller/SAN controller, disk/SAN cache, the disk hardware itself. If anything in that stack goes wrong and the page changes between when SQL Server handed it to the OS and when it gets the page back from the OS, that is considered corruption.
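If you want to see what your databases are set to, and which pages have already failed a check, something along these lines will do it (assumes SQL Server 2005 or later):
-- Which page verification option each database is using
SELECT name, page_verify_option_desc
FROM sys.databases;

-- Pages that have failed a checksum or torn-page check (823/824 errors) are logged here
SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages;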
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 24, 2011 at 10:43 am
paul s-306273 (1/24/2011)
I'd certainly go for Page Checksums, but do MS actually recommend that? Is it implied that they are better because they also include torn page detection?
Yes. The recommendation is in BoL, http://msdn.microsoft.com/en-us/library/bb522682%28v=SQL.100%29.aspx, which would perhaps have been a better reference than the one given in the explanation.
It's a tradeoff, of course. It's far better at detecting errors than Torn Page Detection, but there's a small but significant performance overhead (computing a checksum is a lot more expensive than checking some 2-bit fields).
Of course, my view is that MS got this right and the performance overhead doesn't matter - if your data is valuable, buy some more compute power to cover that overhead, and if it isn't valuable, why do you have it in the first place? If it were me recommending data security, I would recommend Page Checksum, the full recovery model, the log file on a RAID-10 array, data files on separate RAID-10 arrays (one array per filegroup), a full backup daily, a log backup every 10 minutes (backups going to a machine other than the server, itself using RAID-10 for backup storage), either transactional replication or log shipping to a standby system, and backups copied to removable media and taken off site twice daily. But it's extremely difficult to get the budget for all that.
Edit: sometimes data is easily recoverable from source. In those cases you may not need to worry about complex backup and recovery plans, a high-availability strategy, or disaster recovery - but you still have to worry about data integrity, I think, so Page Checksum is still needed.
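For what it's worth, the first two items on that list are just this (the database name and backup path are placeholders; note that switching to CHECKSUM only stamps checksums onto pages as they are written after the change):
-- Placeholder database name; adjust to suit.
ALTER DATABASE [YourDB] SET PAGE_VERIFY CHECKSUM;
ALTER DATABASE [YourDB] SET RECOVERY FULL;

-- Existing pages only get a checksum when they are next modified,
-- so a full integrity check is still worth running afterwards.
DBCC CHECKDB ('YourDB') WITH NO_INFOMSGS;

-- Backups can verify page checksums as the pages are read.
BACKUP DATABASE [YourDB]
  TO DISK = N'\\BackupServer\Share\YourDB.bak'   -- hypothetical path
  WITH CHECKSUM;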
Tom
January 24, 2011 at 11:02 am
GilaMonster (1/24/2011)
Steve Jones - SSC Editor (1/24/2011)
You calculate a checksum across the page. When you read the page, you recalculate the checksum. If it doesn't match the value stored in the page header, then you have an issue.
Exactly. The checksum can't tell what the value should be; it's not that complex (it's not like parity in RAID), but it can tell you that the page has been modified after the checksum was calculated. When the checksum is recalculated, it must match what's written in the page header.
As for what causes corruption - something malfunctioning in the IO subsystem: a bad driver, a misbehaving IO filter driver, HBA/switch firmware, the RAID controller/SAN controller, disk/SAN cache, the disk hardware itself. If anything in that stack goes wrong and the page changes between when SQL Server handed it to the OS and when it gets the page back from the OS, that is considered corruption.
Exactly how I understood it to work, also.
And a whole lot clearer when worded like that.
January 24, 2011 at 11:03 am
GilaMonster (1/24/2011)
Steve Jones - SSC Editor (1/24/2011)
You calculate a checksum across the page. When you read the page, you recalculate the checksum. If it doesn't match the value stored in the page header, then you have an issue.
Exactly. The checksum can't tell what the value should be; it's not that complex (it's not like parity in RAID), but it can tell you that the page has been modified after the checksum was calculated. When the checksum is recalculated, it must match what's written in the page header.
As for what causes corruption - something malfunctioning in the IO subsystem: a bad driver, a misbehaving IO filter driver, HBA/switch firmware, the RAID controller/SAN controller, disk/SAN cache, the disk hardware itself. If anything in that stack goes wrong and the page changes between when SQL Server handed it to the OS and when it gets the page back from the OS, that is considered corruption.
Another potential issue is physical deterioration of the media. This does happen as discs get older. In the old days, drive hardware (or in even older days, controller hardware) used to compute a fairly complex checksum and write it at the tail of each sector - sufficiently complex that the drive (or controller) could do some useful error correction - at least partly in order to handle such errors for long enough to let you make a good copy (the drive or controller signalled when it had to do corrections - if you bothered to run the monitoring firmware - and that told you to make a copy quickly and scrap the old disc). Of course this would also detect most other errors introduced between the point at which the data was checksummed on the way in and the point at which the checksum was checked on the way out, but the last time I looked such other errors were extremely rare.
Tom
January 24, 2011 at 11:03 am
Thanks for the question, and the discussion.
January 24, 2011 at 11:16 am
Tom.Thomson (1/24/2011)
Another potential issue is physical deterioration of the media.
Tom, this is a wonderful point. I had forgotten about all the times a restore from tape backup caused issues, only to find out that Torn Page Detection was being used, or that there were extent errors because of Page Checksums not matching.
I have also experienced this after restoring a RAR archive of the database MDF and LDF files that were sent via FTP. Luckily, RAR was able to repair the archive file and we recovered the missing data.
January 24, 2011 at 12:51 pm
Good question
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
January 25, 2011 at 2:34 am
Tom.Thomson (1/24/2011)
Another potential issue is physical deterioration of the media. This does happen as discs get older. In the old days, drive hardware (or in even older days, controller hardware) used to compute a fairly complex checksum and write it at the tail of each sector - sufficiently complex that the drive (or controller) could do some useful error correction - at least partly in order to handle such errors for long enough to let you make a good copy (the drive or controller signalled when it had to do corrections - if you bothered to run the monitoring firmware - and that told you to make a copy quickly and scrap the old disc).
I think these days the controller is even cleverer--it notices the failing sector and copies all its data onto a spare sector (it keeps a few of these available for this purpose), so you never notice the issue at all until SMART pops up and tells you there's a problem!
January 26, 2011 at 3:45 pm
paul.knibbs (1/25/2011)
Tom.Thomson (1/24/2011)
Another potential issue is physical deterioration of the media. This does happen as discs get older. In the old days, drive hardware (or in even older days, controller hardware) used to compute a fairly complex checksum and write it at the tail of each sector - sufficiently complex that the drive (or controller) could do some useful error correction - at least partly in order to handle such errors for long enough to let you make a good copy (the drive or controller signalled when it had to do corrections - if you bothered to run the monitoring firmware - and that told you to make a copy quickly and scrap the old disc).
I think these days the controller is even cleverer--it notices the failing sector and copies all its data onto a spare sector (it keeps a few of these available for this purpose), so you never notice the issue at all until SMART pops up and tells you there's a problem!
There are actually two separate problems: writes that fail to store the correct data because the sector is faulty can be switched to a spare sector; if the write was successful but the data becomes corrupt after the write and before it is read, the error-checking code comes into play.
The drive (not the controller, unless you have some rather old kit) will normally keep some sectors free on each cylinder for this purpose. Primarily this copes with problems detected during the write phase (including detection by the drive's read-behind-write check, if it has one). It doesn't cope with problems introduced after the read-behind-write check - but if error correction succeeds when the corrupt data is read, the drive can write the data to a spare sector and mark the original as redirected at that point. I recall some discs would, if error correction was successful, rewrite the correct data in the original position but maintain a count of how many times this had happened, and if the count got high (exceeded 3, I think) move the data to a spare sector. Of course if error correction fails, the data is lost (unless it's on a RAID with redundancy, i.e. not RAID 0).
As you say, you don't know about the errors until the monitoring software makes a report (and of course that time never comes if some clever sysadmin switched off all hardware status reporting, including all warnings, when he installed the system - then the first thing you know about a disc problem is when it fails to maintain your data).
Tom