There was a question earlier on one of the SQL forums as to whether or not Ghost Cleanup overwrote the deleted rows when it ran so as to ensure no one could read that data again.
Now I could just reference Paul Randal‘s blog post where he described Ghost Cleanup in depth and leave it there, but where would the fun be in that?
So, now that everyone’s read Paul’s blog post, time for some fun with undocumented commands to prove (again) what we already know (and what Paul’s already shown).
Setup code:
CREATE TABLE TestingCleanup ( ID INT IDENTITY PRIMARY KEY, Description VARCHAR(20), Filler CHAR(50) DEFAULT '' ); GO INSERT INTO TestingCleanup (Description) VALUES ('One'), ('Two'), ('Three'), ('Four'), ('Five'), ('Six'), ('Seven'), ('Eight'), ('Nine'), ('Ten') -- Find the page number that the table is on SELECT OBJECT_ID('TestingCleanup') -- 1399012065 DBCC IND(11,1399012065,1) -- The IAM is 1:309 and the data page is 1:308
Now I’m going to delete half the table and then go and look at the page. The ghost cleanup can’t run until the transaction commits, so running this within a transaction gives a chance to go and look at how things work.
BEGIN TRANSACTION DELETE FROM TestingCleanup WHERE ID%2 = 0 -- delete the even rows SELECT ID, Description FROM TestingCleanup -- 5 rows DBCC TRACEON (3604) DBCC PAGE(11,1,310,1) DBCC TRACEOFF (3604) COMMIT TRANSACTION
The delete removes all the rows with even identity values and the select returns only 5 rows, as expected
Dump type 1 for DBCC Page gives the header and then each row separately in binary. I’m editing out uninteresting parts of the output to keep things manageable.
Page header: m_pageId = (1:308) m_headerVersion = 1 m_type = 1 Metadata: ObjectId = 1399012065 m_prevPage = (0:0) m_nextPage = (0:0) m_ghostRecCnt = 5 m_tornBits = 0
Five ghosted records (m_ghostRecCnt = 5), corresponding to the 5 deleted rows. Now for the rows (I’m only going to show the first two)
Slot 0, Offset 0x60, Length 68, DumpStyle BYTE Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS Record Size = 68 Memory Dump @0x000000000FD4A060 0000000000000000: 30003a00 01000000 20202020 20202020 †0.:..... 0000000000000010: 20202020 20202020 20202020 20202020 † 0000000000000020: 20202020 20202020 20202020 20202020 † 0000000000000030: 20202020 20202020 20200300 00010044 † .....D 0000000000000040: 004f6e65 ††††††††††††††††††††††††††††.One Slot 1, Offset 0xa4, Length 68, DumpStyle BYTE Record Type = GHOST_DATA_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS Record Size = 68 Memory Dump @0x000000000FD4A0A4 0000000000000000: 3c003a00 02000000 20202020 20202020 †<.:..... 0000000000000010: 20202020 20202020 20202020 20202020 † 0000000000000020: 20202020 20202020 20202020 20202020 † 0000000000000030: 20202020 20202020 20200300 00010044 † .....D 0000000000000040: 0054776f ††††††††††††††††††††††††††††.Two
The first row, with an ID of 1 and Description of ‘One’, is a Primary Record. The second row, with an ID of 2 and Description of ‘Two’ is a Ghost Data Record. Deleted but not removed from the rows on the page. The same goes for the rest of the rows, the ones that were deleted (even values) are Ghost Data Records, the other are Primary Records. (aside, that begs the question, if there are Primary Data Records, are there Secondary or Tertiary? If so, where?)
Then if we look at the bottom of the DBCC Page output there’s the slot array (the offset of the rows on the page) there’s only 5 entries in there, despite there still being 10 displayed in the record section of DBCC Page. That’s because 5 of the aren’t real rows any longer.
Now let’s exorcise those ghosts.
DBCC ForceGhostCleanup -- Undocumented. Do not use in production DBCC TRACEON (3604) DBCC PAGE(11,1,310,1) DBCC TRACEOFF (3604)
Now when we look at the header, the ghostRecCnt is 0, the ghosts are gone and DBCC Page only returns the 5 remaining rows thusly
</code>Slot 0, Offset 0x60, Length 68, DumpStyle BYTE Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS Record Size = 68 Memory Dump @0x000000000B8DA060 0000000000000000: 30003a00 01000000 20202020 20202020 †0.:..... 0000000000000010: 20202020 20202020 20202020 20202020 † 0000000000000020: 20202020 20202020 20202020 20202020 † 0000000000000030: 20202020 20202020 20200300 00010044 † .....D 0000000000000040: 004f6e65 ††††††††††††††††††††††††††††.One Slot 1, Offset 0xe8, Length 70, DumpStyle BYTE Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS Record Size = 70 Memory Dump @0x000000000B8DA0E8 0000000000000000: 30003a00 03000000 20202020 20202020 †0.:..... 0000000000000010: 20202020 20202020 20202020 20202020 † 0000000000000020: 20202020 20202020 20202020 20202020 † 0000000000000030: 20202020 20202020 20200300 00010046 † .....F 0000000000000040: 00546872 6565††††††††††††††††††††††††.Three
So Two’s gone. Or has is?
DBCC Page with dump type 2 dumps the entire page out without any interpretation of rows (the header is still interpreted, but there’s nothing new there hence I’m omitting it). So…
DBCC TRACEON (3604) DBCC PAGE(11,1,310,2) DBCC TRACEOFF (3604)
</code>DATA: Memory Dump @0x000000000D6EA000 <snip> 000000000D6EA070: 20202020 20202020 20202020 20202020 † 000000000D6EA080: 20202020 20202020 20202020 20202020 † 000000000D6EA090: 20202020 20202020 20200300 00010044 † .....D 000000000D6EA0A0: 004f6e65 3c003a00 02000000 20202020 †.One<.:..... 000000000D6EA0B0: 20202020 20202020 20202020 20202020 † 000000000D6EA0C0: 20202020 20202020 20202020 20202020 † 000000000D6EA0D0: 20202020 20202020 20202020 20200300 † .. 000000000D6EA0E0: 00010044 0054776f 30003a00 03000000 †...D.Two0.:..... 000000000D6EA0F0: 20202020 20202020 20202020 20202020 † 000000000D6EA100: 20202020 20202020 20202020 20202020 † 000000000D6EA110: 20202020 20202020 20202020 20202020 † 000000000D6EA120: 20200300 00010046 00546872 65653c00 † .....F.Three<. <snip>
The even-numbered data is still there. The rows are deleted, the page has only 5 rows on it and it requires either a raw binary dump of the page or some work with a hex editor to see them, but the data that was stored in the now deleted rows is still there and will be there until its overwritten by new rows from somewhere.
Does Ghost Cleanup overwrite old data to prevent people reading it later? No, most certainly not. Now it’s not something any user can do. Reading the data file requires permissions to either stop SQL, detach the DB or take the DB offline (or a hex editor that ignores file locks) and, unless a hex editor is standard on servers, permission to install software (or a hex editor that doesn’t require installation) and DBCC Page requires sysadmin rights. It’s still something to bear in mind if you’re ever working with confidential or classified data that has to be irretrievable after being deleted. It’s far from trivial to ensure that in SQL, especially if working with variable-length columns (where an update might not overwrite the old data if a page split/forwarded record results).