When is old data ''cleaned'' from pages?

  • According to Inside SQL Server 2000 by Kalen Delaney, if you delete a record from a heap, it saves the slot until compaction occurs.  If it saves this space, is the data 'deleted' or zeroed or is it just marked as not there anymore? 

    What about on page splits during an update?  Is the old data actually deleted?  Does compaction actually zero all the space that is not used by current records?

    How might text data be handled differently?

  • As far as I understand it, this depends on how you delete the data. If you use DELETE, the data is actually physically deleted. If you use, say, TRUNCATE, the whole pages are marked as "not to be there anymore".

    Maybe you can find some additional information here:

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/thestorageengine.asp?frame=true%5d

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;230785

    http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ss2kidbp.mspx

    http://www.microsoft.com/resources/documentation/sql/2000/all/reskit/en-us/part3/c1161.mspx 

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Frank  you're an absolute wonder with your references - do you have a sql database to store them ??????

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Sure, at the moment 1206 links related to SQL Server to be found here:

    http://www.insidesql.de/component/option,com_bookmarks/Itemid,42/

    Almost all are in english

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • The only thing I couldn't find in the references supplied by Frank is their official name. The are called Ghost records and if you ever look at the TL lo with DBCC LOG you can see where it is marked as a ghost record.

  • I haven't finished digesting all the new information but am certain I will find more information than I know now.  Thank you for the links.

    The ghost records were what compaction was supposed to get rid off.  I am assuming that since SQL initializes new datafile space when you add it to a database, then it would have to initialize (write over) any space when it compacted and made the space on a page available for reuse.

  • Finally it hit me where I have read this term "Ghost records" before.

    When you examine a data page via DBCC PAGE you also come across "m_ghostRecCnt = ...  " in the header and then

    Slot 3 Offset 0x99

    ------------------

    Record Type = GHOST_DATA_RECORD                    

     

    somewhere below

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • This ghost record is the 'slot' I mentioned.  Compaction is suposed to clean up the ghost records.  When Compaction occurs, and these 'ghost' records are cleaned up, does SQL rewrite over the ghosted records to insure the data is wiped from the disk?  What would compaction do with the text fields when a record that has text fields is ghosted.  Or even, let's say you had a text field that had 3,500 character in it and you updated the field to something with 100 characters long, would it write over the extra space or leave it on disk?  What if you rewrote a text field longer than it was, is there anyway it would move the text around and leave the old data on the disk?

    I know I'm being awful particular in my questions, but I have a real reason for wanting to know exactly how where old data may be left when values change or are deleted in the deleted in the database.

    And again, I appreciate the links and some of the things I want to know might be made more clear after reviewing all the material you have pointed me to - maybe especially how text is handled.

  • You can dig back thru my history as there was a similar conversation and I posted the results of several test I did.

    Basically thou there are various factors.

    Text type columns, when you write data to them that data remains beyond your new next until something of equal length or greater overwrites it.

    So if say you typed (note: I have kids)

    The sly brown fox went into the chicken coup.

    And then you change to

    A little girl was happy.

    The page retains this way.

    A little girl was happy.nto the chicken coup.

    But there is a marker in the data to tell the SQL Engine to stop reading beyond a point.

    Now if the data in the text field spanned 2 or more pages and you entered a replacement that doesn't utilize all the pages then the pages are deallocated until something else requests a fresh page. The data itself still remains as it was.

    Now for in row data (char, numeric, float, varchar, etc).'

    When the data is changed it will actually mark the existing record as a ghost and allocate a new spot on the page for the data to store.

    In fact even if you have a clustered index on a table the data on the page may not be sorted the same. On each page there is a poitner to each record in sort position and in each record there are enough bits of info to tell the length.

    So say you have the lastname Jones and another Travis and this is a clustered index the sort is to SQL

    Jones

    Travis

    but on the page it may read

    Slot 85

    Jones

    Slot 12

    Travis

    I haven't checked yet what happens on a full page or at the point all slots have been used will it overwrite a previous ghost but will get around to that.

    Now for compaction. The data is pushed upward (again not neccessarily sorted on the page) until all free gaps (ghosts) are overwritten and the pages are as full as defined by the index or 100%.

    The remaining pages in this case that were in use still have the data written on the but again they are deallocated, so you may leave a lot of bits behind.

    If you truncate the free space then you may clear out the majority of the ghost but there is no guarantee it will get all.

    Hope that helps.

  • Yes, that help tremendously with the text. 

    My guess with what I know is that if your page is full but you have ghost records also(ie logically free space) and then you need to insert a new record, a page split ocurs.

     

  • As I said. I haven't tested but will try when I get a PC at home again (was robbed last week so I am without).

  • Sorry to hear that! Hope you have a good insurance for that!

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply