Table Space Related Question

  • Hi,

I have a table with about 100,000 (100 * 1000) rows now, and it will grow by 10k rows every day. Every month I will clean the table down to the minimum required data, based on a date column, by deleting the records (not truncating), so the space freed by the deletes will stay allocated to the table. When I run searches on the table, will that empty space cause slowness?
Also, how do I recover the space from the deleted records in my table? Any suggestions, please.

  • The space left behind by the deletes isn't really space left in the table. It depends on how the data in the table is stored. If it's a heap, this is just space on the disk within the database. There's no real need to "reclaim" this space. It'll just get reallocated by SQL Server. If it's a clustered index, it's space within the index. To get that space back, if you defragment your index, you'll eliminate it.

    The space itself isn't a performance problem... depending on your queries. If you have lots and lots of large range scans, then fragmented storage can be an issue. However, lots and lots of range scans is also an issue that you ought to be addressing. 
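For illustration only (the table and index names here are hypothetical, not from the post), defragmenting a clustered index might look something like this:

    -- REORGANIZE is the lighter, always-online option; REBUILD recreates
    -- the index from scratch and gives back the space left by the deletes.
    ALTER INDEX CX_MyTable ON dbo.MyTable REORGANIZE;
    -- or, for heavier fragmentation:
    ALTER INDEX CX_MyTable ON dbo.MyTable REBUILD;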

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey - Friday, January 13, 2017 11:09 AM

    The space left behind by the deletes isn't really space left in the table. It depends on how the data in the table is stored. If it's a heap, this is just space on the disk within the database. There's no real need to "reclaim" this space. It'll just get reallocated by SQL Server. If it's a clustered index, it's space within the index. To get that space back, if you defragment your index, you'll eliminate it.

    The space itself isn't a performance problem... depending on your queries. If you have lots and lots of large range scans, then fragmented storage can be an issue. However, lots and lots of range scans is also an issue that you ought to be addressing. 

Thank you, Grant; that makes sense to me. I appreciate your response.

  • If the deletes don't clear entire pages, then the pages will remain regardless of how sparsely populated they may be and, yes, that can hurt performance.
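As a rough illustration (the object names are hypothetical), you can see how sparsely populated the pages are with something like:

    -- avg_page_space_used_in_percent shows how full each page is on average;
    -- a low value after big deletes means lots of half-empty pages get read.
    SELECT page_count, avg_page_space_used_in_percent
      FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.MyTable'),
                                          NULL, NULL, 'SAMPLED');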

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

Thank you for your time on this post. Could you please elaborate on your point? Here is some more explanation from my side.

Current table records: 0. Assume I moved all the records to a backup table and truncated the original as a one-time operation, so I now have 0 records in my table. From now on, I will get 10k records every day, so at the end of the month I will have 300k records. Assume I keep 10k records based on my needs, so I will be deleting about 290k records every month (these are just rough numbers; the actual figures may be lower). Could you please take this as an example and share your points?

  • Just in case you need authoritative justification to do so for TF 1118, please see the following link...
    http://www.sqlskills.com/blogs/paul/misconceptions-around-tf-1118/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Friday, January 13, 2017 2:57 PM

    Just in case you need authoritative justification to do so for TF 1118, please see the following link...
    http://www.sqlskills.com/blogs/paul/misconceptions-around-tf-1118/

    Hi Jeff,

Thank you for the reply; honestly, I couldn't follow the explanation in the link. If possible, please give your opinion on my case.

  • KGJ-Dev - Friday, January 13, 2017 8:28 PM

    Jeff Moden - Friday, January 13, 2017 2:57 PM

    Just in case you need authoritative justification to do so for TF 1118, please see the following link...
    http://www.sqlskills.com/blogs/paul/misconceptions-around-tf-1118/

    Hi Jeff,

Thank you for the reply; honestly, I couldn't follow the explanation in the link. If possible, please give your opinion on my case.

    I think that got cross-posted from another discussion.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • KGJ-Dev - Friday, January 13, 2017 1:11 PM

    Hi Jeff,

Thank you for your time on this post. Could you please elaborate on your point? Here is some more explanation from my side.

Current table records: 0. Assume I moved all the records to a backup table and truncated the original as a one-time operation, so I now have 0 records in my table. From now on, I will get 10k records every day, so at the end of the month I will have 300k records. Assume I keep 10k records based on my needs, so I will be deleting about 290k records every month (these are just rough numbers; the actual figures may be lower). Could you please take this as an example and share your points?

What he means is that data in SQL Server is stored in 8K pages. Say all the rows on a page are deleted except for one: that page is not reclaimed, and it may not be reused for additional storage. If that happens a lot, then you'll eventually start to see performance issues because of all the pages that have to be read through.

    However, again, if you use a clustered index for storage, and not a heap, then you have the opportunity to easily reclaim space by defragmenting the index. If you're going to throw away 290k worth of rows (not records) every month, then you probably will need to defragment your index. However, once you defragment the index, it will have dealt with the issue of partially filled pages, so it won't be anything to worry about.
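One quick way to watch this happen (the table name below is a placeholder) is to compare the space figures before and after the defragmentation:

    EXEC sp_spaceused 'dbo.MyTable';          -- note the "unused" column
    ALTER INDEX ALL ON dbo.MyTable REBUILD;   -- defragment everything
    EXEC sp_spaceused 'dbo.MyTable';          -- "unused" should drop sharply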

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey - Saturday, January 14, 2017 9:19 AM

    KGJ-Dev - Friday, January 13, 2017 8:28 PM

    Jeff Moden - Friday, January 13, 2017 2:57 PM

    Just in case you need authoritative justification to do so for TF 1118, please see the following link...
    http://www.sqlskills.com/blogs/paul/misconceptions-around-tf-1118/

    Hi Jeff,

Thank you for the reply; honestly, I couldn't follow the explanation in the link. If possible, please give your opinion on my case.

    I think that got cross-posted from another discussion.

    Thanks, Grant.  I wondered where that post went.  I thought the spaminator got it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • KGJ-Dev - Friday, January 13, 2017 10:37 AM

    Hi,

I have a table with about 100,000 (100 * 1000) rows now, and it will grow by 10k rows every day. Every month I will clean the table down to the minimum required data, based on a date column, by deleting the records (not truncating), so the space freed by the deletes will stay allocated to the table. When I run searches on the table, will that empty space cause slowness?
Also, how do I recover the space from the deleted records in my table? Any suggestions, please.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • KGJ-Dev - Friday, January 13, 2017 1:11 PM

    Hi Jeff,

Thank you for your time on this post. Could you please elaborate on your point? Here is some more explanation from my side.

Current table records: 0. Assume I moved all the records to a backup table and truncated the original as a one-time operation, so I now have 0 records in my table. From now on, I will get 10k records every day, so at the end of the month I will have 300k records. Assume I keep 10k records based on my needs, so I will be deleting about 290k records every month (these are just rough numbers; the actual figures may be lower). Could you please take this as an example and share your points?

Grant Fritchey correctly expressed some of my concerns and the remedies for them. But there's more to it than that.

    First, I'd make sure that the clustered index were a temporal one that uses the date/time column that you use to determine which "old" rows to delete.  I'll explain why in a minute.
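As a sketch (the table and column names are made up for the example), that clustered index could be created like this:

    -- Cluster on the date/time column used to age rows out, so the deletes
    -- empty out whole pages at the "old" end of the index.
    CREATE CLUSTERED INDEX CX_MyTable_CreatedDate
        ON dbo.MyTable (CreatedDate);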

Second, the first of the month will be a relative bitch for your log file and transaction log backups. You're talking about deleting a month's worth of data at a time. Although 290K rows isn't a huge thing (provided the table isn't really wide and doesn't have huge blobs associated with it), I avoid such things any time I can, especially around the end of one month and the start of the next, because everyone and their brother is using more resources trying to get end-of-month reporting and other cleanups done all at the same time. And, to be sure, deleting rows is never done in a minimally logged fashion and, since you're deleting most of the table, it will almost certainly lock the table for the entire duration of the deletes.

With the clustered index based on the date/time column that you use to determine what to delete by age, you shouldn't have to rebuild the clustered index after the deletes, because most of the pages will truly be emptied and contain 0 rows. SQL Server will reuse those auto-magically.

However, you still have the monthly overhead of doing the deletes. I also question whether people will actually be done using the data by the time you want to delete it. So there are a couple of possible scenarios here.

The first scenario is to not do the deletes all at once as you suggest. Instead, set up a 40-day "sliding window". By that, I mean consider setting up a job that deletes only rows older than 40 days and have it run every day (a rough sketch follows). That will do two things for you. 1) It'll spread the monthly overhead out over every day of the month, making it totally non-obnoxious, and each run will be faster, which means the table will be unavailable for a lot less time, if at all. Because the clustered index would be temporally based, you wouldn't ever need to rebuild it, just the non-clustered indexes. 2) It would give people 10 days to do whatever month-end reporting they need before you start dropping data from the previous month. It would also allow you to expand certain reports so that they could always look at, say, the last 4 weeks instead of just the "current month".
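Here's what such a daily job might look like (the object names and batch size are assumptions, not anything from the original post):

    -- Delete anything older than 40 days in small batches so the log
    -- stays small and locks are short-lived.
    DECLARE @Rows INT = 1;
    WHILE @Rows > 0
    BEGIN
        DELETE TOP (5000)
          FROM dbo.MyTable
         WHERE CreatedDate < DATEADD(DAY, -40, GETDATE());
        SET @Rows = @@ROWCOUNT;
    END;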

    The second scenario is the one you're talking about.  You decide that you still want to delete all but one day of data and that means that you're deleting about 29 days of data all in one shot.  That means that you're deleting about 96% of the table, which also has to delete 96% from any non-clustered indexes.  That's wicked expensive.  So, consider the following...

    Create two identical tables with identical indexes, etc.  For this discussion, let's call them "TableA" and "TableB".  We'll also call your original table "TheTable", which could be anything but we'll call it that for this discussion.

First, "TheTable" will no longer physically exist. Instead, it will be a SYNONYM that starts off pointing to "TableA".
You'll also need a second SYNONYM (let's call it "CopyFrom") that points to the empty "TableB".

1.  On day 1, the "TheTable" synonym will be pointing at "TableA" and "TableB" will be empty.
2.  When you decide it's time to do the 290K-row delete, don't. Just repoint the "TheTable" synonym to the currently empty "TableB" and repoint the "CopyFrom" synonym to "TableA". If someone is currently using the synonyms, the drop/create that repoints them won't actually happen right away; it'll patiently wait until no one is using them and then do the "flop".
3.  Then simply copy the 10K rows of data you're keeping from the "CopyFrom" synonym to the "TheTable" synonym. This will be MUCH faster and easier on the log file than doing 290K rows of deletes.
4.  When the copy from Step 3 is complete, truncate the table that the "CopyFrom" synonym is pointing at. Note that you can't truncate a table through a synonym. You can read which table the "CopyFrom" synonym is pointing at with code like the following...

      SELECT @SomeVariable = base_object_name
        FROM sys.synonyms
       WHERE name = 'CopyFrom';

    5.  Next month, just reverse the process.
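Just to show the moving parts end to end (all object names and the "keep" criterion below are hypothetical), the monthly flop might look something like this:

    -- One-time setup: two identical tables and the two synonyms.
    CREATE SYNONYM dbo.TheTable FOR dbo.TableA;
    CREATE SYNONYM dbo.CopyFrom FOR dbo.TableB;

    -- Monthly "flop": repoint the synonyms instead of deleting 290K rows.
    DROP SYNONYM dbo.TheTable;
    CREATE SYNONYM dbo.TheTable FOR dbo.TableB;
    DROP SYNONYM dbo.CopyFrom;
    CREATE SYNONYM dbo.CopyFrom FOR dbo.TableA;

    -- Carry over just the ~10K rows worth keeping...
    INSERT INTO dbo.TheTable
    SELECT *
      FROM dbo.CopyFrom
     WHERE CreatedDate >= DATEADD(DAY, -1, GETDATE());

    -- ...then empty the old table (truncating directly, not via a synonym).
    TRUNCATE TABLE dbo.TableA;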

It does take a bit to set all this up, but it's nasty fast, and the table will be available at all times except for the millisecond it takes to repoint the synonyms, AND it won't interfere with sessions currently using the table through the synonyms.

Shifting gears a bit: I normally store tables like this in a separate database using the SIMPLE Recovery Model so that I can take advantage of any "Minimal Logging" opportunities. It also reduces backups, because I wouldn't back up such expendable data.


    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Grant,

    Thanks for your reply.

    Hi Jeff,

Thank you for your detailed explanation; it will be very helpful to me. I appreciate your time on this.
