Need T-SQL to list all tables in database that 'qualify' for Compression

  • Does anyone know of any existing script that would allow me to list all tables in database that are good candidates for Compression (row or page) and that do not have contraindications for compression ?

    I used a script like this to identify the columnstore indexing candidate tables (that would benefit the most from columnstore idx) , and wonder if something like that exists for compression-candidatetables. I did not find one on www.

    Likes to play Chess

  • I used a script like this ...

    Like what?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Please read the following article.  They have code to calculate "S" and "U" values that you could programmatically interpret.  Combine that with the compression estimator code (documented in Books Online and super important to such a process especially if you have a shedload of LOBs and especially if those are "in-row") and you should be able to create your own.

    https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008/dd894051(v=sql.100)

    Do be advised that the "S" and "U" related code is based on the sys.dm_db_index_operational_stats DMV.  The reason that's important is because of the warning about it in the same article I provided the link for under the heading of "Application Workload".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • p.s.  It would be gentlemanly if you posted either the script or the link where you got the script that you claim helped you to identify columnstore indexing candidate tables.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • THANKS!

    Question: can there be any reason that an error will be thrown while attempting to compress the table? Like what should be 'wrong' with table to cause the compression to fail and abort ?

    Likes to play Chess

  • VoldemarG wrote:

    THANKS!

    Question: can there be any reason that an error will be thrown while attempting to compress the table? Like what should be 'wrong' with table to cause the compression to fail and abort ?

    I guess it depends on whether or not you post the script or the link to the script that you used to identify column store candidates. 😉  This IS a two way street.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sure, sure, sorry. the script attached.

    beautiful thing.

    Criteria for Columnstore candidate table selection is listed in the comment.

    Likes to play Chess

  • -- NOTE: (Voldemar tested this out well, is very satisfied with this script, and even has it on daily baseline collector routine.  )

    -------------------------------------------------------

    -- The queries below need to be executed per database.

    -- Also, please make sure that your workload has run for

    -- couple of days or its full cycle including ETL etc

    -- to capture the relevant operational stats

    -------------------------------------------------------

    -- picking the tables that qualify CCI

    -- ===================================

    -- Key logic is

    -- (a) Table does not have CCI

    -- (b) At least one partition has > 1 million rows and does not have unsupported types for CCI

    -- (c) Range queries account for > 50% of all operations

    -- (d) DML Update/Delete operations < 10% of all operations

    select table_id, table_name

    from (select quotename(object_schema_name(dmv_ops_stats.object_id)) + N'.' + quotename(object_name (dmv_ops_stats.object_id)) as table_name,

    dmv_ops_stats.object_id as table_id,

    SUM (leaf_delete_count + leaf_ghost_count + leaf_update_count) as total_DelUpd_count,

    SUM (leaf_delete_count + leaf_update_count + leaf_insert_count + leaf_ghost_count) as total_DML_count,

    SUM (range_scan_count + singleton_lookup_count) as total_query_count,

    SUM (range_scan_count) as range_scan_count

    from sys.dm_db_index_operational_stats (db_id(),

    null,

    null, null) as dmv_ops_stats

    where (index_id = 0 or index_id = 1)

    AND dmv_ops_stats.object_id in (select distinct object_id

    from sys.partitions p

    where data_compression <= 2 and (index_id = 0 or index_id = 1)

    AND rows >= 1048576

    AND object_id in (select distinct object_id

    from sys.partitions p, sysobjects o

    where o.type = 'u' and p.object_id = o.id))

    AND dmv_ops_stats.object_id not in ( select distinct object_id

    from sys.columns

    where user_type_id IN (34, 35, 241)

    OR ((user_type_id = 165 OR user_type_id = 167) and max_length = -1))

    group by dmv_ops_stats.object_id

    ) summary_table

    where ((total_DelUpd_count * 100.0/(total_DML_count + 1) < 10.0))

    AND ((range_scan_count * 100.0/(total_query_count + 1) > 50.0))

    Likes to play Chess

  • VoldemarG wrote:

    THANKS!

    Question: can there be any reason that an error will be thrown while attempting to compress the table? Like what should be 'wrong' with table to cause the compression to fail and abort ?

    Thanks for the script.

    To answer the question above, the answer is "Yes and No".  I've not yet seen where compression will throw an error unless you run out of disk space during the rebuild of an index to compress it.  Also realize that such compression IS fully logged if you are in the FULL RECOVERY model so there could be quite the impact on the log file, just like any rebuild.  The article I provided a link to explains that in one of the charts.

    Also realize most code is NOT a panacea especially since the code in the link to find compression candidate tables is based on meta-data cache that (and it says so in the article) is "likely" to be present but isn't guaranteed to be present.

    You really need to study that article from sternum to sox instead of just looking it in the face. 😀  That also includes reading the articles that it links to.  There are both positive and negative aspects to this.  For example, you might end up being really surprised at how long it might take to rebuild a large compressed Clustered Index and how much room it will actually take.  That kind of info is in the article I provided the link to.

    You should also consider a couple of other things before you begin.  For example, do you have any In-Row blobs?  Those aren't compressible using row or page compression.  You might want to force such things to go "out-of-row" before you compress the table.  You CAN compress things like VARCHAR(MAX) using the COMPRESS function but I'd still move those buggers out of row, compressing them in the process if they occupy a lot of space.  Just remember that you cannot do a search on COMPRESSED lobs.

    Do you also intend to do a SHRINKFILE afterwards?  If so, then you REALLY need to read about the index rebuild stuff I just mentioned.

    Like I said, nothing is a panacea.  "It Depends" and you "Must look eye". If you're going to do this, you should probably spend the next solid week studying the documentation and doing some experiments on a test box.  Like my old friend Sergiy says, "A Developer must NOT guess... a Developer MUST KNOW!" 😀

    Heh... and please learn how to use the code block when posting code. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • VoldemarG wrote:

    THANKS!

    Question: can there be any reason that an error will be thrown while attempting to compress the table? Like what should be 'wrong' with table to cause the compression to fail and abort ?

    Ok... I just thought of something that can cause compression to fail.  If a row is really big, compression will still not allow it to be larger than 8,060 bytes.  Compression does add a bit of overhead and that overhead is included in the 8,060 bytes (I don't know how much overhead it is but it's extra).  So, if you have a really big row, it can cause the compression attempt to fail part way through when it finally reads and then tries to compress the row.  It only takes one such row for it to fail.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • oh well.

    Should I not even bother compressing non-clustered indexes?

    what if my table does not have clustered indexes? it only has a composite primary key, non-clustered. And a dozen of nonclustered indexed.  Should I  not consider such table for compression? Or it will still benefit from it?

    Likes to play Chess

  • VoldemarG wrote:

    oh well.

    Should I not even bother compressing non-clustered indexes?

    what if my table does not have clustered indexes? it only has a composite primary key, non-clustered. And a dozen of nonclustered indexed.  Should I  not consider such table for compression? Or it will still benefit from it?

    I'm not sure how you made such a leap to discouragement.  Study the article I posted the link for and, within that, study the sub-tending links it references.  Compression can be very good and it can be very bad.  "It Depends" on the table and the articles tell you on what it depends.  Restore a copy of a database to a development box and experiment.  Remember that nothing is a panacea and, at the same time, remember that the only way to really know is to conduct experiments even when you think you know.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply