Compress and Decompress in SQL Server 2016

  • Comments posted to this topic are about the item Compress and Decompress in SQL Server 2016

  • Nice article, but testing of this function is a must. When I looked into these commands a few months ago, although the general trend was a reduction in data length, I did have some data that actually increased in size when fed through COMPRESS().
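
    A quick way to test this yourself (the sample values here are just illustrative) is to compare DATALENGTH() before and after COMPRESS():

    DECLARE @Samples TABLE ([Val] VARCHAR(MAX) NOT NULL);

    INSERT INTO @Samples ([Val])
    VALUES (REPLICATE(CONVERT(VARCHAR(MAX), 'abc'), 1000)), -- highly repetitive: compresses well
           ('short string'),                                -- tiny value: likely grows
           (CONVERT(VARCHAR(MAX), NEWID()));                -- little redundancy: likely grows

    SELECT DATALENGTH([Val])           AS [RawBytes],
           DATALENGTH(COMPRESS([Val])) AS [CompressedBytes]
    FROM @Samples;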

  • BrainDonor - Monday, October 1, 2018 3:11 AM

    Nice article, but testing of this function is a must. When I looked into these commands a few months ago, although the general trend was a reduction in data length, I did have some data that actually increased in size when fed through COMPRESS().

    I guess if you compress an already-compressed value, it will make it bigger.
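
    For example (the value here is illustrative), double-compressing shows exactly that:

    DECLARE @Text VARCHAR(MAX) = REPLICATE(CONVERT(VARCHAR(MAX), 'SQL Server '), 500);

    SELECT DATALENGTH(@Text)                     AS [RawBytes],
           DATALENGTH(COMPRESS(@Text))           AS [CompressedOnce],
           DATALENGTH(COMPRESS(COMPRESS(@Text))) AS [CompressedTwice]; -- larger than once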

  • Jonathan AC Roberts - Monday, October 1, 2018 4:52 AM

    I guess if you compress an already-compressed value, it will make it bigger.

    That is one way, but this was just imported text: Compress and Decompress Basic Examples.

  • Interesting article. 
    As a side note, if I remember correctly, MSSQL will 'ignore' out-of-page data (LOB_DATA) when it comes to PAGE compression; I guess COMPRESS() and DECOMPRESS() are the 'workarounds' for this? I do hope they'll add compression to those sooner rather than later, though.
    Anyway, as such I think it would still be interesting to see a figure 3-3 that shows the effect of table-level compression on the test data.

  • Quite interesting.

    Like Deroby, I'd like to see a comparison between COMPRESS() and compression at the table level.

  • deroby - Monday, October 1, 2018 6:07 AM

    As a side note, if I remember correctly, MSSQL will 'ignore' out-of-page data (LOB_DATA) when it comes to PAGE compression; I guess COMPRESS() and DECOMPRESS() are the 'workarounds' for this? I do hope they'll add compression to those sooner rather than later, though.
    Anyway, as such I think it would still be interesting to see a figure 3-3 that shows the effect of table-level compression on the test data.

    My testing shows that:

    1. both ROW and PAGE compression definitely do not work with off-row data (LOB and Row-Overflow)
    2. ROW compression does not work at all with MAX types, even if the data is stored in-row
    3. PAGE compression can work with in-row values of MAX types, but not over a certain size (not sure why, though)

    You can try this for yourself using the queries below. In the second set, even though the value is only 6000 bytes, and hence stored in-row, PAGE compression does nothing really. But truncate the table, change the replicate amount to 4000, and then PAGE compression works wonders.

    USE [tempdb];

    -- DROP TABLE dbo.CompressionTest;
    CREATE TABLE dbo.CompressionTest
    (
        [ID] INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
        [SomeValue] VARCHAR(MAX) NOT NULL
    );

    INSERT INTO dbo.CompressionTest ([SomeValue])
    SELECT REPLICATE(CONVERT(VARCHAR(MAX), 'a'), 9000) -- off-row value
    FROM master.sys.all_columns;

    ALTER TABLE dbo.CompressionTest REBUILD WITH (FILLFACTOR = 100, DATA_COMPRESSION = NONE);

    EXEC sys.sp_spaceused N'dbo.CompressionTest';
    /*
    name      rows  reserved  data   index_size unused
    dbo.CompressionTest 10539 97376 KB  97024 KB  72 KB   280 KB
    */

    ALTER TABLE dbo.CompressionTest REBUILD WITH (DATA_COMPRESSION = ROW);

    EXEC sys.sp_spaceused N'dbo.CompressionTest';
    /*
    name      rows  reserved  data   index_size unused
    dbo.CompressionTest 10539 97120 KB  96960 KB  72 KB   88 KB
    */

    ALTER TABLE dbo.CompressionTest REBUILD WITH (DATA_COMPRESSION = PAGE);

    EXEC sys.sp_spaceused N'dbo.CompressionTest';
    /*
    name      rows  reserved  data   index_size unused
    dbo.CompressionTest 10539 97120 KB  96960 KB  72 KB   88 KB
    */

    -------------------------

    TRUNCATE TABLE dbo.CompressionTest;

    INSERT INTO dbo.CompressionTest ([SomeValue])
    SELECT REPLICATE(CONVERT(VARCHAR(MAX), 'a'), 6000) -- in-row value
    FROM master.sys.all_columns;

    ALTER TABLE dbo.CompressionTest REBUILD WITH (FILLFACTOR = 100, DATA_COMPRESSION = NONE);

    EXEC sys.sp_spaceused N'dbo.CompressionTest';
    /*
    name      rows  reserved  data   index_size unused
    dbo.CompressionTest 10539 85128 KB  84328 KB  232 KB   568 KB
    */

    ALTER TABLE dbo.CompressionTest REBUILD WITH (DATA_COMPRESSION = ROW);

    EXEC sys.sp_spaceused N'dbo.CompressionTest';
    /*
    name      rows  reserved  data   index_size unused
    dbo.CompressionTest 10539 85128 KB  84328 KB  232 KB   568 KB
    */

    ALTER TABLE dbo.CompressionTest REBUILD WITH (DATA_COMPRESSION = PAGE);

    EXEC sys.sp_spaceused N'dbo.CompressionTest';
    /*
    name      rows  reserved  data   index_size unused
    dbo.CompressionTest 10539 85128 KB  84328 KB  232 KB   568 KB
    */

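    -- Inspect allocation_unit_type_desc (IN_ROW_DATA vs. LOB_DATA) in the output
    -- below to confirm where the values were actually stored: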
    SELECT *
    FROM sys.dm_db_database_page_allocations(DB_ID(), OBJECT_ID(N'dbo.CompressionTest'), 1, 1, 'DETAILED');

    Converting the table to a Clustered Columnstore Index sometimes helps with large types, even off-row values, as sketched below.
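
    For reference, a rough sketch of that conversion (the new table name is hypothetical, and note that (N)VARCHAR(MAX) columns in a clustered columnstore index require SQL Server 2017 or later):

    CREATE TABLE dbo.CompressionTestCCI
    (
        [ID] INT NOT NULL IDENTITY(1, 1),
        [SomeValue] VARCHAR(MAX) NOT NULL,
        INDEX [CCI_CompressionTestCCI] CLUSTERED COLUMNSTORE
    );

    INSERT INTO dbo.CompressionTestCCI ([SomeValue])
    SELECT [SomeValue]
    FROM dbo.CompressionTest;

    EXEC sys.sp_spaceused N'dbo.CompressionTestCCI';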

    So yes, I think COMPRESS / DECOMPRESS is definitely an alternative here (that or a Clustered Columnstore Index, depending on the specifics).
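
    A minimal sketch of the COMPRESS-on-write / DECOMPRESS-on-read pattern (table and column names are illustrative); DECOMPRESS() returns VARBINARY(MAX), so the result must be cast back to the original string type:

    CREATE TABLE dbo.CompressedDocs
    (
        [ID] INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
        [DocGzip] VARBINARY(MAX) NOT NULL
    );

    INSERT INTO dbo.CompressedDocs ([DocGzip])
    VALUES (COMPRESS(REPLICATE(CONVERT(VARCHAR(MAX), 'some document text '), 1000)));

    -- Cast back to the type that was originally compressed (VARCHAR vs. NVARCHAR matters):
    SELECT CONVERT(VARCHAR(MAX), DECOMPRESS([DocGzip])) AS [Doc]
    FROM dbo.CompressedDocs;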

    On a related note: I brought up this very point (these two work-arounds) in my post about the new UTF-8 Collations in SQL Server 2019:
    Native UTF-8 Support in SQL Server 2019: Savior or False Prophet?

    Take care, Solomon...

    SQL#: https://SQLsharp.com/ (SQLCLR library of over 340 functions and procedures)
    Sql Quantum Lift: https://SqlQuantumLift.com/ (company)
    Sql Quantum Leap: https://SqlQuantumLeap.com/ (blog)
    Info sites: Collations • Module Signing • SQLCLR

  • BrainDonor - Monday, October 1, 2018 5:00 AM

    Jonathan AC Roberts - Monday, October 1, 2018 4:52 AM

    I guess if you compress an already-compressed value, it will make it bigger.

    That is one way, but this was just imported text: Compress and Decompress Basic Examples.

    There is a certain minimum size for the resulting "compressed" value (I guess due to overhead), so input values under a certain size (depending on the compression algorithm, I believe) will actually come out larger than they went in.

    Also, compression relies upon exploiting repeated patterns. If the input value has no discernible patterns (such as a value that has already been compressed), it won't compress, plus you still incur that overhead.
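
    Both effects are easy to demonstrate (the values here are illustrative); a tiny input and a pattern-free random input both come back larger from COMPRESS():

    SELECT DATALENGTH('abc')                           AS [TinyRaw],
           DATALENGTH(COMPRESS('abc'))                 AS [TinyCompressed],
           DATALENGTH(CRYPT_GEN_RANDOM(100))           AS [RandomRaw],
           DATALENGTH(COMPRESS(CRYPT_GEN_RANDOM(100))) AS [RandomCompressed];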

    Take care, Solomon...

    SQL#: https://SQLsharp.com/ (SQLCLR library of over 340 functions and procedures)
    Sql Quantum Lift: https://SqlQuantumLift.com/ (company)
    Sql Quantum Leap: https://SqlQuantumLeap.com/ (blog)
    Info sites: Collations • Module Signing • SQLCLR

  • COMPRESS definitely works a lot better for particular data. As an example, one of my tables (700 million rows, 900+ GB in standard format with page compression) goes down to 400 GB when I compress its main blob column, which holds a mix of text data and PDFs/images. It doesn't shrink further because the PDF data is already in a zip format.
    This is using a GzipStream, which includes a header of 128 bytes; if it were a DeflateStream it would not include this header.
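
    You can see the GZIP wrapper in the output of COMPRESS() directly; the first two bytes are the GZIP signature 0x1F8B (the input value here is just an example):

    SELECT SUBSTRING(COMPRESS('any value'), 1, 2) AS [GzipMagic]; -- returns 0x1F8B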

  • @Solomon Rutzky:
    thanks for working this out; my tests had shown similar results. I simply wondered what it would look like in these particular tests, as it 'fits' the subject.
    (And I was slightly hoping someone would say that version xyz now includes LOB_DATA when applying PAGE compression =)

    Kind regards,
    Roby

