September 30, 2018 at 10:11 pm
Comments posted to this topic are about the item Compress and Decompress in SQL Server 2016
October 1, 2018 at 3:11 am
October 1, 2018 at 4:52 am
BrainDonor - Monday, October 1, 2018 3:11 AM
Nice article, but testing of this function is a must. When I looked into these commands a few months ago, although the general trend was a reduction in data length, I did have some data that actually increased in size when fed through COMPRESS().
I guess if you compress an already compressed value it will make it bigger.
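A quick way to check (assuming SQL Server 2016 or later; the exact byte counts will vary with the input):

-- Compressing repetitive text shrinks it, but compressing the already
-- compressed (near-random) result typically grows it again, because of
-- the fixed GZIP overhead.
DECLARE @plain VARCHAR(MAX) =
    REPLICATE(CONVERT(VARCHAR(MAX), 'Some repetitive sample text. '), 200);

SELECT DATALENGTH(@plain)                     AS OriginalBytes,
       DATALENGTH(COMPRESS(@plain))           AS CompressedOnceBytes,
       DATALENGTH(COMPRESS(COMPRESS(@plain))) AS CompressedTwiceBytes;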
October 1, 2018 at 5:00 am
Jonathan AC Roberts - Monday, October 1, 2018 4:52 AM
I guess if you compress an already compressed value it will make it bigger.
That is one way, but this was just imported text - Compress and Decompress Basic Examples.
October 1, 2018 at 6:07 am
Interesting article.
As a side note, if I remember correctly, MSSQL will 'ignore' out-of-page data (LOB_DATA) when it comes to PAGE compression; I guess COMPRESS() and DECOMPRESS() are the 'workarounds' for this? I do hope they'll add compression to those sooner rather than later, though.
Anyway, I think it would still be interesting to see a figure 3-3 that shows the effect of table compression on the test data.
October 1, 2018 at 6:59 am
Quite interesting.
Like deroby, I'd like to see a comparison between COMPRESS() and compression at the table level.
October 1, 2018 at 8:42 am
deroby - Monday, October 1, 2018 6:07 AM
As a side note, if I remember correctly, MSSQL will 'ignore' out-of-page data (LOB_DATA) when it comes to PAGE compression; I guess COMPRESS() and DECOMPRESS() are the 'workarounds' for this? I do hope they'll add compression to those sooner rather than later, though.
Anyway, I think it would still be interesting to see a figure 3-3 that shows the effect of table compression on the test data.
My testing shows that ROW and PAGE compression do essentially nothing for these larger VARCHAR(MAX) values, whether they are stored off-row or in-row.
You can try it for yourself using the queries below. In the second set, even though the value is only 6000 bytes, and hence stored in-row, PAGE compression still does nothing. But truncate the table, change the replicate amount to 4000, and then PAGE compression works wonders.
USE [tempdb];
-- DROP TABLE dbo.CompressionTest;
CREATE TABLE dbo.CompressionTest
(
[ID] INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
[SomeValue] VARCHAR(MAX) NOT NULL
);
INSERT INTO dbo.CompressionTest ([SomeValue])
SELECT REPLICATE(CONVERT(VARCHAR(MAX), 'a'), 9000) -- off-row value
FROM master.sys.all_columns;
ALTER TABLE dbo.CompressionTest REBUILD WITH (FILLFACTOR = 100, DATA_COMPRESSION = NONE);
EXEC sys.sp_spaceused N'dbo.CompressionTest';
/*
name rows reserved data index_size unused
dbo.CompressionTest 10539 97376 KB 97024 KB 72 KB 280 KB
*/
ALTER TABLE dbo.CompressionTest REBUILD WITH (DATA_COMPRESSION = ROW);
EXEC sys.sp_spaceused N'dbo.CompressionTest';
/*
name rows reserved data index_size unused
dbo.CompressionTest 10539 97120 KB 96960 KB 72 KB 88 KB
*/
ALTER TABLE dbo.CompressionTest REBUILD WITH (DATA_COMPRESSION = PAGE);
EXEC tempdb.sys.sp_spaceused N'dbo.CompressionTest';
/*
name rows reserved data index_size unused
dbo.CompressionTest 10539 97120 KB 96960 KB 72 KB 88 KB
*/
-------------------------
TRUNCATE TABLE dbo.CompressionTest;
INSERT INTO dbo.CompressionTest ([SomeValue])
SELECT REPLICATE(CONVERT(VARCHAR(MAX), 'a'), 6000) -- in-row value
FROM master.sys.all_columns;
ALTER TABLE dbo.CompressionTest REBUILD WITH (FILLFACTOR = 100, DATA_COMPRESSION = NONE);
EXEC sys.sp_spaceused N'dbo.CompressionTest';
/*
name rows reserved data index_size unused
dbo.CompressionTest 10539 85128 KB 84328 KB 232 KB 568 KB
*/
ALTER TABLE dbo.CompressionTest REBUILD WITH (DATA_COMPRESSION = ROW);
EXEC sys.sp_spaceused N'dbo.CompressionTest';
/*
name rows reserved data index_size unused
dbo.CompressionTest 10539 85128 KB 84328 KB 232 KB 568 KB
*/
ALTER TABLE dbo.CompressionTest REBUILD WITH (DATA_COMPRESSION = PAGE);
EXEC sys.sp_spaceused N'dbo.CompressionTest';
/*
name rows reserved data index_size unused
dbo.CompressionTest 10539 85128 KB 84328 KB 232 KB 568 KB
*/
SELECT *
FROM sys.dm_db_database_page_allocations(DB_ID(), OBJECT_ID(N'dbo.CompressionTest'), 1, 1, 'DETAILED');
Converting the table to a Clustered Columnstore Index sometimes helps with large types, even off-row values.
So yes, I think COMPRESS / DECOMPRESS is definitely an alternative here (that or a Clustered Columnstore Index, depending on the scenario).
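As a rough sketch of that alternative (not part of the original test; dbo.CompressionTest_Compressed is just a hypothetical companion table, and the savings will depend entirely on the data):

-- Store the compressed bytes alongside the test data and compare sizes.
CREATE TABLE dbo.CompressionTest_Compressed
(
    [ID] INT NOT NULL PRIMARY KEY,
    [SomeValue] VARBINARY(MAX) NOT NULL -- COMPRESS() always returns VARBINARY(MAX)
);
INSERT INTO dbo.CompressionTest_Compressed ([ID], [SomeValue])
SELECT [ID], COMPRESS([SomeValue])
FROM dbo.CompressionTest;
EXEC sys.sp_spaceused N'dbo.CompressionTest_Compressed';
-- Reading it back requires DECOMPRESS plus a CAST to the original type:
SELECT [ID], CAST(DECOMPRESS([SomeValue]) AS VARCHAR(MAX)) AS [SomeValue]
FROM dbo.CompressionTest_Compressed;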
On a related note: I brought up this very point (these two work-arounds) in my post about the new UTF-8 Collations in SQL Server 2019:
Native UTF-8 Support in SQL Server 2019: Savior or False Prophet?
Take care, Solomon...
SQL# — https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures )
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
October 1, 2018 at 8:51 am
BrainDonor - Monday, October 1, 2018 5:00 AM
Jonathan AC Roberts - Monday, October 1, 2018 4:52 AM
I guess if you compress an already compressed value it will make it bigger.
That is one way, but this was just imported text - Compress and Decompress Basic Examples.
There is a certain minimum size for the resulting "compressed" value (I guess due to the overhead of the format), so input values under a certain size (depending on the compression algorithm, I believe) will actually come out larger than they went in.
Also, compression relies upon finding repeating patterns. If the input value has no discernible patterns (such as a value that has already been compressed), it won't compress, and you still pay that overhead.
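For example (exact sizes are approximate and depend on the input):

-- Tiny or incompressible inputs come back larger, since the GZIP wrapper
-- adds a fixed amount of overhead regardless of the input.
DECLARE @random VARBINARY(100) = CRYPT_GEN_RANDOM(100);
SELECT DATALENGTH('abc')             AS TinyOriginal,     -- 3 bytes
       DATALENGTH(COMPRESS('abc'))   AS TinyCompressed,   -- noticeably more than 3
       DATALENGTH(@random)           AS RandomOriginal,   -- 100 bytes
       DATALENGTH(COMPRESS(@random)) AS RandomCompressed; -- more than 100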
Take care, Solomon...
SQL# — https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures )
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
October 1, 2018 at 11:29 am
COMPRESS definitely works a lot better for particular data. As an example, one of my tables (700 million rows, 900+ GB with standard format and PAGE compression) goes down to 400 GB when I compress the main blob on it, which holds a mix of text data and PDF/images. It doesn't go lower than that because the PDF data is already in a zip format.
This uses a GzipStream, which includes a header of 128 bytes. If it were a DeflateStream it would not include this header.
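Incidentally, the GZIP wrapper is easy to spot in the output itself; every COMPRESS() result starts with the GZIP magic bytes (a quick sanity check, not a measure of the total header size):

-- The GZIP signature 0x1F 0x8B appears at the start of any COMPRESS() result.
SELECT SUBSTRING(COMPRESS('any value at all'), 1, 2) AS FirstTwoBytes; -- 0x1F8B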
October 1, 2018 at 2:29 pm
@Solomon Rutzky:
Thanks for working this out; my tests had shown similar results. I simply wondered what it would look like in this particular test, as it "fits" the subject.
(and I was slightly hoping someone would say that version xyz now includes LOB_DATA when applying PAGE compression =)
Kind regards,
Roby