Today we will have a quick change of pace. This will be less about maintenance and more about internals.
Today’s topic is one of particular interest to me as well. The topic is a small section of a presentation I like to give at User Groups and at SQL Saturdays.
Before we take this diversion from the under-carriage to something more related to the engine, let’s have a brief recap of the previous four days of pre-Christmas.
So far the gifts of pre-Christmas have included the following articles.
- Exercise for msdb – Day 4
- Backup, Job and Mail History Cleanup – 3rd Day
- Service Broker Out of Control – 2nd Day
- Maint Plan Logs – 1st Day
On the Fifth Day of pre-Christmas…
My DBA gave to me a much more compact database. Surprisingly it can be as much as much as 20% of the original size.
I know, I know. I’ve heard it before but this is not the compression used by doublespace and drivespace that we were given many years ago. This really is much better.
And yes I have heard about the performance factor too. That is a topic for another discussion. As an aside, when it comes to performance, I always tell people that they must test for themselves because mileage will vary.
No, what I want to talk about is much different. I want to talk about the CD Array(at the page level) and the new data types specific to compression that you may encounter within the CD Array.
CD Array Data Types
SQL Server introduces us to 13 data types that are used within the CD Array when Compression has been enabled. Twelve of these data types can be seen when Row Compression has been enabled on an object. The thirteenth data type is only applicable when page compression has been implemented.
There is no guarantee that any or all of these data types will be present on a page related to an object that has been compressed using either Row Compression or Page Compression. If you want to find these data types on a compressed page, you may have to do a little hunting.
To demonstrate that these data types exist and that they can be found, I have a sample script.
[codesyntax lang=”tsql”]
USE CompressTest GO IF EXISTS(SELECT NAME FROM sys.tables WHERE name=N'CDTypes') BEGIN DROP TABLE dbo.CDTypes END IF EXISTS(SELECT NAME FROM sys.tables WHERE name=N'CDTypes2') BEGIN DROP TABLE dbo.CDTypes2 END CREATE TABLE dbo.CDTypes( SomeNull int ,SomeBit BIT ,Some1Byte INT IDENTITY(1,1) PRIMARY KEY CLUSTERED ,Some2Byte CHAR(2) ,Some3Byte CHAR(3) ,Some4Byte CHAR(4) ,Some5Byte CHAR(5) ,Some6Byte CHAR(6) ,Some7Byte CHAR(7) ,Some8Byte CHAR(8) ,SomeLong CHAR(9) ,SomeBit2 BIT ) WITH (DATA_COMPRESSION=ROW) ; SELECT TOP 20000 SomeID = IDENTITY(INT,1,1), SomeInt = ABS(CHECKSUM(NEWID()))%50000+1, SomeLetters2 = REPLICATE(CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65),10), SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY), SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME), SomeHex12 = RIGHT(NEWID(),12) INTO dbo.CDTypes2 FROM Master.dbo.SysColumns t1, Master.dbo.SysColumns t2; GO /* Let's create a Clustered Index */CREATE CLUSTERED INDEX CompressSTable ON dbo.CDTypes2(SomeID) WITH (DATA_COMPRESSION = PAGE); GO /* Insert Data */INSERT INTO dbo.CDTypes ( SomeNull, SomeBit, Some2Byte, Some3Byte,Some4byte, Some5byte , Some6byte, Some7byte, Some8byte, SomeLong, Somebit2) VALUES (NULL, 0, '12', '123', '1234', '12345', '123456', '1234567', '12345678', '123456789', 1) Go DBCC IND(CompressTest, 'CDTypes', 1) go /* Make sure CDTypes2 is page compressed */ALTER INDEX CompressSTable ON CDTypes2 REBUILD WITH (DATA_COMPRESSION = PAGE); DBCC IND(CompressTest, 'CDTypes2', 1) go DBCC TRACEON(3604) go /* Set Trace 3604 To Get Output to SSMS Then take a look at the Page --or-- Use the tableresults method Both methods shown here */DBCC PAGE('CompressTest', 1, 20392, 3) WITH tableresults go DBCC PAGE('CompressTest', 1, 24889, 3) --in this exercise the second page of type 1 is usually page compressed go
[/codesyntax]
Now let’s take a look at the different data types, starting with the 12 available with Row Compression.
0x00 | NULL | SomeNull |
0x01 | EMPTY | SomeBit |
0x02 | ONE_BYTE_SHORT | Some1Byte |
0x03 | TWO_BYTE_SHORT | Some2Byte |
0x04 | THREE_BYTE_SHORT | Some3Byte |
0x05 | FOUR_BYTE_SHORT | Some4Byte |
0x06 | FIVE_BYTE_SHORT | Some5Byte |
0x07 | SIX_BYTE_SHORT | Some6Byte |
0x08 | SEVEN_BYTE_SHORT | Some7Byte |
0x09 | EIGHT_BYTE_SHORT | Some8Byte |
0x0a | LONG | SomeLong |
0x0b | BIT_COLUMN | SomeBit2 |
If we look at page 20392 (yours will likely be different), we will find all of these data types present. We will also note that this page should show (COMPRESSED) PRIMARY_RECORD – which indicates that the page is Row Compressed. When hunting for these data types, it is a good idea to make sure the page is compressed first. In the supplied table, you can see what data type matches to which column in the table we created via the script. The table also provides a short description of what that data type represents (as you would see in the CD Array).
If we now want to explore and find the 13th data type, we need to look at the second table we created in the attached script – CDTypes2. Notice that this table has been page compressed. I even did that twice. I did this to make sure the data was page compressed (occasionally when testing I could not easily find a page compressed page unless I page compressed a second time).
Much the same as was done with Row Compression, we need to verify that a page is Page Compressed before searching up and down trying to find this 13th type. To find a Page Compressed page, we need to look in the CompressionInfo section of the page for CI_HAS_DICTIONARY. If this notation is present, the page is Page Compressed.
Here is a page snippet with the pertinent data type.
The CD array entry for Column 4 is what we are interested in here. In this particular record, we see the thirteenth data type of 0x0c (ONE_BYTE_PAGE_SYMBOL). This data type can appear for any of the columns in the CD array and can be different for each record on the page. This is due to how Page Compression works (which is also a different discussion).
For this last example, I want to backtrack a bit and point out that I used page 24889 to query for the Symbol data type in the CD Array. This page was different every time I re-ran the test script. One thing that was consistent with how I setup the exercise is that the page compressed page that had this data type was always the second page of type 1 (data page) in the results from DBCC Ind.
Conclusion
I hope this helped demystify compression in SQL Server – a little. This is only one piece of compression but an interesting one in my opinion. Use the provided script and play with it a little bit to get some familiarity.
Stay tuned for the 6th day of pre-Christmas.