May 15, 2011 at 8:14 am
Comments posted to this topic are about the item Page Compression
May 15, 2011 at 8:14 am
Very good question on internals
M&M
May 15, 2011 at 11:36 pm
Nice question, thanks.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
May 16, 2011 at 1:41 am
This was removed by the editor as SPAM
May 16, 2011 at 3:15 am
Nice, and quite straightforward, question for a Monday morning. Thank you.
_____________________________________________________________________
[font="Comic Sans MS"]"The difficult tasks we do immediately, the impossible takes a little longer"[/font]
May 16, 2011 at 4:19 am
Good question today.
May 16, 2011 at 8:37 am
Great Question and a lucky guess so I read up on it. The strangest statement in the msdn article I read (http://msdn.microsoft.com/en-us/library/cc280464.aspx) was:
" If the space gained by page compression minus the space that is required for the CI structure is not significant, page compression is not used for that page."
How much is significant?? I would have thought that there would be some defined threshold (published) if it might NOT use compression even though it has been "turned on"...
But to be fair, it started with:
"The details of data compression are subject to change without notice in service packs or subsequent releases."
It's a moving target apparently, so what is true today may not be tomorrow.
Peter Trast
Microsoft Certified ...(insert many literal strings here)
Microsoft Design Architect with Alexander Open Systems
May 16, 2011 at 9:12 am
Peter Trast (5/16/2011)
Great Question and a lucky guess so I read up on it. The strangest statement in the msdn article I read (http://msdn.microsoft.com/en-us/library/cc280464.aspx) was:" If the space gained by page compression minus the space that is required for the CI structure is not significant, page compression is not used for that page."
How much is significant?? I would have thought that there would be some defined threshold (published) if it might NOT use compression even though it has been "turned on"...
But to be fair, it started with:
"The details of data compression are subject to change without notice in service packs or subsequent releases."
It's a moving target apparently, so what is true today may not be tomorrow.
Hi Peter,
Alot of this is determined by the internal storage of Data on a Page.
Column Prefix & Page Dictionary Compression both operate at a Binary level.
For both Column Prefix & Page Dictionary the Header of a Page actually gains additional MetaData call CI or Compression Information.
CI has two components an Anchor Tag for Column Prefix Compression, and a Dictionary Array for Page Dictionary Compression.
The record that contains the most common and repeatable pattern per column is added to the Anchor Tag, and a pointer to that record is added.
For Example.
If my data is
Row 1 Row 2 Row 3
BRAD200437.56Text Comments made by This User
BRADLEY200546.97Additional Comments made by This User
BRADFORD197737.99Text2004
For Column Prefix to be applied we would take the data in Row 1. We would observe that the common binary pattern (and we didn’t use binary but this is just an example) is BRAD. SQL will take the field with the most bytes of data (or it will just choose the first record it comes to if there is a tie), and move it up to the Anchor Record.
CI
ANCHORBRADFORD197737.99Text Comments made by This User
DICTIONARY
ROW 1ROW2ROW3
[4]2004[3].56<null>
[4]LEY2005[0]46.97[0] Additional Comments made by This User
<null><null>[4]2004
When The Page Dictionary Compression is applied, It will find common binary patterns across the entire page and place them in the Dictionary portion of CI in Multi-Dimensional Array Values. For this example we only have one row in the array.
CI
ANCHORBRADFORD197737.99Text Comments made by This User
DICTIONARY0
ROW 1ROW2ROW3
0[3].56<null>
[4]LEY2005[0]46.97[0] Additional Comments made by This User
<null><null>0
So if you have a page where you cannot get any savings from Page Column Prefix or Page Dictionary Compression, then your records will be row compressed and stop there.
It really is determined by what the data is on the page, and if you can get compression savings. SQL tries to keep you from having the additional overhead of Page Compression if there is no benefit to it.
Twitter: @SQLBalls
Blog: http://www.SQLBalls.com
Channel: https://www.youtube.com/@Tales-from-the-Field
May 16, 2011 at 9:15 am
And Great Question Steve!
Twitter: @SQLBalls
Blog: http://www.SQLBalls.com
Channel: https://www.youtube.com/@Tales-from-the-Field
May 16, 2011 at 9:24 am
Bradley B (5/16/2011)
Peter Trast (5/16/2011)
Great Question and a lucky guess so I read up on it. The strangest statement in the msdn article I read (http://msdn.microsoft.com/en-us/library/cc280464.aspx) was:" If the space gained by page compression minus the space that is required for the CI structure is not significant, page compression is not used for that page."
How much is significant?? I would have thought that there would be some defined threshold (published) if it might NOT use compression even though it has been "turned on"...
But to be fair, it started with:
"The details of data compression are subject to change without notice in service packs or subsequent releases."
It's a moving target apparently, so what is true today may not be tomorrow.
Hi Peter,
Alot of this is determined by the internal storage of Data on a Page.
Column Prefix & Page Dictionary Compression both operate at a Binary level.
For both Column Prefix & Page Dictionary the Header of a Page actually gains additional MetaData call CI or Compression Information.
CI has two components an Anchor Tag for Column Prefix Compression, and a Dictionary Array for Page Dictionary Compression.
The record that contains the most common and repeatable pattern per column is added to the Anchor Tag, and a pointer to that record is added.
For Example.
If my data is
Row 1 Row 2 Row 3
BRAD200437.56Text Comments made by This User
BRADLEY200546.97Additional Comments made by This User
BRADFORD197737.99Text2004
For Column Prefix to be applied we would take the data in Row 1. We would observe that the common binary pattern (and we didn’t use binary but this is just an example) is BRAD. SQL will take the field with the most bytes of data (or it will just choose the first record it comes to if there is a tie), and move it up to the Anchor Record.
CI
ANCHORBRADFORD197737.99Text Comments made by This User
DICTIONARY
ROW 1ROW2ROW3
[4]2004[3].56<null>
[4]LEY2005[0]46.97[0] Additional Comments made by This User
<null><null>[4]2004
When The Page Dictionary Compression is applied, It will find common binary patterns across the entire page and place them in the Dictionary portion of CI in Multi-Dimensional Array Values. For this example we only have one row in the array.
CI
ANCHORBRADFORD197737.99Text Comments made by This User
DICTIONARY0
ROW 1ROW2ROW3
0[3].56<null>
[4]LEY2005[0]46.97[0] Additional Comments made by This User
<null><null>0
So if you have a page where you cannot get any savings from Page Column Prefix or Page Dictionary Compression, then your records will be row compressed and stop there.
It really is determined by what the data is on the page, and if you can get compression savings. SQL tries to keep you from having the additional overhead of Page Compression if there is no benefit to it.
The explanation makes plenty of sense. I was just wondering what the threshold is. "Significant" is subjective. I was hoping for a percentage or formula or something like that. I would assume that it is based on many things, not just the net amount of space saved but the cost of maintaining the compression, etc. Perhaps, like the query optimizer, it is just "magic" 😉
Peter Trast
Microsoft Certified ...(insert many literal strings here)
Microsoft Design Architect with Alexander Open Systems
May 16, 2011 at 10:07 am
I hear you, there isn't really a published algorithm.
Because of that it has the chance of changing in SP's, CU's, or versions. You know how it is. But you can use DMV's & Extended Events to track the Compression changes to a Database.
It is a bit misleading though, because it lables pages as compression attempt failed, but Row compression didn't fail, Page Compression (Column Prefix & Page Dictionary) did.
I'm working on getting some numbers based of a couple tests I'm doing for a Deep Dive Presentation on Compression.
It could always change, but if I can get a baseline I'll have something to compare it against.
Twitter: @SQLBalls
Blog: http://www.SQLBalls.com
Channel: https://www.youtube.com/@Tales-from-the-Field
May 16, 2011 at 10:10 am
Excellent question today.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
May 16, 2011 at 10:57 am
Nice easy Monday question. 😎
May 16, 2011 at 10:58 am
Peter Trast (5/16/2011)
The explanation makes plenty of sense. I was just wondering what the threshold is. "Significant" is subjective. I was hoping for a percentage or formula or something like that. I would assume that it is based on many things, not just the net amount of space saved but the cost of maintaining the compression, etc. Perhaps, like the query optimizer, it is just "magic" 😉
The threshold used to be that (a) only full pages are considered, and (b) the compressed page should be able to store 5 more rows, or 25% more rows, whichever is greater, compared with the original page. They might have changed, however (that's the way with undocumented magic numbers).
The rules are generally not very sophisticated because sophistication costs. The query optimizer for example) has lots of very simple transition levels, and very little genuine magic 🙂
May 16, 2011 at 1:55 pm
SQLkiwi (5/16/2011)
Peter Trast (5/16/2011)
The explanation makes plenty of sense. I was just wondering what the threshold is. "Significant" is subjective. I was hoping for a percentage or formula or something like that. I would assume that it is based on many things, not just the net amount of space saved but the cost of maintaining the compression, etc. Perhaps, like the query optimizer, it is just "magic" 😉The threshold used to be that (a) only full pages are considered, and (b) the compressed page should be able to store 5 more rows, or 25% more rows, whichever is greater, compared with the original page. They might have changed, however (that's the way with undocumented magic numbers).
The rules are generally not very sophisticated because sophistication costs. The query optimizer for example) has lots of very simple transition levels, and very little genuine magic 🙂
I like the fact that you say "very little" when speaking of magic. I KNEW there had to be SOME 😉
Peter Trast
Microsoft Certified ...(insert many literal strings here)
Microsoft Design Architect with Alexander Open Systems
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply