Page Compression

  • Comments posted to this topic are about the item Page Compression

  • Very good question on internals

    M&M

  • Nice question, thanks.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Nice, and quite straightforward, question for a Monday morning. Thank you.

    _____________________________________________________________________
    [font="Comic Sans MS"]"The difficult tasks we do immediately, the impossible takes a little longer"[/font]

  • Good question today.

  • Great question, and a lucky guess, so I read up on it. The strangest statement in the MSDN article I read (http://msdn.microsoft.com/en-us/library/cc280464.aspx) was:

    "If the space gained by page compression minus the space that is required for the CI structure is not significant, page compression is not used for that page."

    How much is significant? I would have thought there would be some defined (published) threshold, given that compression might NOT be used even though it has been "turned on"...

    But to be fair, it started with:

    "The details of data compression are subject to change without notice in service packs or subsequent releases."

    It's a moving target apparently, so what is true today may not be tomorrow.
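
    For reference, all that "turned on" means here is that you rebuild the table or index with page compression. A minimal sketch (the table and index names are made up):

        -- Enable page compression on a (hypothetical) table by rebuilding it
        ALTER TABLE dbo.SalesOrderDetail
        REBUILD WITH (DATA_COMPRESSION = PAGE);

        -- Or on a single (hypothetical) index
        ALTER INDEX IX_SalesOrderDetail_ProductID ON dbo.SalesOrderDetail
        REBUILD WITH (DATA_COMPRESSION = PAGE);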

    Peter Trast
    Microsoft Certified ...(insert many literal strings here)
    Microsoft Design Architect with Alexander Open Systems

  • Peter Trast (5/16/2011)


    How much is significant? I would have thought there would be some defined (published) threshold...

    Hi Peter,

    A lot of this is determined by how data is stored internally on a page.

    Column prefix and page dictionary compression both operate at a binary level.

    For both, the page gains an additional metadata structure called the CI (Compression Information) record, stored immediately after the page header.

    The CI structure has two components: an anchor record for column prefix compression, and a dictionary array for page dictionary compression.

    For each column, the value containing the most common, repeatable pattern is added to the anchor record, and the rows then store only what differs from it.

    For example, if my data is three rows of three columns:

    Row 1: BRAD2004     | 37.56 | Text Comments made by This User
    Row 2: BRADLEY2005  | 46.97 | Additional Comments made by This User
    Row 3: BRADFORD1977 | 37.99 | Text2004

    For column prefix compression, SQL Server looks at each column in turn. In the first column the common pattern (we are using characters here rather than raw binary, but this is just an example) is BRAD. For each column, SQL takes the value with the most bytes of data (or just the first record it comes to if there is a tie) and moves it up into the anchor record.

    After column prefix compression the page looks like this ([n] means the first n bytes are taken from the anchor value for that column, and <null> means the value matches the anchor exactly):

    CI
    ANCHOR:     BRADFORD1977 | 37.99    | Text Comments made by This User
    DICTIONARY: (empty so far)

    Row 1: [4]2004     | [3]56    | <null>
    Row 2: [4]LEY2005  | [0]46.97 | [0]Additional Comments made by This User
    Row 3: <null>      | <null>   | [4]2004

    When page dictionary compression is then applied, SQL Server looks for binary patterns that repeat across the entire page and places them in the dictionary portion of the CI structure as an array of entries; matching values in the rows are replaced with a reference to the entry. For this example we only end up with one entry in the array.

    CI
    ANCHOR:     BRADFORD1977 | 37.99    | Text Comments made by This User
    DICTIONARY: entry 0 = [4]2004

    Row 1: 0           | [3]56    | <null>
    Row 2: [4]LEY2005  | [0]46.97 | [0]Additional Comments made by This User
    Row 3: <null>      | <null>   | 0

    So if you have a page where column prefix and page dictionary compression cannot produce any savings, your records stay row compressed and it stops there.

    It really is determined by what data is on the page and whether compression can actually save space. SQL Server tries to keep you from paying the additional overhead of page compression when there is no benefit to it.
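
    If you want to watch this happen, here is a rough sketch (the table name and row count are made up) that loads very repetitive data and compares page counts before and after a page-compressed rebuild:

        -- Load repetitive data into a throwaway table
        CREATE TABLE dbo.CompressionDemo
        (
            Id       int IDENTITY PRIMARY KEY,
            Name     varchar(50),
            Amount   decimal(10, 2),
            Comments varchar(200)
        );

        INSERT INTO dbo.CompressionDemo (Name, Amount, Comments)
        SELECT TOP (100000) 'BRADFORD1977', 37.99, 'Text Comments made by This User'
        FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;

        -- Page count before compression
        SELECT used_page_count
        FROM sys.dm_db_partition_stats
        WHERE object_id = OBJECT_ID('dbo.CompressionDemo');

        -- Rebuild with page compression, then check again
        ALTER TABLE dbo.CompressionDemo REBUILD WITH (DATA_COMPRESSION = PAGE);

        SELECT used_page_count
        FROM sys.dm_db_partition_stats
        WHERE object_id = OBJECT_ID('dbo.CompressionDemo');

    With data this repetitive the second count should be dramatically lower, because both the anchor record and the dictionary get heavy use.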

  • And Great Question Steve!

  • Bradley B (5/16/2011)


    ...It really is determined by what data is on the page and whether compression can actually save space. SQL Server tries to keep you from paying the additional overhead of page compression when there is no benefit to it.

    The explanation makes plenty of sense. I was just wondering what the threshold is. "Significant" is subjective. I was hoping for a percentage or formula or something like that. I would assume that it is based on many things, not just the net amount of space saved but the cost of maintaining the compression, etc. Perhaps, like the query optimizer, it is just "magic" 😉

    Peter Trast
    Microsoft Certified ...(insert many literal strings here)
    Microsoft Design Architect with Alexander Open Systems

  • I hear you, there isn't really a published algorithm.

    Because of that, it can change in service packs, CUs, or new versions. You know how it is. But you can use DMVs and Extended Events to track compression changes in a database.

    It is a bit misleading, though, because it labels pages as "compression attempt failed" when row compression didn't fail; only the page-level part (column prefix and page dictionary) did.

    I'm working on getting some numbers based on a couple of tests I'm doing for a deep dive presentation on compression.

    It could always change, but if I can get a baseline I'll have something to compare it against.
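
    For example, here is a rough sketch of an Extended Events session for catching those failed attempts (the session name is made up, and I'm assuming the sqlserver.page_compression_attempt_failed event is the one behind that label), plus the DMV column that counts compressed pages:

        -- Capture pages where a page compression attempt did not pay off
        CREATE EVENT SESSION PageCompressionFailures ON SERVER
        ADD EVENT sqlserver.page_compression_attempt_failed
        ADD TARGET package0.ring_buffer;

        ALTER EVENT SESSION PageCompressionFailures ON SERVER STATE = START;

        -- Count of page-compressed pages per index (needs DETAILED mode)
        SELECT object_id, index_id, compressed_page_count
        FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED');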

  • Excellent question today.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • Nice easy Monday question. 😎

  • Peter Trast (5/16/2011)


    I was just wondering what the threshold is. "Significant" is subjective...

    The thresholds used to be that (a) only full pages are considered, and (b) the compressed page should be able to store 5 more rows, or 25% more rows, whichever is greater, compared with the original page. (So a full page holding 40 rows would have to fit at least 50 rows once compressed.) They might have changed, however; that's the way with undocumented magic numbers.

    The rules are generally not very sophisticated, because sophistication costs. The query optimizer, for example, has lots of very simple transition levels and very little genuine magic 🙂
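
    If you want a concrete number for your own data rather than the internal rule, you can always ask for an estimate up front. A minimal sketch (the schema, table, and parameter values are placeholders):

        -- Estimate the space PAGE compression would save for one object
        EXEC sys.sp_estimate_data_compression_savings
             @schema_name      = N'dbo',
             @object_name      = N'SalesOrderDetail',
             @index_id         = NULL,   -- all indexes
             @partition_number = NULL,   -- all partitions
             @data_compression = N'PAGE';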

  • SQLkiwi (5/16/2011)


    ...The query optimizer, for example, has lots of very simple transition levels and very little genuine magic 🙂

    I like the fact that you say "very little" when speaking of magic. I KNEW there had to be SOME 😉

    Peter Trast
    Microsoft Certified ...(insert many literal strings here)
    Microsoft Design Architect with Alexander Open Systems
