Columnstore or not to columnstore?

  • I have to agree with Frederico.   I've had both methods be better than the other.  "It Depends".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    I have to agree with Frederico.   I've had both methods be better than the other.  "It Depends".

    Yep.

    There are trends. More often than not, I see having a clustered index on the load (if one is needed, again, it depends) work better than adding it later. However, I've seen the reverse on many occasions. So, as was stated early in this thread, test it. Test it. Test it.

    Don't trust what some idiot with a keyboard (Hi, my name is Grant) says on the internet. Test it.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey wrote:

    Jeff Moden wrote:

    I have to agree with Frederico.   I've had both methods be better than the other.  "It Depends".

    Yep.

    There are trends. More often than not, I see having a clustered index on the load (if one is needed, again, it depends) work better than adding it later. However, I've seen the reverse on many occasions. So, as was stated early in this thread, test it. Test it. Test it.

    Don't trust what some idiot with a keyboard (Hi, my name is Grant) says on the internet. Test it.

    Heh... You're not alone there.. I'm the same kind of idiot (Hi, my name is Jeff).  I used to "absolutely" agree with loading a table that has the clustered index in place as being an "always".  One day, I forgot to add the clustered index on a very large load and added it later and realized that particular combination was about 4 times faster for what I was doing than having the clustered index in place.  It was another of those moments that reminded me that you always need to test different options because SQL Server is a fickle partner.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ScottPletcher wrote:

    No, it won't use 9 bytes under row compression unless it truly needs all 9 bytes.  Row compression should automatically be used for most tables.

    While row compression can certainly be beneficial, I wouldn't say that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    ScottPletcher wrote:

    No, it won't use 9 bytes under row compression unless it truly needs all 9 bytes.  Row compression should automatically be used for most tables.

    While row compression can certainly be beneficial, I wouldn't say that.

    MS effectively says that.  And I agree with them on this one.  That is, I'd need a specific reason to exclude a table from row compression rather than a specific reason to activate it.  Not much downside unless you UPDATE the row using wildly different values than the original values.  In business apps, that's very rare.

    If you think about it, varchar() is a form of row compression as well.  Presumably we all use that :-).

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • I can't find right now the newer link for row compression recommendation, but here's the old one.  Of course, if your system is CPU bound, rather than I/O bound, you would be much more conservative with even row compression.

    There are some tables I don't row compress, but it's rare and for a specific reason.  In particular, if the are lots of bit columns (row compression actually makes each bit expand to 4 bits).

    Other gains: RCSI versions of rows are row compressed (but are not page compressed, which is logical).  Trans log data is also row compressed (last I confirmed it, in SQL 2012; I don't see why that would change, but I guess you never know).

    Finally, I'll note in passing that Glenn Berry and Paul Randal were both reviewers of the document below before it was published.

    https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008/dd894051(v=sql.100)

    [The 10% is for writing data; reading row compressed data has virtually no noticeable CPU overhead.]

    "

    The CPU overhead of row compression is usually minimal (generally less than or equal to 10 percent in our experience). If row compression results in space savings and the system can accommodate a 10 percent increase in CPU usage, all data should be row-compressed. For example, SAP ERP NetWeaver 7.00 Business Suite 7 and above use row compression on all tables.

    "

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • You're definitely preaching to the choir.  What I'm worried about is the wording you used.

    Row compression should automatically be used for most tables.

    "Should automatically be used" is all that most people will see and practice.  As you've pointed out in your posts above, there are considerations (even more than what you were good enough to post above that I'm still researching/testing the impact of) and it shouldn't just be an "automatic" thing to do.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • My view is that the vast majority of the time row comp is beneficial, likely 90+%. So I think they're better off always doing it vs never doing it, which I think will be their choice if it becomes too complex to decide whether to rc or now.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher wrote:

    My view is that the vast majority of the time row comp is beneficial, likely 90+%. So I think they're better off always doing it vs never doing it, which I think will be their choice if it becomes too complex to decide whether to rc or now.

    Heh... I know that's your view.  It don't share that view though.  It's the 10% that can be the killer.  I've just not finished the testing I'm doing to figure out what the impact of my concerns actually are.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • VoldemarG wrote:

    Yes, am engaging right now into doing the test you mentioned.

    thank you.

    I was just attaching the Excel file to answer Jeff's questions.

    The testing is a great idea but I'll add another permutation that can be extremely beneficial, especially if the database can temporarily be slipped into the BULK LOGGED Recovery Model.  For the times that you TRUNCATE and reload the table, use WITH(TABLOCK) on the insert whether or not the Clustered Index is in place already or not.  It can help in the FULL Recovery Model but will really help in the BULK LOGGED Recovery Model especially if "Minimal Logging" occurs.

    There is a catch to "Minimal Logging".  You cannot do a restore to the middle of a log backup that has even 1 byte of "Minimally Logged" activity.  So I recommend that you first force a log file backup, then do the "Minimally Logged" insert (Bulk Logged), then make sure you change back to the FULL Recovery Model, and immediately take another log file backup.

    As a bit of a sidebar, this isn't a very large table.  You can easily make it so that you have real close to zero downtime while you're loading it using the "Load and rename" method.  You keep one copy of the table active (using a synonym) while you're loading the other (also using a synonym).  If the load is successful, just repoint the synonyms.  If the load is not successful, you're not dead.  The data might be a day behind but it's still there... just don't repoint the synonyms on load failures.

    Thanks for the comprehensive information in the spreadsheet.  I've not had much time to look at it but I'm getting there.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 16 through 24 (of 24 total)

You must be logged in to reply to this topic. Login to reply