Columnstore index question

  • I have some large ETL tables that i've created to snapshot production OLTP data that is sourced from more than one server and product. 

    The tables are wide, about 70 columns, and queried mostly by the date the rows were created in the warehouse, what i call its snapshot date.  Each table has millions of rows. 

    Would a column store index on the most commonly queried column, that of snapshot date, help performance?  Most of the queries are just returning large chunks of the result table into spreadsheets.

  • Column Store Index is on all columns, not just a single column, and a date column isn't really that good of a key for a column store because it has many unique values.

    I'm guessing your best bet would be to remove any columns that are unnecessary and partition by your date key (year or year/month).

  • It would very likely help performance.  Certainly if:

    1) you created a nonclustered columnstore index that covered the query entirely.
    Or
    2) the total rows you're retrieving were not a significant enough percentage of the table

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Cool, i think its worth testing out then. Is there any formula to calculate how much more memory the BI box may need to implement this on the big ones?   like width of columns by # of rows etc?

  • Seeing about a 40% increase in performance on a larger dataset, smaller ones are actually running a bit slower, but the gain on the larger ones may outweigh that effect. 

    Interesting exercise.

  • ManicStar - Wednesday, October 11, 2017 11:09 AM

    Seeing about a 40% increase in performance on a larger dataset, smaller ones are actually running a bit slower, but the gain on the larger ones may outweigh that effect. 

    Interesting exercise.

    Add OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX) to the queries which run slower and have the best of both worlds.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Interesting on the smaller ones.  Probably just SQL generating a non-optimal query plan.  You can probably resolve that; worst case, you'd have to use a "hint" to force SQL to get the best plan.

    But overall it'd probably be more useful to review indexing across the board to check for possible improvements, including looking at missing index stats, index usage stats, index op stats, and cardinality of values, as the most important factors.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher - Wednesday, October 11, 2017 11:24 AM

    Interesting on the smaller ones.  Probably just SQL generating a non-optimal query plan.  You can probably resolve that; worst case, you'd have to use a "hint" to force SQL to get the best plan.

    But overall it'd probably be more useful to review indexing across the board to check for possible improvements, including looking at missing index stats, index usage stats, index op stats, and cardinality of values, as the most important factors.

    Removed the rowstore non-clustered indexes for this exercise. Gunna put one of the back as another experiment.

  • ManicStar, thanks for posting your findings.  Good info.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply