April 11, 2020 at 8:04 am
Hi,
I've got a table with 110M+ rows. It is an SCD2 table with historic data. Most of the queries return the bulk of the table, say 50% or more of the total rows, eg. where current_record='Y'. It is only updated every 6 weeks, and those updates can add 10M or so new rows.
It was under-indexed and suffering from poor performance, so I've added non-clustered rowstore indexes to support the most common queries (IX_* in the picture below).
However, I also note https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-design-guidance?view=sql-server-ver15:
Clustered columnstore index
Use for:
1) Traditional data warehouse workload with a star or snowflake schema
2) Internet of Things (IOT) workloads that insert large volumes of data with minimal updates and deletes.
This is very much the use case of this table.
So, I've converted it to clustered columnstore index and am pleased with the results.
Here is my current structure:
The PK is a unique non-clustered constraint, not a primary key. The FK can't be in place when I created the Clustered Columnstore (from the UI), but can be added after the fact (not sure why???).
Questions:
ALTER INDEX CIX_RLDX_ARCHIVE_ColumnStore ON new.RLDX_ARCHIVE DISABLE
GO
SELECT TOP 1000 * FROM new.RLDX_ARCHIVE
And got this:
Warning: Foreign key 'FK_RLDX_ARCHIVE_RLDX_AUDIT_TRAIL_rldx_audit_key' on table 'RLDX_ARCHIVE' referencing table 'RLDX_AUDIT_TRAIL' was disabled as a result of disabling the index 'CIX_RLDX_ARCHIVE_ColumnStore'.
Warning: Index 'PK_RLDX_ARCHIVE_rldx_archive_key' on table 'RLDX_ARCHIVE' was disabled as a result of disabling the clustered index on the table.
Warning: Index 'IX_RLDX_ARCHIVE_NaturalKeys' on table 'RLDX_ARCHIVE' was disabled as a result of disabling the clustered index on the table.
Warning: Index 'IX_RLDX_ARCHIVE_hospital_type' on table 'RLDX_ARCHIVE' was disabled as a result of disabling the clustered index on the table.
Warning: Index 'IX_RLDX_ARCHIVE_recnum' on table 'RLDX_ARCHIVE' was disabled as a result of disabling the clustered index on the table.
Warning: Index 'IX_RLDX_ARCHIVE_rldx_audit_key' on table 'RLDX_ARCHIVE' was disabled as a result of disabling the clustered index on the table.
Warning: Index 'IX_RLDX_ARCHIVE_rldx_current_record' on table 'RLDX_ARCHIVE' was disabled as a result of disabling the clustered index on the table.
Msg 8655, Level 16, State 1, Line 4
The query processor is unable to produce a plan because the index 'CIX_RLDX_ARCHIVE_ColumnStore' on table or view 'RLDX_ARCHIVE' is disabled.
So disabling the Clustered Columnstore index isn't the correct approach.
Wrapping up, if I want to use a Clustered Columnstore index (and I do, the performance gains are amazing for the queries I run), then what are:
1) best practices for other indexes (of any type), and
2) any "gotchas" in large ETL insert/updates?
Thanks,
Scott
April 11, 2020 at 3:33 pm
- Is it now over-indexed? All of the rowstore indexes would be used in typical queries, if there wasn't a Clustered Columnstore index, but are they necessary given the Clustered Columnstore index?
I think rowstore indexes would not be required with Clustered Columnstore Index. However, I've not tried this yet. This needs to be authenticated.
2. In the past I "played" with Clustered Columnstore indexes, and suffered atrocious performance during updates (eg. inserting 10M rows then closing out the old expired rows). Should I disable ALL indexes during updates, then rebuild after the update? Or just disable the non-clustered indexes (but see #1)? Given the infrequent updates to this table, I'm happy for the performance hit of rebuilding all indexes if it results in faster queries for the next 6 weeks (rebuild all takes about 20 mins).
3. I also note this thread: https://www.sqlservercentral.com/forums/topic/disabling-indexes-for-load-performance. See Jeff Moden's comment. And yeah, I tried:And got this:So disabling the Clustered Columnstore index isn't the correct approach.
Columnstore Indexes out-performs as compared to Rowstore Indexes because :
Compression is an overhead during WRITE operation whereas helps in READ operation.
If huge data is being deleted or updated, then you can try DROPPING the clustered columnstore index before DELETE/UPDATE, and CREATING the clustered columnstore index after DELETE/UPDATE.
However, you need to compare the time taken to REBUILD vs DROP/CREATE the clustered columnstore index.
Wrapping up, if I want to use a Clustered Columnstore index (and I do, the performance gains are amazing for the queries I run), then what are:
1) best practices for other indexes (of any type), and
2) any "gotchas" in large ETL insert/updates?
You can give a try to the Get started with Columnstore for real-time operational analytics article available on Microsoft official documentation.
I've also recently wrote an article Probing Columnstore Indexes, in relation to use of columnstore index in OLTP workload. You can have a look into it. I'll hope you'll find it purposeful!
April 13, 2020 at 10:51 am
So, before I comment on anything, remember, testing is your best friend in these situations. Anyone can have opinions on behavior. However, you should test those opinions and assumptions against your system in order to validate them.
HOWEVER, we're talking about a clustered index, so, see the next point.
With modern systems we get to pick & choose behaviors. So, if the predominant access of the data is analytical, warehouse, reporting, style queries, a clustered columnstore makes sense. Then, as needed, you can add nonclustered indexes for point lookups. If, on the other hand, the predominant access of the data is more OLTP, small batch insert/update/delete, lots of point lookups, then you use a clustered rowstore index. And, as before, add nonclustered indexes for point lookups. And, as needed, add a nonclustered columnstore for analytics. You can combine them, mix and match. Just understand your data access patterns so that you can choose the right indexes.
In terms of "over indexing", I wouldn't sweat that unless you're not testing and monitoring. Then, yeah, worry about that until you get testing and monitoring set up. For detailed query analysis, Extended Events are your best buddy. For general analysis, you can rely on the DMVs, or enable Query Store. In addition to these, I recommend getting a third party tool for monitoring, not because you couldn't do it yourself, but because you have better places to spend your time than building a monitoring tool.
Hope all that helps. For a lot more details on query tuning, I'd suggest getting copies of my two books, below are the links.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 13, 2020 at 10:56 am
I think rowstore indexes would not be required with Clustered Columnstore Index. However, I've not tried this yet. This needs to be authenticated.
I'd be really cautious here. You can combine clustered columnstore and nonclustered indexes. The idea is simple. Columnstore indexes are excellent at analytical style queries, but they stink at point lookup queries (single row or small batches). So, when you need the point lookups, you can add the nonclustered indexes.
If huge data is being deleted or updated, then you can try DROPPING the clustered columnstore index before DELETE/UPDATE, and CREATING the clustered columnstore index after DELETE/UPDATE.
However, you need to compare the time taken to REBUILD vs DROP/CREATE the clustered columnstore index.
Again, a note of caution. Since a clustered columnstore defines data storage, to a degree, dropping and recreating it means we're literally moving the data twice, once into a heap and then into the clustered columnstore. Chances are very high that's a much more costly operation. Considerations around the nonclustered indexes just goes back to testing.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 21, 2020 at 3:06 am
Hey Grant,
Thanks for taking the time to write your detailed reply.
The single biggest thing I got from your post is to test, test, test. I do know I'm getting MUCH better performance from my typical queries and joins than without the clustered columnstore index. I threw a number of the typical queries against this table and the query execution plan almost always chose the CCI, even when there was a rowstore index that could have also been used. I'm guessing this was due to the quantity of rows being returned (50% or more). The only time the rowstore index was used was when I queried a specific natural key: hospital='AAA' and stay_number='123' and episode_sequence_number=1. This is actually not a typical query, but might be used to drilldown and debug coding errors.
I did test the ETL *for this particular table*, which added about 2M rows to the 110M row table. The performance when leaving all indexes in place was acceptable. But I'll keep in mind your general advice should I have ETL (insert/update) performance issues in the future.
BTW, I *do* have your book: SQL Server Query Performance Tuning, Fourth Edition from Apress. I've only read about 1/4, then got OBE ("overcome by life events"). Seems I should revisit it. I've also read Nico's blog, and even some of his YouTube videos. Some of it went over my head at the time; again, perhaps time to re-read his blog.
April 21, 2020 at 11:09 am
It makes sense that the columnstore index is going to get used for queries that are moving large chunks of data. That's pretty much why they exist. If testing shows it's helping, then it's helping. That's the only measure that counts. Not my opinion or anyone else's, the test.
Happy to help a little. Good luck going forward.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 21, 2020 at 9:09 pm
1 The nonclus indexes should be unique, not non-unique. Worst case, add an identity to the table and use that to make unique non-clus keys.
2 Are the nonclus indexes page compressed? If not, look into that immediately. Because they can be compressed, I wouldn't worry about being over-indexed, at least not for now. Get it performing well, later you can see if you can remove / combine an index(es).
3 Make sure the parallel threshold setting is above the minimum setting. The best number depends on your specific system, but if it's still at a super-low number like 5, that's not good and needs corrected.
4 Updating a columnstore clustered index can cause performance pangs. You especially don't want to increase the size of a lot of different columns (that can be an even bigger issue with rowstore tables).
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply