July 22, 2020 at 1:17 pm
We have a table definition below with a columnstore index and nonclustered row indices. The data in this table increases 10K per day and in a year can have 400K ColNumber with each having 1 to many ColProperty. This table will be used mostly for reads/reference and hardly have updates. Would it be beneficial to have the nonclustered row indices as well or just the columnstore index would be sufficient?
CREATE TABLE [dbo].[SimpleTable](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ColNumber] [nvarchar](50) NOT NULL,
[ColProperty] [nvarchar](50) NULL,
[ColValue] [nvarchar](150) NULL,
[ColDate] [datetime] NOT NULL
) ON [PRIMARY]
GO
CREATE CLUSTERED COLUMNSTORE INDEX CCIX_SimpleTable ON [dbo].[SimpleTable];
CREATE UNIQUE INDEX IX_SimpleTable_NumberProperty ON [dbo].[SimpleTable] (ColNumber,ColProperty);
CREATE NONCLUSTERED INDEX IX_SimpleTable_Id ON [dbo].[SimpleTable] (Id);
July 22, 2020 at 2:32 pm
My understanding of clustered columnstore indexes is that they are not ordered (except in Azure). So, my opinion, your non-clustered indexes (the UNIQUE and the NONCLUSTERED are both NONCLUSTERED) may help or may not help.
Having the index on ID may help if you are joining on that column or filtering on that column, but might not help either.
It depends on the queries being run against the table as to which indexes you will want. For example, if you have a lot of "WHERE id <= 10" or similar type queries, then IX_SimpleTable_id will be beneficial. But if you never use that when filtering data, that index will be wasted. Same thing with the IX_SimpleTable_NumberProperty. If you filter on ColNumber and/or ColProperty, then that index will be helpful.
Now if you are generally pulling in all of the data and doing analytics on it from the application side (SAS, Tableu, PowerBI, etc), then your nonclustered indexes are not really going to be too helpful and will slow down your inserts.
As a thought though - since ColNumber and ColProperty are required to be unique, that may be a good candidate for a primary key. You may not need that ID column if it is just working as an unreliable row counter.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
July 22, 2020 at 3:20 pm
+1 for Brian's thoughts. The indexing is driven by the query patterns.
July 23, 2020 at 12:12 pm
The key here is the types of queries, not simply should I throw an index on. Columnstore indexes are meant for analytical, aggregation & large scan types of queries. That's where they shine. B-tree indexes are meant for point lookups and limited scans. That's where they shine. You only put a nonclustered index on your clustered columnstore index if you need to add point lookups to the otherwise predominant query patterns of aggregation, etc. If your queries are still predominantly point lookups, you shouldn't be using columnstore at all.
Based on the info provided in your question, there's no way for us to give you a definitive answer.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 27, 2020 at 6:22 pm
Thank you all for your responses.
The queries to this table will typically be to get the value of a particular property for ColNumber within the last x months or get all the ColNumber of a certain property value. From all your feedback, the B-tree indexes sounds like it would fit more in this type of queries.
July 28, 2020 at 10:40 am
OK. So, with ColNumber as the primary means of accessing the data, probably, and you should test this, not just trust some yahoo on the internet, that's your clustered key column on a b-tree clustered index. Based on your described data access pattern, that's where I'd lean. Traditionally, clustered keys on unique values is preferred. However, since the clustered index defines data storage, I lean towards letting the most common access path to the data define my clustered key. Frequently, the most common path is the primary key, so it gets clustered too. However, sometimes, like this sounds, it's better to go with a non-unique column, or columns, as the key.
BUT
As I said, testing is your friend.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply