November 25, 2016 at 1:09 am
Hi!
The Question:
Is it a good or bad idea to use Clustered Index on a datetimeoffset(7) on the Created Column that is always increasing???
Background and more info:
I haven't been dealing with indexes much last year so I feel a bit rusty, hence my question.
We have a table where the web application stores it's actions. During a request there can be as much as 30 log entries stored in a batch. This log file has Id as primary key (with all that comes as default including a clustured index). It also has a Created column as a datetimeoffset(7).
When a user wants to search for something in that log they almost always use a filter on the Created column. (They also use wildcard search on a nvarchar(?) but thats not the problem in this question. I hope...)
As I remember the facts
* Only one Clustured Index on one table (Resulting removing the index on the PK, which is never used)
* Clustured Index is not a good choice when dealing with types where sql needs to insert "in the middle". Sequence works better. (Guid is therefore not good)
* Clustured Index is a good choice when dealing whith small datatypes (Guid is not good, but is datetimeoffset(7) too big to?
Thanks!
// Ety
November 25, 2016 at 2:11 am
I would think very carefully about this before implementing
😎
1) Will there be more indices on the table than the clustered index?
2) Will the clustered index key be foreign key in other tables?
3) What is the expected cardinality (row count)?
The DATETIMEOFFSET(7) is 10 bytes in size, BIGINT is 8 bytes and INT is 4 bytes. As the clustered index key is part of any index on the table, then if there are any other indices than the clustered index on the table then consider changing the clustered index to INT which will save 6 bytes for each entry in each index. If the cardinality is higher than 2^31 - 1, then use bigint which will save two bytes for each entry in each index.
Another consideration is the possibility of duplicate entries, even at the precision of 100ns, it is quite likely that two entries will be the same, both if the value is either calculated outside the table or set as a default constraint.
November 25, 2016 at 9:58 am
When a user wants to search for something in that log they almost always use a filter on the Created column. ...
* Clustured Index is a good choice when dealing whith small datatypes (Guid is not good, but is datetimeoffset(7) too big to?
Don't let general guidelines cause you bad performance in specific situations. That is, what you listed are valid clustered index guidelines, in a very general sense, but what's most critical for performance is getting the best clustered index on every (significant) table.
Based on user searches, yes, your table should be clustered first on Created and not on id. The length of the key is relatively immaterial since you'll almost always be searching by that value anyway. I have tables with five clustering key columns that are lightening quick because those tables are always read by those keys.
The very idea of a "default" clustering on identity column is a destructive myth. For you own sake, ignore that myth! Because the clus index is so vital to performance, there is no such thing as a "default" clustering on a table! Never just lazily slap an identity column on every table and cluster on it. This also often harms data integrity, particularly in intersection tables, because the proper keys and key restrictions are not placed on tables, the identity column is used instead.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
November 27, 2016 at 2:47 pm
One of the biggest questions here is - how do you query the table?
Can you post one of the typical queries which use a filter on that datetimeoffset column?
_____________
Code for TallyGenerator
November 28, 2016 at 6:53 am
Hi!
Unfortunately I can't. We're still designing. But we will propably have a wildcard search on a nvarchar(250) and the created column. Perhaps Created > @startdate and Created < @enddate.
Br
// Ety
November 30, 2016 at 8:06 am
Make sure you understand the big picture of your system. If you focus on fixing one symptom, you'll frequently cause other problems.
Another factor in clustered index design is the expected IUD workload of the table.
What are the performance expectations and volume of inserts, updates and deletes?
- Are you primarily doing inserts? Are there any updates/deletes?
How many non-clustered indexes do you have?
What are the performance expectations, volume and scope of queries?
- Scope meaning are queries using fresh data or are they using primarily stale, unchanging data.
Determine your usage scenario and then post back here. You'll be able to get some good design advice here once we better understand your need.
Wes
(A solid design is always preferable to a creative workaround)
November 30, 2016 at 8:28 am
I often use a datetime stamp as the leading column in a clustered and/or primary key. Many types of entities are naturally unique within the context of time, and also if querying by date/time is a common access pattern for the table, then that's perfect.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
October 10, 2017 at 12:32 pm
I realize I'm late to the party here, but for future finders of this thread, bear in mind that DATETIMEOFFSET only takes 10 bytes if you have a precision between 7 and 5. If you don't need better than hundredths-of-a-second precision, use DATETIMEOFFSET(2) and it's only 8 bytes. We also have gone to using this precision for DATETIME2, since that gets it down to just 6 bytes.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply