July 22, 2014 at 3:41 pm
Hi All
SQL 2008R2
I have a non-clustered, Composite Primary Key. It is the natural key from the business domain and is not suitable as a clustering key. However, a number of queries do depend on the columns for either where or join conditions.
The problem is the Primary Key is not "Covering" for most of those queries so the execution plans have lots of Key Lookups (Mostly using Nested Loop joins).
I am looking for some way to add "Include" columns to the Primary Key but since it is actually a constraint there does not seem to be any way to do that.
Does anyone know of a way to accomplish this short of just creating a duplicate, non-clustered index with the included columns?
Thanks
July 23, 2014 at 3:54 am
You can't do an include on a constraint. Your only real alternative would be to recreate the index as just a unique, non-clustered index. Then you could do includes. You can still do foreign key references if it's a unique index, so it shouldn't prevent most functionality. I'm not sure that would be my preferred solution though. If most queries are using this, whether it's the perfect candidate for a clustered index or not, may you should cluster on it. Better to suffer some fragmentation and slightly fat keys while receiving the benefits of the clustered index rather than having to hop through hoops to go the other way.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 23, 2014 at 5:31 am
There are people who will disagree with me on this, but I have found adding an IDENTITY (1,1) column to the table and adding that to the natural composite PK has been a useful method of creating a unique (and clusterable) PK.
July 23, 2014 at 5:39 am
Brandie Tarvin (7/23/2014)
There are people who will disagree with me on this, but I have found adding an IDENTITY (1,1) column to the table and adding that to the natural composite PK has been a useful method of creating a unique (and clusterable) PK.
I disagree.
😛
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 23, 2014 at 5:40 am
Grant Fritchey (7/23/2014)
Brandie Tarvin (7/23/2014)
There are people who will disagree with me on this, but I have found adding an IDENTITY (1,1) column to the table and adding that to the natural composite PK has been a useful method of creating a unique (and clusterable) PK.I disagree.
😛
No ice cream for you!!!! @=)
July 23, 2014 at 6:01 am
Brandie Tarvin (7/23/2014)
There are people who will disagree with me on this, but I have found adding an IDENTITY (1,1) column to the table and adding that to the natural composite PK has been a useful method of creating a unique (and clusterable) PK.
What do you mean by "adding that [IDENTITY column] to the natural composite PK"? If you have a composite key of (A,B) would you really add an IDENTITY column (idn) and make the clustered primary key (idn,A,B) or (A,B,idn)? That's how I read what you've written but I'm guessing you probably mean something completely different.
July 23, 2014 at 6:10 am
sqlvogel (7/23/2014)
Brandie Tarvin (7/23/2014)
There are people who will disagree with me on this, but I have found adding an IDENTITY (1,1) column to the table and adding that to the natural composite PK has been a useful method of creating a unique (and clusterable) PK.What do you mean by "adding that [IDENTITY column] to the natural composite PK"? If you have a composite key of (A,B) would you really add an IDENTITY column (idn) and make the clustered primary key (idn,A,B) or (A,B,idn)? That's how I read what you've written but I'm guessing you probably mean something completely different.
I mean add a new column called MyTableID (alter the "MyTable" part to the table name or a user-friendly description that tells everyone what it is). Then cluster the PK on A,B,MyTableID. You don't want to list MyTableID first because none of your code will reference it.
EDIT: I should have added the below...
After you have a clustered index, you can then add non-clustered covering indexes on the columns you want to covered. According to one SQL Saturday coach, the non-clustered index will auto-include a clustered primary key as a "silent" addition to non-clustered index leafs. It's been over a year since I attended that session, so you might want to research that yourself to make sure I'm not passing along myth-information.
July 23, 2014 at 6:41 am
Brandie Tarvin (7/23/2014)I mean add a new column called MyTableID (alter the "MyTable" part to the table name or a user-friendly description that tells everyone what it is). Then cluster the PK on A,B,MyTableID. You don't want to list MyTableID first because none of your code will reference it.
That doesn't seem like a great idea. Your "primary key" constraint does not enforce the uniqueness of (A,B) so you would presumably still need the nonclustered uniqueness constraint on (A,B) and adding an IDENTITY column to the end of the key isn't going to make it any more or less suitable as a candidate for clustering. The uniqueness of your TableID isn't guaranteed either and it cannot be referenced by any foreign key unless you add a unique index on it so I don't see what purpose the IDENTITY column is supposed to serve here. You said, correctly, that the cluster key columns are included in any nonclustered index anyway but I don't see how that helps if you are needlessly populating all your indexes with an extra column.
There are actually two technical limitations touched on by the OP's question. One is that not all the indexing options are available through the UNIQUE / PRIMARY KEY syntax (e.g. included columns, filtered indexes). It would be nice to have the option to create a UNIQUE / PRIMARY KEY constraint and optionally and separately create indexes to support it - i.e. remove the limitation whereby indexes have to be automatically created when you create a constraint.
The second limitation is that SQL Server only allows one "clustered index" per table and makes bookmark lookups against that index when a nonclusterd index doesn't cover the query. As has been discussed before, if we had the option to create multiple clustered indexes and/or use a different key for lookups then that limitation would go away. (Historical note: as I recall in MSSQL 6.5 (?) RIDs were used as row locators in nonclustered indexes instead of the cluster key but I may have that wrong and be thinking back to Sybase days.)
July 23, 2014 at 7:13 am
Brandie Tarvin (7/23/2014)
Grant Fritchey (7/23/2014)
Brandie Tarvin (7/23/2014)
There are people who will disagree with me on this, but I have found adding an IDENTITY (1,1) column to the table and adding that to the natural composite PK has been a useful method of creating a unique (and clusterable) PK.I disagree.
😛
No ice cream for you!!!! @=)
I disagree too!!!!! ;-). You should hand over your ice-cream to Grant :-D.
If you add the identity in on the front of the PK the PK index is now useless, unless you do massive changes and make everything thatuses the natural key include this new identity, which will require a separate index on the natural primary key to allow you to find this identity to use it in the query you now have to riun instead of the one you wanted to run. If you add it anywhere else it doesn't make the key usefully clusterable.
If you want to add an identity column, make it the clustering key on its own (so declare it as "id int identity unique clustered"); then it can be used as the target of a foreign key relationship if need be but you are not forced to use it in queries if you don't want to, and it ensures the table doesn't suffer the disadvantages of a heap. The natural primary key can be declared as primary key nonclustered. That's a possible alternative to what Grant suggests, whether it's better or worse depends on the data, how much fragmentation the heap would suffer, and what the queries are.
Tom
July 23, 2014 at 7:56 am
Not to provoke debate (but I'm gonna anyway @=), I generally use surrogate Identity columns as singular, non-composite PKs because if I used natural composite keys, I'd be including half the columns in most of our tables.
Yes, some of our databases really are that "well designed."
But on blue-moon occasions, I do make clustered composite PKs with an Identity column because it works for our environment. What works for our environment, though, is not something that will work for everyone. I've learned over the years that "best practices" sometimes fly out the window because of "well designed" databases, if you know what I mean.
"Every exception has a rule, even if every rule does not have an exception."
July 23, 2014 at 8:45 am
I see what everyone is saying and I don't disagree with them, but I know where Brandie is coming from. You make compromises in order to make things work well. In a perfect world we'd have nothing but perfect designs running perfect code and that would be awesome. Meanwhile, back on planet Dirt, you do what you need to do to get through the day successfully. Aim for the perfect solution, deal with the fact that you're probably not going to get there.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 23, 2014 at 9:30 am
Grant Fritchey (7/23/2014)
I see what everyone is saying and I don't disagree with them, but I know where Brandie is coming from. You make compromises in order to make things work well. In a perfect world we'd have nothing but perfect designs running perfect code and that would be awesome. Meanwhile, back on planet Dirt, you do what you need to do to get through the day successfully. Aim for the perfect solution, deal with the fact that you're probably not going to get there.
I understand what compromises are. What I don't understand is what Brandie expects to gain by adding an IDENTITY column at the end of a unique clustered index. I'd hoped she might care to explain that.
July 23, 2014 at 10:04 am
Hi All,
Thanks for the many responses. I am surprised that so many of you stayed nearly on topic.
I would like to make a few general observations and comments but first a little additional information.
The table is part of a Date-Based, Sliding Window, Partitioned View that MUST be updateable. The PV gets ~90Mrows per week (~15GB of data). The Clustering Key is a Non-Unique, but constantly increasing Datetime (basically the received DateTime) that is both useful from the Business Side and provides pretty good distribution. I have two, fairly narrow NC indexes one is the Primary Key ( 3 columns, 13 bytes + clustering key) the other is 14 bytes + clustering key. The NC indexes add about 8GB to one week of data. We keep 14 weeks (one quarter) of data online so the aggregate is on the order of 300GB and 1.4B rows. I implemented the PV so I could get rid of old data in a rational manner :).
Anyway, based on those requirements Identity, TimeStamp, etc. are out for column types since they cannot be in an updateable view. Also, creating additional indexes is a dicey prospect. Even a narrow index adds ~3GB per table or ~40GB to the aggregate.
Multiple clustered indexes is one of those great concepts like perpetual motion, rainbow gold, and stable release:). If you think about it you can accomplish the same thing by simply creating a NC index on all of the table columns. However, cheap as disk space is I cannot afford to double the size of this 14 table set and the 3 other similar sets I have. I would be approaching 1TB for a single table!.
It would really be nice if the Primary Key and Unique Constraint syntax could be extended to support Included Columns. I don't think that raises any relational theory issues. I am not sure you can reconcile the concept of a primary key with a filtered index. By definition the Primary Key is unique for every row so there is not much filtering 🙂
I had briefly considered the Unique Index but I rejected it because I thought it would not get the same treatment in joins. I will have to reconsider that option and try a couple of tests.
Again thanks for the thoughtful replies.
July 23, 2014 at 10:37 am
It seems you do indeed have the proper clustering key.
Next I'd look at the missing index stats from SQL. You may be able to cover a large % of queries with relatively few INCLUDEd columns; if so, great, do that. In general, consolidate covering indexes when you can. Yes, the rows are wider, but the data is shared between queries too.
Even then, you'll of course still have the issue of queries that use a non-clus key but aren't covered by the non-clus index. Those will take further review.
Btw, if you INCLUDE enough columns in a non-clustered index, you have effectively created a second clustered index, but, as you noted, with the increased disk space usage that requires.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 25, 2014 at 12:05 pm
Brandie Tarvin (7/23/2014)
Not to provoke debate (but I'm gonna anyway @=), I generally use surrogate Identity columns as singular, non-composite PKs because if I used natural composite keys, I'd be including half the columns in most of our tables.
So do I, if I'm starting from scratch. But I will also have a uniqueness constraint for the natural key and non-null constraints for each of its columns, simply because I want to be in a decent normal form (actually to be 1NF according to the original definition of a primary key).
If however I'm starting from an existing database that doesn't use a surrogate key and has problems because there is nothing suitable for a clustering key I will introduce a unique identity as a clustering column, to avoid either using a bad clustering key or living with a heap. That doesn't change the primary key.
If there are size problems caused by using the natural key in forign key constraints and in join conditions, that is a separate problem, which needs the same unique identity and also some code and constraint changes. But it mayalso need to avoid clustering on the identity instead of on the natural key if the workload consists mainly of queries which operate on contiguous ranges of the natural key. UNfortunately I can't always win both on the swings and on the roundabouts.
Tom
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply