Put clustered index on 8 Column Natural Key, or on Identity Key

Question

Put clustered index on 8 Column Natural Key, or on Identity Key

Viewing 2 posts - 16 through 16 (of 16 total)

You must be logged in to reply to this topic. Login to reply

ScottPletcher SSC Guru Points: 101103 More actions · Answer 1

BSharbo (8/7/2015)
rmechaber (8/4/2015)
Setting aside the question of clustering key for the moment, which others here have already taken up, I would ask whether your assumption of having an 8-column natural key is, in fact, correct. In addition, you stated that you have 50 columns in the table and, further on, that there is only the one table in your database.
Without knowing more about the real-world data you're trying to model in your design it's impossible to say, but that sounds to me like you've probably violated basic design practice. Not that an 8-column key is impossible or that a 50-column table is wrong, but you yourself say that you are a beginner at database design, and this is an area that it's common for beginners to walk astray.
In the example you gave about loan numbers/providers/clients, it looks like the three columns of loan number, client, and pool -- which you yourself describe as "A level lower than client that makes loan number distinct" -- are sufficient to identify a unique row in the table and that everything else depends on them. In other words, they constitute a natural primary key.
Things like LoadedDate are facts about the row but are not necessary to making the row unique (and therefore should not be part of a primary key).
Likewise, if city and state are aspects of the client, then city and state probably belong in a Clients table, with a foreign key to your Loans table. If city and state are, say, the location of the asset for which the loan was made, then you might want to consider an Assets table, where you can add multiple columns about the asset (type: car, home, boat, etc.; city; state; etc.).
If you can post an actual list of column names and describe what you're actually trying to model, you might get a lot more useful information back about different ways you might get to set this up.
Fortunately, you started at the beginning: how best (and "best" can be different things) to set up a database. It's a lot easier to create a better database if it holds no data yet!
Rich
Hey Rich. I appreciate the thought, but the 8 columns ARE necessarily the key to the table. I do concur that I could easily split the table up into multiple tables, however I don't see the benefit honestly.
What is happening is a client is supplying a daily file with 50+ columns. The key to each record is is "loan number" "module" "datamonth" "Review week" "loan servicer" "client Name" "pool" "Loaded Date".
I have been doing quite a bit of testing, and I tried the following two methodolgies..
Table has 13,046,000 + records.
Method #1:
Put a primary key clustered Index on the 8 key columns. Run queries where these columns are pulled in the where clause.
QUery takes 1 minute 40 seconds.
Method #2:
Put a primary key NONclustered Index on the 8 key columns. Add an auto-incremented column which then has a clustered index on it. Run the exact same query as above.
QUery took over 14 minutes before I killed it.
It seems like for my specific needs, method #1 is working over 10 times faster, even though it is a method which every Database designer would disagree with... And now I am slightly more confused haha 🙂

Not every database designer/modeler automatically rejects multiple clustering keys or primary keys. Please don't subscribe that knee-jerk reaction to all of us. The clustered index is the most critical element for table performance, and thus should always be carefully chosen based on actual data needs (and not on a nursery-rhyme-like saying: "narrow, ever-increasing, ...").

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

TomThomson SSC Guru Points: 104773 More actions · Answer 2

To me your actual data example and the query and its performance seem to indicate that you need to cluster on the natural key; or, pehaps more precisely, that you need to cluster on Loannumber, module, datamonth, loanservicer, Clientname, Loadeddate and may as well add Pool and Reviewmonth to the end of that list to make it into a full key. Doing that would mean that there's no use for an identity column except to optimise storage for foreign key constraints, and since you just have the one table there are no foreign keys to be optimised.

However, perhaps you ought to consider splitting the column into two tables as it can significantly reduce storage requirements and improve performance. One table could contain Loan Number, module, datamonth, loanservicer and clientname as primary key (in that order, clustered) and an identity column marked not null and unique so that it can be the target of a foreign key. The other table contains an integer column marked not null with a foreign key constraint pointing at the first table, Loadeddate, Pool, and Reviewmonth, and the the primary key is those four columns (in the order stated, clustered) plus all other columns that are in the current single table (no identity column in this table). The choice of which columns go in which table is determined by your example query so if you have other queries that would be better served by a different split it is possible that splitting would not be a good idea.

Your example query can then be rewritten in a straightforward way to use the first table to get an integer that identifies the sets of rows that your current query groups, and the maxiumum loadeddate can be found by looking at rows in the second table that contain that identifier as first column. All the columns used for grouping go in the first table, all other columns are the second table, and the identity column of teh first table provides the link between them that produces the original row. Your query would of course have to be modified to take account of this, and it would definitely be appropriate to measure performance and storage and compare with the single table approach before adoptig a split table approach.

Tom