March 11, 2022 at 5:58 pm
I maintain a database of medical claims for about a dozen clients. The vast majority of my tables reference, via an integer foreign key, the primary key in a very meager CLIENT table. I don't think I need to be too elaborate in my table explanations in order for you to understand my architecture...the following should suffice so far as background preceding my ultimate question:
tbl_client = pk_client INT and about 4 other columns
tbl_claim = fk_client INT, and a few other columns
tbl_claim_line = fk_client INT, fk_claim INT, and a variety of other columns
tbl_etc
tbl_etc
tbl_etc
A very high number of tables in the database contain a reference to the client table, as depicted above.
Some tables, like tbl_claim_line, contain hundreds of millions of rows yet only about 12 different fk_client values.
99% of all queries filter on a specific fk_client value.
A typical query looks like this
SELECT
c.foo
, cl.bar
FROM
tbl_claim c
INNER JOIN
tbl_claim_line cl
ON c.pk_claim = cl.fk_claim
WHERE
c.fk_client = 15
AND cl.fk_client = 15
fk_client is, naturally, included in a good number of indexes, and performance in the system is not bad.
But, when I run any kind of index optimization code (Brent Ozar's stuff and similar) an index is recommended
on fk_client, all by itself, with a crap load of other columns via INCLUDE. This seems preposterous to me, but I get why it's recommended, sort of.
So...something I've been kicking around for years and thus my question to you good people (and you bad folks, too):
Wouldn't my database benefit by making all these tables containing fk_client
in to PARTITIONE TABLES - partitioned, obviously, on the fk_client field?
Thank you in advance for your response.
March 11, 2022 at 6:21 pm
You're on the right general lines, but you don't really need partitioning. Typically what you need for best overall performance is the best clustered index on all the child tables (whether it's the PK or not is irrelevant to performance, it's your preference what to make the PK).
Presumably tbl_client is already clustered by client_id / "PK_client".
tbl_claim should be uniquely clustered on ( client_id, claim_id ).
tbl_claim_line should be uniquely clustered on ( client_id, claim_id, claim_line_id ).
It's most helpful if you can provide the DDL for all tables.
If you want to do a deeper dive on these tables, I can provide a script we can use to analyze all index usage, including before trying to change the clus indexes, just in case your tables somehow don't follow the normal pattern for such parent/child tables.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
March 11, 2022 at 7:10 pm
Hi Scott, thanks for the reply.
I'm almost positive I had my clustered indexes designed as you suggest ages ago, but that was before I realized that, for whatever reason, it SEEMS right, but it's actually backwards. Unless I'm missing something here, a clustered index on multiple columns should be on a very selective/unique combination of columns. In your suggestion the index would begin with the LEAST unique column, followed by the next least unique, and so on. Inserting data would result in massive amounts of page splits, fragmentation, etc. would it not?
It seems that table partitioning was developed, in part anyway, to solve a problem just like I have here. Managing huge amounts of data differentiated only by a single VERY common value without resorting to the order in which it is stored/indexed.
I could provide DDL, but it's really just as simple as I indicated. In a perfect world each client's data would be in its own database - a few of our clients actually require this - but you can imagine what a maintenance headache that creates. Partitioned tables seem to be the logical choice as the client's data data really is independent of all other data in the shared able. Indexes WILL do he trick, but why wouldn't' partitioning be even better? Are there drawbacks I'm not aware of?
Respectfully,
Pat
March 11, 2022 at 7:38 pm
How much fragmentation depends on how INSERTs are done. Singular INSERTs will see more fragmentation than multi-row INSERTs. Either way, you can mitigate this with partitioning (based on the clustering I specified) and a lower FILLFACTOR, if you really need to.
> Unless I'm missing something here, a clustered index on multiple columns should be on a very selective/unique combination of columns. <<
In a general sense, and particularly for single-row lookups, yes. On top of which, many people have a built-in bias against multiple clustering keys (like Ozar, for example). But it's an improper and not fully considered bias.
Keep in mind that what's most critical for index lookups is that the first key(s) be the ones you provide values for. And that is the case here, because, as you yourself stated, "99% of all queries filter on a specific fk_client value." That's the reason you should cluster all the tables by that value. Btw, that is a very typical query pattern with parent/child table structures, that the parent key(s) is(are) (almost) always given.
On top of that, for 1-to-M joins, especially, the clus key is merely the entry point for all the sequential rows to follow. That is, the group of rows all with same ( client_id ) or ( client_id, claim_id ).
Re-clustering should result in the vast majority of your joins becoming merge joins, which are extremely efficient. You couldn't get even close to this efficiency with the (supposed) "default" clustering by an identity column.
In conclusion, you seem particularly interested in partitioning. I suspect that you'll end up clustering by client_id anyway to meet the needs of partitioning. It's common for partitioning to result in changing the clustering keys. That improves performance, and thus credit is given to partitioning for improving the performance. But it didn't really, the change in clustering keys caused the big performance gains.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
March 11, 2022 at 7:58 pm
Your statement, " I suspect that you'll end up clustering by client_id anyway to meet the needs of partitioning" is something I need to look in to. Are you saying that should I decide to partition the table by client_id that I will be required to adopt a clustered index on client_id, anyway? That would be very interesting - and a good reason to not complicate things by implementing partitioning over the clustered index you recommended.
Due to the nature of my database I can play with much of this without concern of user interference, so I will give the clustered index suggestion a try. And, for the record....I just check my claim line table to see what the clustered index is on...it is (fk_claim, pk_claim_line) - pretty much what you recommended sans fk_client. So I'm close already
Thanks
March 11, 2022 at 8:35 pm
Yep, you are close already.
And, in fact, your existing clus index on the claim line table is preferred IF:
(1) claim numbers are all unique (i.e. they never repeat across clients)
AND
(2) you DON'T specify client id in the WHERE/JOIN for the claim line table.
IF the claim number is unique, the client_id is specified there, so you don't need to re-check it for the claim line table.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
March 11, 2022 at 8:48 pm
Hmm, now that I think about it, you can probably get away with specifying a WHERE/JOIN condition on the client_id in the claim_line table, since SQL will likely still do a clus index seek/scan on the claim_line table, just adding a predicate for client_id.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply