Index Choices

  • Let me start by saying that asking this question makes me feel rather foolish given the time I've spent working in the field, because it is one I probably know the answer to. The reason I'm asking is more an issue of not trusting my instincts, because this seems so deceptively simple. Some background might help, but I want to keep the actual question fairly generic, and I'm operating under some restrictions.

    A client has asked me to optimise a database that had absolutely no indices applied anywhere except on primary identity columns (where they existed). The design itself isn't terribly bad at the structural level, with a few quirks that, after a few days of analysis, I understand as compromises (it was based on an ancient legacy system, and they were severely resource-restricted during the original conversion to SQL Server 2000).

    My task was twofold:

    1) Isolate every necessary access call in a stored procedure, to better protect data quality and to create a single controlled point of access for multiple applications, wrapping some of the basic logic there rather than in the often conflicting front-end enforcements. Why they have multiple applications is beyond any explanation I can ferret out, since they all seem to do the same things, but they insist on maintaining that separation even while rebuilding them all. Nonetheless, this bit is done, and I'm comfortable it provides some benefits.

    2) Optimise the model to perform better. I've already done some normalisation and the like, and am comfortable with that. This is when I became aware that there are no indexes at all, even though from a logical standpoint the entire model is really a series of simple parent-child relationships, and there were some obvious index choices I did make. I'm also restricted to staging these changes, because some of the oddball calling applications aren't going to play nice for some time, and I need to leave some structures intact that, if changed, would probably make my question irrelevant.

    The question then:

    I ran into three occurrences of tables that have multiple foreign keys. The basic example is a costing table that allows them to apply a simple cost to a project or an event, both of those being separate tables in their system. The costs are consolidated in that table purposefully, because the reporting applications don't care to differentiate the source of a cost beyond an attendant cost code. The basic table design is:

    ID           identity column
    Amount       money column
    Description  varchar column
    Cost_Code    varchar column
    FK_1         int
    FK_2         int
    etc.

    Where the remaining columns (they aren't really named FK_#, obviously) are references to the parent tables.
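
    To make that concrete, here is a rough DDL sketch of the shape (the table and column names are placeholders, not the real ones):

        CREATE TABLE dbo.Cost (
            ID          int IDENTITY(1,1) PRIMARY KEY,
            Amount      money        NOT NULL,
            Description varchar(255) NULL,
            Cost_Code   varchar(20)  NOT NULL,
            Project_ID  int NULL REFERENCES dbo.Project (ID),  -- "FK_1"
            Event_ID    int NULL REFERENCES dbo.Event (ID)     -- "FK_2"
        );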

    While I can come up with an enormous list of good reasons to split this table and use a view to consolidate it for their reporting application, the nature of those legacy applications is that some of the tables (this is one of two distinct cases) can't be split until another iteration. The front-ends call directly against the tables, with all parameters built on the client end, to the point where I can't even change the field names where they are obtuse. (I know I could still get a view to handle this with some changes to the table name, but I'm hesitant to push on this point because they consider the reporting head to be mission-critical and it was designed by the second in command of their enterprise, so there are political implications in pushing too far at once.)

    My conundrum, then, is one of those ultra-basic ones: whether to index those foreign keys at all, and if so whether to use a single multi-column index or multiple single-column ones (sketched below with placeholder names). The hesitation is because, other than writing to this table and aggregating it for a reporting front-end, analysis shows they seldom list or edit this data by those foreign keys in the course of usual work. It is a large table, though, one of the largest in the system, and I'm certain it will be a performance drain unless I attack it somehow. (The other two similarly shaped tables aren't nearly so heavily used, and I expect I can safely ignore indices on them and get away with it in this iteration.)
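
    Concretely, the two options I'm weighing look something like this (placeholder names again):

        -- Option 1: a single composite index across both foreign keys.
        -- Only seeks for queries filtering on Project_ID (the leading column);
        -- a search on Event_ID alone cannot use it efficiently.
        CREATE INDEX IX_Cost_Project_Event ON dbo.Cost (Project_ID, Event_ID);

        -- Option 2: one single-column index per foreign key.
        -- Each parent lookup gets its own seekable index, at the cost of
        -- maintaining both indexes on every insert.
        CREATE INDEX IX_Cost_Project ON dbo.Cost (Project_ID);
        CREATE INDEX IX_Cost_Event   ON dbo.Cost (Event_ID);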

    My hesitation comes from a concern that I'll be adding potential overhead for little gain, and because they have used this monstrous table for years now and never found it to be an issue, I'm not sure I can justify the decision (the same second in command is riding this process and is touchy about any suggestion that his design choices may be impractical moving forward). At the same time, my gut is telling me to do something, at least to make indices a consistent part of the system, so that in whatever the next iteration is the next chap doesn't have to suffer the same choice.

    Any suggestions at all are appreciated. In particular, can anyone who has experience with these large but simple multiple-foreign-key tables give me a sense of whether indices on them are even effective, and in what contexts they may be of limited use?

    As I said when I opened this, the answer is likely pretty obvious, but I think the combination of the state of the original database, the dependency factors, and the political dance I've been having to do is taking a toll on my overtaxed brain.

    Thanks ahead of time for absolutely any suggestions, input, ideas, etc.

  • I'd use a trace to find the most common actions taken on that table (selects, updates, inserts and/or deletes) in the production database.

    That'll tell you straight away whether there are enough selects on it to make indexes worthwhile.
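
    If it helps, the trace can be scripted server-side rather than run through Profiler. A minimal sketch, assuming a file path and filtering on a placeholder table name:

        DECLARE @traceid int, @maxsize bigint;
        SET @maxsize = 50;  -- maximum trace file size in MB

        EXEC sp_trace_create @traceid OUTPUT, 0, N'C:\traces\cost_table', @maxsize;

        -- Capture statement text and duration for completed batches and RPCs.
        EXEC sp_trace_setevent @traceid, 12, 1, 1;   -- SQL:BatchCompleted, TextData
        EXEC sp_trace_setevent @traceid, 12, 13, 1;  -- SQL:BatchCompleted, Duration
        EXEC sp_trace_setevent @traceid, 10, 1, 1;   -- RPC:Completed, TextData
        EXEC sp_trace_setevent @traceid, 10, 13, 1;  -- RPC:Completed, Duration

        -- Keep only statements touching the costing table (operator 6 = LIKE).
        EXEC sp_trace_setfilter @traceid, 1, 0, 6, N'%Cost%';

        EXEC sp_trace_setstatus @traceid, 1;  -- 1 = start the trace

        -- Later, read the captured file back into a table:
        -- SELECT TextData, Duration
        -- FROM ::fn_trace_gettable(N'C:\traces\cost_table.trc', default);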

    Most likely, what you'll find is that most of the selects use one of the FK columns plus one or more data fields, with the FK column being either a join criterion or part of a WHERE clause, and the other column(s) being selected.

    If that's so, indexes on those columns might be a good thing. (If it were 2005/2008 instead of 2000, you could set up the data columns as "includes" in the index; in 2000, you don't have that option.) What you'd want to do in that case is build an index for each FK column, with the other column(s) in each, but with the FK as the leading edge, so it's most efficient in selects. One index per FK column.
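
    Roughly, the difference between the two versions looks like this, with placeholder names:

        -- SQL Server 2000: the data columns have to be part of the index key.
        CREATE INDEX IX_Cost_Project ON dbo.Cost (Project_ID, Amount, Cost_Code);

        -- SQL Server 2005 and later: keep the FK as the only key column and
        -- carry the data columns at the leaf level, so the select is still
        -- covered without widening the key.
        CREATE INDEX IX_Cost_Project ON dbo.Cost (Project_ID)
            INCLUDE (Amount, Cost_Code);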

    Build the indexes in a test/dev copy of the database, and play back a bunch of queries on the table. Find out if it gains you anything. Make sure to speed and load test by playing back a trace that includes all normal actions on the table. You need to know if inserts are slowed down too much just as much as you need to know if selects are sped up.
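
    For the before/after comparison, wrapping each replayed query like this will show the I/O and timing difference:

        SET STATISTICS IO ON;
        SET STATISTICS TIME ON;

        -- A representative replayed query (hypothetical values):
        SELECT Amount, Cost_Code
        FROM dbo.Cost
        WHERE Project_ID = 12345;

        SET STATISTICS IO OFF;
        SET STATISTICS TIME OFF;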

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks for the thoughts.

    I foolishly forgot to mention that part of the end-result of this upgrade is to move it to SQL Server 2005, and as such the "includes" option might be valid for me there. I'll look into that, as I had totally forgotten about that feature extension.

    The problem with tracing has been the number of divergent applications needing access to the model, and the fact that they are in flux. I have had no control over upgrades on that side, and little access to the developers responsible for them. I can trace in production against the legacy applications, but all but two of them will go into production having been entirely rewritten, so that may lead me down a path that isn't of much value. Obviously, I will be tracing against the stored procedures that surface for the new applications, so I have some potential there. It may be enough, come to think of it, since the only two legacy applications lagging behind the change are almost exclusively reading.

    Again, thanks for the insight.

  • Even if applications are rewritten, the amount and type of activity in the table will probably not change much, since users will be doing the same things with the new applications as they were with the old. At least, that's usually true.

    Assuming that, a trace won't give you precise data, but it will give you a general outline of how active the table will be, and in what ways. Maybe not precise code, but you should be able to work that out yourself, because you'll have the procs.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I agree with GSquared. Capture the query info and make decisions from there. The one other thing you might concern yourself with is the volatility of the data in the parent tables. Are there lots of deletes in those tables? If so, delete queries against them will constantly be looking into this table as part of the referential constraint checks. That might, depending on circumstances, drive the need for an index on each of the individual columns associated with the other tables. But it really depends on the circumstances you're operating under.
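
    To illustrate with placeholder names: without an index on the referencing column, a parent delete has to scan the whole costing table just to verify the constraint:

        -- With no index on dbo.Cost (Project_ID), SQL Server scans dbo.Cost
        -- to check the foreign key before the parent row can go:
        DELETE FROM dbo.Project WHERE ID = 12345;

        -- With an index, that constraint check becomes a cheap seek:
        CREATE INDEX IX_Cost_Project ON dbo.Cost (Project_ID);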

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • For all intents and purposes, they seem to seldom, if ever, delete anything from any of the interfaces. I will certainly bear that in mind, though.

    I suspect my block on this has to do with how isolated the process is. I'm supplying the stored procedures up the chain to developers, but getting little or no feedback beyond "OK." It's worrisome from the perspective that there seems to be no defined scope to the interface changes, leaving the optimisation process hinging on assumptions.

  • Optimization based on assumptions and guesses is far too common.

    One dev I worked with thought that table variables were THE way to optimize stored procs. He took a proc that was running in under half a second, rewrote it to "optimize" it, and it ended up taking as long as 40 seconds; he didn't test it or anything, just did the whole thing on the production server. (Then he came screaming to me about how messed up the database was when users came in the next day and everything was timing out.)

    I've seen that kind of thing WAY too often.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • So true. I've been lucky the last seven years to have worked on a product/project where I had total control end to end, where talking with the developers was possible and a two-way education took place. On this one, so far, the rule seems to be "isolate all parties." The end result is a lot of guessing, and unnecessary hesitation.
