Separately indexing multiple columns for faster searching

  • I am a little confused on just how much an index can help here. I regularly need to run searches that go like "... WHERE Col1 = 'aaa' AND (Col2 = 'bbb' OR Col2 = 'ccc') AND Col3 = 'ddd' ..."

    The columns may be in any order and any combination used or unused and may include wildcards, since the queries are assembled on the fly in response to user interaction with a form. Since I cannot predict which columns will be in any given query, I cannot prepare an index over multiple columns to suit a predetermined query syntax.

    So my question is: given that each column may or may not be present, and each column may have any number of match conditions, and the columns may be in any order, will it be helpful to index each column that may participate in such a query? Or will such a chaotic query force SQL Server to do a table scan anyway and so indexing would be pointless?

  • http://sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hm, thank you, but I'm not sure that I follow this fully. There is discussion of how a properly constructed multi-column index is a better choice than separate column indexes, which is clear, but then, at the very end, there is the remark "As an aside, this is why the often-mentioned index ‘strategy’ of a single column index on each column of a table is near-useless and certainly not worth the title ‘strategy’."

    This is exactly what I am looking at, since I have no way of predicting which columns or conditions I will have in my search criteria, so I can't join multiple columns into an index in any useful way.

    At first, the article talks about how each search will use a one-column index, then join the results. I agree, this is certainly not optimal, but I don't see how I can do better in my situation, and index seeks with join sound better than table scans with join. But this last remark, that indexing individual columns in near-useless is throwing me off a bit.

    If an indexed column allows a better query plan than a non-indexed column, that hardly seems useless to me, although better foreknowledge of the search parameters would certainly allow an even better plan.

    Am I not understanding something in the article correctly?

  • The point is that SQL won't always chose index intersection, it's expensive, it'll only do it if there's a fairly large row count.

    Are the queries really that random? Are there really no two queries with the same columns in the where clause? That would be quite unusual.

    Too many people think indexing every column of a table is an indexing strategy. It's not. An indexing strategy requires analysing the queries, working out what indexes are required based on the queries actually running against the table - how frequent, how common, how critical, how expensive - and working from there.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • > The point is that SQL won't always chose index

    > intersection, it's expensive, it'll only do it if

    > there's a fairly large row count.

    I would call it medium-sized, around 80,000 records and growing in the main table, which has caused network response to become too slow for the MsAccess/Jet/file-server model originally used in this app, hence the conversion to SQL Server. This amount is certainly not large by SQL Server standards, but it's large enough that a poorly performing query produces a noticeable and irritating lag for the user.

    > Are the queries really that random? Are there really

    > no two queries with the same columns in the where

    > clause? That would be quite unusual.

    Not all, by any means. There are some combinations that I can predict and I have constructed separate indexes for those. And certainly, some combinations will be used more often than others. Some are clear now; others will probably surface as usage expands. But there -exists- the possibility of the user selecting any field or field combination for tracking down something obscure. This is an inventory database for a paleontological museum and the earliest items date back to the 1800s. Particularly when looking for older things, for which information is scarce and was often poorly entered, by interns with no real understanding of the material, it can be quite a detective chore to track down some material, and the user may try all sorts of odd properties in an attempt to find something.

    My intent is to cover as many bases as possible.

    > Too many people think indexing every column of a

    > table is an indexing strategy. It's not. An indexing

    > strategy requires analysing the queries, working out

    > what indexes are required based on the queries

    > actually running against the table - how frequent,

    > how common, how critical, how expensive - and

    > working from there.

    Okay, I think I see. You're saying that a carte blanche policy of individually indexing every field for no particular reason is not sensible. I can't help but agree, such shotgun policies are almost always a bad idea, in any activity or profession. My point here was whether, given the constraints of no foreknowledge, will it help query performance to separately index every field that may be used in a search? It would appear, then, that the answer is, "Yes, it will help. Not nearly as much as would an index properly constructed for a given case, but it will be better than no index at all."

    Am I on target now?

  • pdanes2 (12/3/2010)


    My point here was whether, given the constraints of no foreknowledge, will it help query performance to separately index every field that may be used in a search? It would appear, then, that the answer is, "Yes, it will help. Not nearly as much as would an index properly constructed for a given case, but it will be better than no index at all."

    No, the answer is 'maybe'

    Create targeted indexes for the combinations that you know are in use. Once you've done that, consider additional indexes for columns that may be used in the search but do not yet exist as a leading column in an index.

    Let's say you have 5 columns (ColA, ColB, ColC, ColD and ColE)

    You know (from testing) that generally ColA, ColB and ColD are filtered on together. You predict that usually ColD and ColE will be filtered on together. You know that ColC is usually filtered alone, but sometimes with ColB

    So from that, initial indexes would be

    ColA, ColB, ColD

    ColD, ColE

    ColC, ColB

    Now you have three indexes and just have two columns left that aren't the leading column of an index, so you can then add indexes on

    ColB

    ColE

    Five indexes, slightly wider than the one column indexes you're suggesting, but likely far more useful than 5 single column indexes.

    Oh, and just to make it clear, the order of columns in an index is important, the order of columns in the where clause is not.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I think you also need to consider the expense of maintaining all of those indexes. Every insert/delete will have multiple indexes to update. Updates can affect multiple also.

    I think a better strategy would be to identify an build covering indexes for the most frequently used queries, and take the hit on the others.

    Another consideration is whether you are looking for words within columns - i.e. does columnA contain "dinosaur" anywhere in it. If doing stuff like this, you might want to look into utilizing a full-text index.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • > Create targeted indexes for the combinations that

    > you know are in use. Once you've done that, consider

    > additional indexes for columns that may be used in

    > the search but do not yet exist as a leading column

    > in an index.

    Oh, yes, that makes sense. Once a column is the first in a combined index, there is no point in having another index with only that column, is there? I -might- even have thought of that myself, eventually.

    > Five indexes, slightly wider than the one column

    > indexes you're suggesting, but likely far more

    > useful than 5 single column indexes.

    Yes, that sounds like a much more sensible approach than my first thought. What I may do is put in the ones I know will be used and wait for complaints on the others. I am in daily contact with the end users, so I will quickly be aware of any problems. If none surface, so much the better, no unnecessary indexing. If a field is used for a lookup once or twice a year, it's probably not worth indexing. I may even put some tracking metrics into the app to see for certain what people are searching for, rather than relying on what they happen to remember using.

    > Oh, and just to make it clear, the order of columns

    > in an index is important, the order of columns in

    > the where clause is not.

    Thank you, tidbits like that are good to have completely clear.

  • > I think you also need to consider the expense of

    > maintaining all of those indexes. Every

    > insert/delete will have multiple indexes to update.

    > Updates can affect multiple also.

    Thank you, I have thought about that. The amount of update traffic is not significant – it is rare to have even two simultaneous users logged in and other activity on the machine is minimal. The time necessary to process any sort of action request is virtually zero, and even were there to be a short delay, the time the user needs to prepare himself for the next action is orders of magnitude greater. Most of the activity is by scientists and curators working directly from the material, measuring a rock, examining it under a microscope, comparing it with similar samples, consulting literature and so on. When they finally are finished entering all the information and dismiss the add form, it can take as much as several minutes for them to put away the sample, get out another one and begin entering new data.

    Much more important in regards to response time is when they are searching for something, especially when they don't know exactly for what, and need to try a number of different search parameters. Delays there would be quite annoying, hence my interest in best indexing strategies.

    > I think a better strategy would be to identify an

    > build covering indexes for the most frequently used

    > queries, and take the hit on the others.

    Gail mentioned that as well, and I think I will initially go with that, then see who complains.

    > Another consideration is whether you are looking for

    > words within columns - i.e. does columnA contain

    > "dinosaur" anywhere in it. If doing stuff like this,

    > you might want to look into utilizing a full-text

    > index.

    There will be some wildcard searches, but most of the fields are short – I don't think full-text would add that much utility for the hassle it would cause for me. There will sometimes be things like 'Devonian' or 'Devonian, upper' or 'Upper Devonian', where the user will be looking for '%Devonian%', but that can't be helped. I have designed the search form to minimize such attempts and if they insist, they will simply have to put up with an occasional delay. Although not always especially sophisticated in computer stuff, these are pretty intelligent people and a bit of explanation goes a long way with them. If the stuff they need regularly works the way they want, they can understand that computers have limitations and not everything can be completely instantaneous.

  • pdanes2 (12/3/2010)


    Although not always especially sophisticated in computer stuff, these are pretty intelligent people and a bit of explanation goes a long way with them. If the stuff they need regularly works the way they want, they can understand that computers have limitations and not everything can be completely instantaneous.

    Can I have your users please? Would be nice to have understanding ones for a change.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • 'Fraid not,:-) I've experienced enough horror shows to know full well just how good I have it here. They buy good equipment, ask for sensible things, listen to me and as long as stuff works as it should, my time is completely my own. Appreciate the sentiment, though.

  • 'Fraid not,:-) I've experienced enough horror shows to know full well just how good I have it here. They buy good equipment, ask for sensible things, listen to me and as long as stuff works as it should, my time is completely my own. Appreciate the sentiment, though.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply