Indexing single or multiple columns?

  • I'm a bit confused by indexing single vs multiple columns.

    Suppose I have two big tables A(a1, a2, a3) and B(b1, b2, b3), and the query will be:

    Select A.a3, B.b3

    from A inner join B on A.a1 = B.b1 and A.a2 = B.b2

    What if the query is:

    Select A.a2, A.a3, B.b2, B.b3

    from A inner join B on A.a1 = B.b1

    Which one of the following will give me better performance?

    (1) create four indexes: id1 on A(a1), id2 on A(a2), id3 on B(b1), and id4 on B(b2);

    (2) create two indexes: id1 on A(a1, a2) and id2 on B(b1, b2);

    It may be an elementary question, but I would appreciate it if someone could explain the difference to me (both options are sketched as DDL below).

    Larry
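
    For reference, a sketch of the two options written out as T-SQL DDL, using the index and column names from the post (only one of the two sets would actually be created):

    -- Option (1): four single-column indexes
    CREATE INDEX id1 ON A (a1)
    CREATE INDEX id2 ON A (a2)
    CREATE INDEX id3 ON B (b1)
    CREATE INDEX id4 ON B (b2)

    -- Option (2): two composite (multi-column) indexes
    -- (the names id1 and id2 are reused from the post, so drop the ones above first)
    CREATE INDEX id1 ON A (a1, a2)
    CREATE INDEX id2 ON B (b1, b2)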

  • Would someone point me in the right direction, or share some links?

  • It's not a simple question! I generally use single-column indexes unless I need more than one column to enforce a constraint. Which index SQL uses varies depending on the statistics in use, the joins, etc. It can definitely vary. Tuning is tricky business; you have to experiment to get the best combinations - the Index Tuning Wizard is one way to test those combos using brute force, which is not the worst plan. If you've got a specific query giving you fits, post the query plan and we'll try to help (one way to capture a plan as text is sketched below). Hopefully others will have more (better!) pointers for you.

    Andy

    http://www.sqlservercentral.com/columnists/awarren/
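
    A minimal sketch of capturing a text plan to post, using the first query from the question (SET SHOWPLAN_TEXT must be the only statement in its batch, hence the GO separators):

    -- Show the estimated plan as text instead of executing the query
    SET SHOWPLAN_TEXT ON
    GO

    SELECT A.a3, B.b3
    FROM A INNER JOIN B ON A.a1 = B.b1 AND A.a2 = B.b2
    GO

    SET SHOWPLAN_TEXT OFF
    GO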

  • Thanks, Andy for the discussion.

    I tried the two index approach - id1 on A(a1, a2), and id2 on B(b1, b2) - on the query:

    Select A.a3, B.b3

    from A inner join B on A.a1 = B.b1 and A.a2 = B.b2

    The performance is not impressive (both tables have about 2.5 million records). Will try the 4 index approach.

    If single-column indexes were better, why would there be a need for multi-column indexes at all? Has anyone tried similar things? I ask this solely from a performance standpoint (so excluding the case where the index is a key or enforces a constraint).

  • I agree with Andy that this does not have a single answer for every join.

    That said - if this is the method you generally use to access table B, I would recommend a single composite index on B(b1, b2). SQL *should* use that index rather than a table scan, since there aren't a number of other options to confuse it.

    Caveat (how do you spell that word, anyway?): the WHERE clause often changes the access method (a small example follows this post).

    I try to avoid single-column indexes on large tables that are joined on multiple columns, because the SQL engine likes to run separate lookups on the criteria for each column and then intersect the results.

    This is definitely a "try a few methods" type of issue. SQL has often surprised me by not choosing an index I thought would be the obvious choice.

    Guarddata-
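
    For example, a sketch using the tables from the question, with just the composite index id2 on B(b1, b2) in place (the filter value below is made up):

    -- Join alone: the optimizer can seek on id2 (b1, b2)
    SELECT A.a3, B.b3
    FROM A INNER JOIN B ON A.a1 = B.b1 AND A.a2 = B.b2

    -- Adding a filter on an unindexed column can change the access method,
    -- e.g. a scan of B plus a filter, or a different join order
    SELECT A.a3, B.b3
    FROM A INNER JOIN B ON A.a1 = B.b1 AND A.a2 = B.b2
    WHERE B.b3 = 'some value'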

  • I have just completed a study of why our systems seem to grind to a halt after being in service for a while, yet all indexes seemed to be applied correctly. I have also trawled the net, looking for answers, but most answers that I have found are quite vague, mostly suggesting the use of the index tuning wizard or just advising that it is a matter of fine-tuning the index / query for best results.

    I was always led to believe that the key to using an index was selectivity, i.e. the number of records that a GROUP BY or WHERE = returns divided by the total number of records (a quick way to measure this is sketched after this post).

    For a clustered index (best on INT fields), selectivity could be between 10% and 20% and the index would still be used, but for a non-clustered index this fell to an incredibly low 0.1%.

    In general terms this is still quite a good rule of thumb, but be aware that you must check all possible values, as it is possible that for one value the optimiser will use the index, while another value will cause the optimiser to revert to a table scan!

    The final decision of whether the optimiser will use an index or revert to a table scan comes down to I/O (memory).

    As I have found out through experiment, you can have a good index, but if the number of rows x row width exceeds a threshold, the optimiser will use a table scan, as it costs less I/O even though it takes longer to perform the task.

    In essence, if you are going to join on multiple columns, then you should have a multi-column index, with the columns in the index ordered from most selective to least selective.

    The join condition should also list the columns in that order.

    Finally, try to reduce the I/O required by reducing the number of rows being joined. Instead of joining two physical tables, could these be replaced by virtual (derived) tables?
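
    A rough way to check selectivity for a candidate column such as a1 (a sketch; the 10%-20% and 0.1% figures above are the poster's rules of thumb, not hard limits, and the literal value 42 is just a placeholder):

    -- Average selectivity of column a1 (1 / number of distinct values)
    SELECT 1.0 / COUNT(DISTINCT a1) AS avg_selectivity
    FROM A

    -- Selectivity for one specific value (check several values, as noted above)
    SELECT CAST(COUNT(*) AS float) / (SELECT COUNT(*) FROM A) AS selectivity_for_value
    FROM A
    WHERE a1 = 42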

  • quote:


    Select A.a3, B.b3

    from A inner join B on A.a1 = B.b1 and A.a2 = B.b2

    --

    The performance is not impressive (both tables have about 2.5 million records). Will try the 4 index approach.


    First, approximately how many of the 2.5 million rows in each table actually match? Second, if all you need are the a1, a2, a3 columns out of a number of other fields in both tables, then creating an index on all three columns - (a1, a2, a3) on table A and (b1, b2, b3) on table B - would mean that covering indexes could be used for the entire query, and the data pages would not have to be touched at all, only the index leaf pages (DDL for this is sketched below).
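
    A sketch of the covering indexes described above (the index names here are made up; note that SQL Server 2000 has no INCLUDE clause, so the extra column simply goes at the end of the key):

    -- Covering indexes: the whole query can be answered from the index leaf level
    CREATE INDEX ix_A_cover ON A (a1, a2, a3)
    CREATE INDEX ix_B_cover ON B (b1, b2, b3)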


  • quote:


    Would someone point me in the right direction, or share some links?


    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/createdb/cm_8_des_05_2ri0.asp might be a good starting point

    Cheers,

    Frank

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/

  • What ronc wrote about the optimizer explains my results in certain complicated queries where big tables (several million records in each) are joined: SELECT TOP 500 uses the index and runs in around 5 seconds, while the same query with SELECT TOP 600 does a table scan, and I lost my patience before I got any result (BTW, there are many thousands of matching rows).

    In the given question about tables A and B, I would first try the multiple-column index and make sure that the index is used. If it isn't, force it in the query, e.g. JOIN B WITH (INDEX(index_name)) ON ..., and test whether it helps (see the sketch after this post). Make sure that the columns in the index are in the right order, as ronc mentioned - the one that eliminates the most records first, then the other. If this doesn't seem to work, try single-column indexes or some completely different approach to the problem.

    In general, I have quite good results with multiple column indexes on huge tables, but there is no final solution for everything ... you have to test what works best for you. Good luck!
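
    A sketch of forcing the index with a table hint, as mentioned above (index_name stands for whatever the composite index on B is actually called):

    -- Force the optimizer to use a specific index on B via a table hint
    SELECT A.a3, B.b3
    FROM A
    INNER JOIN B WITH (INDEX(index_name)) ON A.a1 = B.b1 AND A.a2 = B.b2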

  • IMO the best way to figure out indexes is to test, test, test. Also, you need to weigh your indexes against not only your queries but your inserts as well. For example, in 7.0 and above, SQL tables with no clustered index are intelligent heaps - intelligent in that they reclaim empty space, so after any deletes the next inserts have to intelligently find that empty space. That being said, it is very rare that you should not have a clustered index on a table.

    To truly test your index efficiency, play with the indexes. In Query Analyzer, run "set statistics IO on", then run your query and look at the Messages tab (a quick sketch follows this post). This will give you the logical IO, which is the number of times SQL had to access a page (not the number of pages - you might have to access a single page multiple times); the less logical IO you have, the better your indexes are set up. I would take the same problem query and run as many what-if scenarios as you can.

    If you have questions on placement of clustered index, please ask.

    HTH

    Ray Higdon MCSE, MCDBA, CCNA
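
    A minimal sketch of the STATISTICS IO test described above, using the first query from the question:

    SET STATISTICS IO ON

    SELECT A.a3, B.b3
    FROM A INNER JOIN B ON A.a1 = B.b1 AND A.a2 = B.b2

    SET STATISTICS IO OFF
    -- The Messages tab will show something like:
    -- Table 'B'. Scan count ..., logical reads ..., physical reads ..., read-ahead reads ...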

  • A couple of rules about composite indexes (multiple column indexes).

    1) Always order the columns from most unique to least unique, as statistics are only stored for the first column of the index.

    2) Avoid using composite indexes for clustered indexes unless there will be no other indexes. The reason is that non-clustered indexes also store the related clustered index value, which means the wider the clustered index, the more impact it has on the size of the non-clustered indexes.

    3) If you often use the fields individually in queries, then use separate indexes. The reason: say you have an index on table A of (a1, a2, a3); if you query on a1, it will do a seek in many cases, but since a2 is not the column controlling the sort order of the index, a query on a2 alone will always get an index scan, which is less efficient (see the example after this post). But keep in mind that separate indexes mean separate stats will be stored for each, and the best stats always win when it comes to index choices.
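
    To illustrate rule 3 with the index from the example above (a sketch, assuming an index on A(a1, a2, a3); the value 42 is just a placeholder):

    -- a1 is the leading column of the index, so this can be an index seek
    SELECT a1, a2, a3 FROM A WHERE a1 = 42

    -- a2 is not the leading column, so this typically becomes an index or table scan
    SELECT a1, a2, a3 FROM A WHERE a2 = 42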

  • To add to Ant's post, the other reason to avoid composite clustered indexes is that you will get unpredictable page splitting when doing inserts, updates, or deletes on any column included in the composite clustered index.

    Ray Higdon MCSE, MCDBA, CCNA
