January 21, 2011 at 10:12 am
the sqlist (1/21/2011)
Doug Bishop (1/21/2011)
Very good article. I do have a question. I am aware of SQL Server creating intersecting indexes when you have a query on multiple columns and each column has its own index. What I had not heard before is that SQL Server may not necessarily use that schema, but only use one and do a scan for remaining values.I am on SQL Server 2000. I am not much of an execution plan expert, but would like to know what to look for in the execution plan to determine if it is using a single index or multiple indexes.
We have a lot of queries where we join to multiple tables on multiple columns. Most of these tables have individual indexes on the join columns, and before I heard about intersecting indexes, I was under the impression that only one index would be used, thus compound indexes should be used. I'd like to prove or disprove this 'hypothesis' by looking at some execution plans for some of the queries.
Obviously the preferred way is to use the composite indexes. If you use the composite index you'll get the result in one trip using only one index. The problem is if the columns in the composite key may not be used all in all queries. My approach is this:
For the join always use the PKs(in general clustered), which also define the relations between tables. For any other filtering you need use non clustered composite indexes.
Let's say you have a table with an ID column as PK with multiple columns of which 3 , A, B and C, are frequently used in where clauses. Because it could be possible to use any combination of these 3 keys(could be more) and not necessarily all 3 of them each time what I do is creating 3 indexes like this:
A+B+C
B+C
C
This configuration covers pretty much any combination and the pattern could be used, if needed, with any number of columns.
Obviously, a join on PK is the best solution. Unfortunately these tables are all heaps, with no PK. The tables are built on a regular basis as part of an ETL process. Surrogate keys would have no meaning between tables and it would be basically impossible (at least unfeasable) to build PK-FK relationships.
Some of our tables are completely rebuild from a daily feed of MTD data and the overhead in trying to build relational models, and the time and resources to do it would result in the model being completed just about when the next feed comes.
A nautural key cannot even be defined, because it is possible to basically have duplicate data in our feed, and that is not only normal, but expected, as the front end would allow two of one item to come across as the same item twice, not the item once with a quantity of 2.
Thus, my question:
In the query
SELECT *
FROM table1 t1
INNER JOIN table2 t2
ON t1.col1 = t2.col1
AND t1.col2 = t2.col2
AND t1.col3 = t2.col3
when I have three indexes, one each on col1, col2 and col3, how does the execution plan indicate that index intersection is being used?
January 21, 2011 at 10:15 am
Doug Bishop (1/21/2011)
Obviously, a join on PK is the best solution. Unfortunately these tables are all heaps, with no PK.
A table can be a heap and have a pk. It can have a clustered index without a pk. Primary key != clustered index.
When I have three indexes, one each on col1, col2 and col3, how does the execution plan indicate that index intersection is being used?
Please look at the blog post I linked to.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 21, 2011 at 10:52 am
GilaMonster (1/21/2011)
A table can be a heap and have a pk. It can have a clustered index without a pk. Primary key != clustered index.
Here is a point that is not well understood at all. Thanks for confirming this. As I see this the "primary key" is what is used most often to look up entries in said table. It is the target of the foreign key. In real world terms this is the unique assigned row identifier for the table. Not SQL's internal RID. It is used to get a unique row.
So one could have a table that has a primary key of type GUID and back that with a non-clustered index. The order of information within the pages or logically from page to page may not be important to me in this case.
It's the GUI tools (SSMS or Enterprise Manager) making the PK clustered is what has lead folks down the garden path.
ATBCharles Kincaid
January 21, 2011 at 11:02 am
No, the high selectivity fields (like primary key in a table) are in the joins and low slectivity field such as status is in the where clause.
I agree that in this situation the optimzer is wrong on put the low selectvity in the key field of index. You are right that we should not always trust optimizer's suggestion as sometimes during its index tuning, it suggests a new nonclustered index with the primary clustered key among other fields in the 'include', which I believe is totally redundant. Isn't it obvious that a nonclustered index already has the clustered key in the leaf page? Why optimzer would suggust it in the 'include'? Am I missing something?
After thinking more about this 'other-use-of-Include' opinion, let assume the join keys and the SARG in the where clause are both high selectivity. I agree with you that putting both as index keys would make the seek goes quicker. If I follow the opinion and put the SARG as the index key and put the join key in the Include, it would make the index smaller but the seek will take a little longer as more leaf records need to be searched but shouldn't be that bad. In fact, it does improve the query performance just like they are in the index fields. Would it be a trade-off between index size and seek time to determine the correct design of a nonclustered index? It probably depends on how big of the tables are and rows of records involved. Your thoughts...
January 21, 2011 at 11:05 am
Thus, my question:
In the query
SELECT *
FROM table1 t1
INNER JOIN table2 t2
ON t1.col1 = t2.col1
AND t1.col2 = t2.col2
AND t1.col3 = t2.col3
when I have three indexes, one each on col1, col2 and col3, how does the execution plan indicate that index intersection is being used?
Think of it this way, SQL filters the t1 first then t2 and joins the matching columns. Having composite indexes on both tables as col1+col2+col3 will give you the best performance as it uses one single index on each table instead of 3 different indexes on each table. If you read that blog given by Gilla explains it very well.
When you have 3 indexes the SQL engine filters one table on the first index, then on the 2nd index and merge the results(intersects them), then filters the table on the third index and merges the result with the first merged result. It does that on the second table and in the end matches the rows.
Don't just give the hungry man a fish, teach him how to catch it as well.
the sqlist
January 21, 2011 at 11:08 am
Charles Kincaid (1/21/2011)
It's the GUI tools (SSMS or Enterprise Manager) making the PK clustered is what has lead folks down the garden path.
Not just the GUI tools, it's the default in SQL.
If I say ALTER TABLE ADD CONSTRAINT PRIMARY KEY I'll get a primary key enforced by a unique clustered index. However it's perfectly valid to say ALTER TABLE ADD CONSTRAINT PRIMARY KEY NONCLUSTERED and get a primary key enforced by a unique nonclustered index.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 21, 2011 at 11:13 am
OceanDeep (1/21/2011)
I agree that in this situation the optimzer is wrong on put the low selectvity in the key field of index.
Maybe. Maybe not. There's nothing wrong with a low selectivity column as an index key. It's probably not useful to have it there alone, unless the index is covering (which if missing indexes suggest it, it probably is)
it suggests a new nonclustered index with the primary clustered key among other fields in the 'include', which I believe is totally redundant. Isn't it obvious that a nonclustered index already has the clustered key in the leaf page?
The clustered index key is present in the leaf and maybe non-leaf levels of all nonclustered indexes, but there's nothing wrong with explicitly specifying it. SQL won't add it twice (it's smarter than that) and what happens if the clustered index is moved? If you don't have the clustered index key explicitly specified and it's needed in the index, then that index becomes less useful just because the clustered index has changed. Not desired behaviour.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 21, 2011 at 12:39 pm
I'm sorry but may be it is because I am 'old school'. I have been working with SQL Server for over 15 years. It used to be true that creating a clustered index on a key that would cause the table to increment always at the end ( i.e datetime, ID column) was bad if the table was very active (inserts). It would cause pages at the end of the table to be hot and contention would follow. When I did P & T gigs, we would pick a clustered index that would disperse this traffic and cause multiple entry points in the table. Yes this is more expensive but it would reduce locking issues.
I was also told by an 'expert' that since a non-clustered index contains the clustered key that there was no reason to repeat this value in the non-clustered index. For example:
Table a ( customer_no int, order_no int, order_date datetime, ....)
clustered index on (customer_no, order_no)
nonlcustered index (order_date)
In this case the non-clustered index contains the key from the clustered so it acts like an index with all three columns. Is this true?
If you do the following query:
select customer_no, order_no, order_date from a where order_date between '01/01/2009' and '12/31/2009'
The optimizer would use the non-clustered but only for scanning. Where if you had the customer_no, order_no, and order_date in the non-clustered then it would be a fully covered query. This becomes very importnat where you have fat tables and non-clustered indexes can be used to vertically splti tables. Again you need to balance this with the cost but if the DB is very report intensive it becomes a valuable key especially when aggregating data.
January 21, 2011 at 1:34 pm
Awesome article series Gail!
I know several development teams that could use to read these, if only I could make them.
They keep creating single column indexes for all their joins, without analyzing the patterns of the joins, where clauses, or select statements to cover their most common queries properly *sigh*.
Heh... at least the apps I work with I've taught my devs to read the execution plans, and see if they can either tweak an existing index or create a new better one to cover any new patterns that show up. 🙂
Can't say it enough about the awesomeness of this series.
January 21, 2011 at 1:36 pm
rik gretzinger (1/21/2011)
I'm sorry but may be it is because I am 'old school'. I have been working with SQL Server for over 15 years. It used to be true that creating a clustered index on a key that would cause the table to increment always at the end ( i.e datetime, ID column) was bad if the table was very active (inserts). It would cause pages at the end of the table to be hot and contention would follow. When I did P & T gigs, we would pick a clustered index that would disperse this traffic and cause multiple entry points in the table. Yes this is more expensive but it would reduce locking issues.
The last time that was true was in SQL 6.5. With the introduction of row-level locking in SQL 7, the hot-spot problem went away.
I was also told by an 'expert' that since a non-clustered index contains the clustered key that there was no reason to repeat this value in the non-clustered index.
Nonclustered indexes do contain the clustered index key. As a key column if the nonclustered is not unique, as an include column if it is unique.
This is NOT a good reason to leave the clustered key out of a nonclustered index if it is needed. SQL is not stupid enough to put the columns into the index twice if you explicitly add the clustering key and there's always the chance that another column will be added to the index (in which case the order of the implicitly included columns changes) and there's always the chance that the clustered index will change.
If you do the following query:
select customer_no, order_no, order_date from a where order_date between '01/01/2009' and '12/31/2009'
The optimizer would use the non-clustered but only for scanning.
No, it would seek on it. The index is covering (since the clustering key is implicitly part of that index) and the predicate is SARGable and on a left-based subset of the index key, hence that query can be satisfied by a simple index seek.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 25, 2011 at 12:19 pm
Great artical Gail! Your commentary about intersections and the factors around whether to place the most selective column in the leading edge of the index are very practical and helpful.
January 25, 2011 at 12:27 pm
Toby White (1/25/2011)
Great artical Gail! Your commentary about intersections and the factors around whether to place the most selective column in the leading edge of the index are very practical and helpful.
Thanks. There are posts that go into more detail on those and some other indexing topics on my blog: http://sqlinthewild.co.za/ (specifically in the indexing catagory
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 28, 2011 at 6:15 am
I've been doing SQL dev for a long time, but decided to approach this series like I was just starting out and knew jack about indexes. Yes, I learned a thing or two! OK, maybe a thing or 5. Thanks Gail - great series.
February 7, 2011 at 8:42 am
Gail - nice set or articles.
April 12, 2011 at 11:31 am
All very good articles, thank you.
Sometimes you *think* you know something but it turns out you were only half right. No harm in keeping the ole grey matter in check by re-learning stuff you *think* you knew. :unsure:
Knowledge is power...as a person once said. 😛
qh
Viewing 15 posts - 61 through 75 (of 92 total)
You must be logged in to reply to this topic. Login to reply