do you always use a "covering index" on a denormalized report table?

  • Do you always use a "covering index" on a denormalized report table?

    If you have a denormalized table that exists specifically for the purpose of storing rows of data for a specific report, for example:

    personid, firstname, lastname, street1, street2, city, state, zip

    If the report stored procedure returns the columns of data in this order then should you always create a covering index that includes the columns in the exact same order? Is there any time when you would have this use case where you would *not* want to create a covering index?

    How much of a performance increase would you normally get with a covering index in this use case depending on the number of rows in the table?

  • The answer is no. You would not always want a covering index.

    In your example, you are referring to a table that is used for a specific report that calls one stored procedure - presumably with a simple SELECT statement in it. In this case, the select statement would perform best (assuming it does not have a self-join in it) if you created a clustered index on the table with the sorting columns in the correct order. So, you would not have any additional indexes at all. However, you would want to consider the population of the table as well. If having a clustered index in this order would cause heavy fragmentation when populating the table, the index may need to be reconsidered.

    If the table has other columns in it (which I would assume would mean the table has uses other than the one report) a covering index specific for the report may make sense as long as it does not cause issue with populating the table.

    So, there are a lot of factors. However, if you have a single-purpose table that is serving just one SELECT statement, an appropriate clustered index usually makes the most sense - why have an additional index and store the data twice if you get no benefit?

  • I agree with everything Michael posted.

    The one point I'm not sure about... From your description, does this report even use a WHERE clause when accessing the information for display? Does it use a few of the same criteria every time or does the criteria vary?

    If it never uses a WHERE clause, you're going to always get a scan, so putting indexes on the table at all could be a waste of time & effort. If it always uses the same criteria, Michael's advice is best. If the criteria varies... then you get into the dance of deciding common access paths for the cluster and secondary access paths for x number of other indexes, possibly including covering indexes.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I might take it one step further and, instead of a reporting table with a clustered index, I might use an indexed view. It would depend on how the table is used.

    If, instead, an ETL into a table with the right clustered index on it is the better solution, in terms of performance in the tables it's being loaded from, then I'd usually have a clustered index on it.

    Otherwise, it's based on the queries that will be run on it, just as it always is with indexing. No different from any other table that will be queried.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Grant Fritchey (12/5/2008)


    I agree with everything Michael posted.

    The one point I'm not sure about... From your description, does this report even use a WHERE clause when accessing the information for display? Does it use a few of the same criteria every time or does the criteria vary?

    If it never uses a WHERE clause, you're going to always get a scan, so putting indexes on the table at all could be a waste of time & effort. If it always uses the same criteria, Michael's advice is best. If the criteria varies... then you get into the dance of deciding common access paths for the cluster and secondary access paths for x number of other indexes, possibly including covering indexes.

    OK - Let's say, for example, that the where clause has conditions where personid = x or lastname like %y%. What type of indexing would you use?

  • sqlguy (12/5/2008)


    OK - Let's say, for example, that the where clause has conditions where personid = x or lastname like %y%. What type of indexing would you use?

    If you had the PersonId why would you need to OR for the Lastname? Also, the %y% is going to prevent the use of an index on LastName.

    However, Let's assume we can break down to two queries and then UNION ALL the results together.

    You would need one index with a leading edge on PersonId and one with a leading edge on LastName. Because of the OR statement, you couldn't use a compound. Which one is clustered, if any? I don't know. Are either of these the most common access path? Let's assume that it's, most of the time, PersonID. Then you make that the clustered index.

    Since Lastname won't use an index the question is answered, but let's assume we can modify the query so that it is WHERE 'y%' Now, is the LastName covering? Well, again, it depends. How often is it called. Is it called with SELECT *. How many rows out of the table does it return? Key lookup can be costly, but so can maintaining a VERY wide index. So, I'd need to test a bit to see which was more costly vs. the benefits I received. How many columns are returned by the query. All of them? Again, I'm less inclined to make it a covering index. A defined and fairly constant subset? Yes, I'd go for making that a covering index. Covering how, meaning all or some of the columns in the key or all or some (except LastName of course) in the INCLUDE statement? Again, are other search criteria applied on occasion in combination with the LastName? Then they should maybe be in the index.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (12/5/2008)


    sqlguy (12/5/2008)


    OK - Let's say, for example, that the where clause has conditions where personid = x or lastname like %y%. What type of indexing would you use?

    If you had the PersonId why would you need to OR for the Lastname? Also, the %y% is going to prevent the use of an index on LastName.

    However, Let's assume we can break down to two queries and then UNION ALL the results together.

    You would need one index with a leading edge on PersonId and one with a leading edge on LastName. Because of the OR statement, you couldn't use a compound. Which one is clustered, if any? I don't know. Are either of these the most common access path? Let's assume that it's, most of the time, PersonID. Then you make that the clustered index.

    Since Lastname won't use an index the question is answered, but let's assume we can modify the query so that it is WHERE 'y%' Now, is the LastName covering? Well, again, it depends. How often is it called. Is it called with SELECT *. How many rows out of the table does it return? Key lookup can be costly, but so can maintaining a VERY wide index. So, I'd need to test a bit to see which was more costly vs. the benefits I received. How many columns are returned by the query. All of them? Again, I'm less inclined to make it a covering index. A defined and fairly constant subset? Yes, I'd go for making that a covering index. Covering how, meaning all or some of the columns in the key or all or some (except LastName of course) in the INCLUDE statement? Again, are other search criteria applied on occasion in combination with the LastName? Then they should maybe be in the index.

    OK - So are you saying that like '%y%' will negate the use of an index on last name but like 'y%' will still be able to use an index on lastname?

  • Yes...

    Another trick that can come in handy for the cases where you need to search like this : '%Y'

    You canadd a calculated column like this : ALTER TABLE dbo.tbl ADD COLUMN ColName AS REVERSE(SearchedColumn)

    CREATE NONCLUSTERED INDEX IX ON dbo.tbl (ColName)

    This willmaterialize the data and allow for another way to seek the data.

  • Oooh! Nice trick. I never knew about that one. Sweet.

    And yes, '%y%' won't use an index. If you think about it logically, it makes sense. An index stores the key values, say for Lastname, in order, alphabetical, either ascending or descending. If you search for 'y%' it can very quickly work through the pages and links to identify the page or pages that store all the names starting with 'Y'. But if you tell it to check for the occurence of 'Y' anywhere in the word... It just has to scan everything. Scanning everything is not something an index makes better. Although, a non-clustered index scan can be less costly than a table/clustered index scan because it will probably have less data in it. But it's still not a good thing.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (12/6/2008)


    Oooh! Nice trick. I never knew about that one. Sweet.

    And yes, '%y%' won't use an index. If you think about it logically, it makes sense. An index stores the key values, say for Lastname, in order, alphabetical, either ascending or descending. If you search for 'y%' it can very quickly work through the pages and links to identify the page or pages that store all the names starting with 'Y'. But if you tell it to check for the occurence of 'Y' anywhere in the word... It just has to scan everything. Scanning everything is not something an index makes better. Although, a non-clustered index scan can be less costly than a table/clustered index scan because it will probably have less data in it. But it's still not a good thing.

    I knew about using an index to prevent a table scan. What are the most common causes of index scans and how can these be resolved?

  • I think it might be easier to answer a question like "Why do we exists or why am I here in this lifetime?".

    Without trying to discourage any new commers, performance tuning (which includes covering indexes) is much more an art than a science.

    You need to read on performance tuning and then go to work and see how it goes for you, in your environement in as many scenarios as you can think of. And then compare query A to query B and see what works best in that one case. Only experience will fully answer that question for you... and after 5 years, I'm still learning almost everyday!

    Sorry if I can't answer your questions the way you want me to. But the best way for you to start isto google this site for articles on the subject and get to work... nothing else will do!

  • sqlguy (12/6/2008)


    I knew about using an index to prevent a table scan. What are the most common causes of index scans and how can these be resolved?

    I'm not sure I can answer that question directly. An index scan occurs when the optimizer decides that it can use an index to satisfy a query. It scans the index when the index can't satisfy the query through a seek. There are lots of reasons that cause this to happen, but you can boil it down, sort of, to one of two; 1) the index isn't selective enough to use a seek, so you get a scan, 2) the query is asking for so much data that there's no other way to satisfy it than with a scan.

    There are so many variables in all this that I couldn't begin to list them all, if I even knew them. I think Ninja had a better answer than I'm providing here.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I can add to that list quite easilywhich will put you on the right tracks...

    once the query sees it can use the index, it will choose a seek if the where condition is sargable (searchable).

    To be sargagble, the condition must use the data in the index without any alteration :

    Where datecol between @date1 and @date2 is OK.

    but this will always cause a scan :

    Where DATEADD(D, -10, datecol) between @date1 and @date2 will always cause a scan. The index cannot seeked here because there must be a calculation made on the data to know if it satisfies the condition.

    To make this query sargable, you'd need to include the dateadd part straight in the parameters like so :

    set @date1 = dateadd(D, -10, @date1)

    set @date2 = dateadd(D, -10, @date2)

    then

    Where datecol between @date1 and @date2

    That would give you the same data, but with a sargable search. This is just one exemple of many we could give, but that gives you a good idea.

    The short list would be don't use any functions on the data searched (ok in parameters)

    Make sure the datatypes match so that there's no implicit conversion.

    No explicit conversion on the base index data either (unless it can be implicitly done, then it might not matter IE smalldatetime to datetime).

    Happy search.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply