Wrong Index

  • I hope a SQL guru out there has a solution for this (although it does happen in DOCS Open, I am pretty sure it is a SQL issue since all my testing outside of DOCS produces the same results):

    On two of my servers (SQL 7 SP2), running the following script (A programmed query in a third party app):

    SELECT DISTINCT A.MATTER AS 'MATTER.MATTER', B.CLIENT_ID AS 'MATTER.CLIENT.CLIENT_ID'

    FROM DOCSADM.PROFILE A, DOCSADM.MATTER B, DOCSADM.CLIENT C

    WHERE ((B.SYSTEM_ID = A.MATTER OR A.MATTER = 0)

    AND (C.SYSTEM_ID = B.CLIENT_ID OR B.CLIENT_ID IS NULL))

    AND A.DOCNUMBER = 5481726

    takes over a minute to run, on my other servers it takes about 3 seconds (databases are identical). The difference is that on the bad servers the query uses the MATTERF1 (CLIENT_ID) index as opposed to the MATTERP (SYSTEM_ID) index. If I add the Table Hint "As B With (Index(MATTERP))" for the Matter table, all is well and the query runs in its normal time. What is really weird is that until Thursday night all was fine, something happened or did not happen on those servers that is causing the problem.

    Any idea why a server would not use the proper index? (all servers are identical in configuration except for Boost SQL Server priority on Windows NT, it is on on two of the good servers and off on the bad ones - could this be it?)

    Thanks

  • I would guess you have to update the statistics

  • quote:


    I would guess you have to update the statistics


    I've dropped and rebuilt indexes and dropped and recreated statistics but I still have the problem. If I drop all indexes except the one that is supposed to be used then the query runs fine; it's as if it knows which to use but deliberately decides not to use it.

  • quote:


    it's as if it knows which to use but deliberately decides not to use it.


    you can't tell for sure which stategy the optimizer chooses and why.

    I, too, would stay away from using index hints, because what might seem to be appropriate now might be completely wrong when data changes significantly.

    Are you sure the underlying data is the same on both servers?

    Frank

    http://www.insidesql.de

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • I had a very similar problem with a stored procedure. The server with the problem was subject to a sudden power outage. After this it would not use the most efficient execution plan for the sp (even after updating stats etc). The only solution was to drop and recreate the sp.

    Paul

  • Still don't know what the problem actually is, but somehow it is disappearing from selected databases. There is no rhyme or reason for what database gets fixed, some I have I re-indexed and recreated the stats, others I haven't.

  • quote:


    Still don't know what the problem actually is, but somehow it is disappearing from selected databases. There is no rhyme or reason for what database gets fixed, some I have I re-indexed and recreated the stats, others I haven't.


    My best guess is that the statistics are auto updating.

    If this was a stored procedure rather than just a script, another culprit could have been "parameter sniffing," where a plan is compiled based on the statistics for the initial parameter(s) used and subsequent calls to the SP use that plan, which may not be optimal given the selectivity of the subsequent values entered.

    --Jonathan



    --Jonathan

  • Any way to clear this "parameter sniffing"?

  • I think you've already cleared this 'parameter sniffing' by updating the statistics.

    FWIW, here's a part from Inside SQL Server 7 that still is true

    quote:


    Index Statistics

    After the query optimizer finds a potentially useful index that matches the clause, it evaluates the index based on the selectivity of the clause. The optimizer checks the index's statistics—the histogram of values accessible through the index's row in the sysindexes table. The histogram is created when the index is created on existing data, and the values are refreshed each time UPDATE STATISTICS runs. If the index is created before data exists in the table, no statistics appear. The statistics will be misleading if they were generated when the dispersion of data values was significantly different from what appears in the current data in the table. However, SQL Server detects if statistics are not up-to-date, and by default it automatically updates them during query optimization.

    Statistics are a histogram consisting of an even sampling of values for the index key (or the first column of the key for a composite index) based on the current data. The histogram is stored in the statblob field of the sysindexes table, which is of type image. (As you know, image data is actually stored in structures separate from the data row itself. The data row merely contains a pointer to the image data. For simplicity's sake, we'll talk about the index statistics as being stored in the image field called statblob.) To fully estimate the usefulness of an index, the optimizer also needs to know the number of pages in the table or index; this information is stored in the dpages column of sysindexes.

    The statistics information also includes details about the uniqueness of the data values encountered, referred to as the density, which provides a measure of how selective the index is. Recall that the more selective an index is, the more useful it is, because higher selectivity means that more rows can be eliminated from consideration. A unique index, of course, is the most selective—by definition, each index entry can point to only one row. A unique index has a density value of 1/number of rows in the table.

    Density values range from 0 through 1. Highly selective indexes have density values of 0.10 or lower. For example, a unique index on a table with 8345 rows has a density of 0.00012 (1/8345). If there is a nonunique nonclustered index with a density of 0.2165 on the same table, each index key can be expected to point to about 1807 rows (0.2165 × 8345). This is probably not selective enough to be more efficient than just scanning the table, so this index is probably not useful. Because driving the query from a nonclustered index means that the pages must be retrieved in index order, an estimated 1807 data page accesses (or logical reads) are needed if there is no clustered index on the table and the leaf level of the index contains the actual RID of the desired data row. The only time a data page doesn't need to be reaccessed is for the occasional coincidence that can occur when two adjacent index entries happen to point to the same data page.

    Assume in this example that about 40 rows fit on a page, so there are about 209 total pages. The chance of two adjacent index entries pointing to the same page is only about 0.5 percent (1/209). The number of logical reads for just the data pages, not even counting the index I/O, is likely to be close to 1807. If there is also a clustered index on the table, a couple of additional page accesses will be needed for each nonclustered key accessed. In contrast, the entire table can be scanned with just 209 logical reads—the number of pages in the table. In this case, the optimizer looks for better indexes or decides that a table scan is the best it can do.

    The statistics histogram records steps (samples) for only the lead column of the index. This optimization takes into account the fact that an index is useful only if its lead column is specified in a WHERE clause. The density information stored in the statblob field consists of two types of information, called density values and all_density values. The values are computed based on the number of rows in the table, the cardinality of the indexed columns (the number of distinct values), the number of nonfrequent (NF) values (values that appear no more than once in the histogram step boundaries), and the cardinality of NF values. This will become clearer when we see some actual statistics information.

    SQL Server defines density and all_density as follows:

    density = (NF count/distinct NF count)/(number of rows)

    all_density = 1/cardinality of index keys

    Statistics for a single column consist of one histogram, one all_density value, and one density value. The multicolumn statistics for one set of columns in a composite index consist of one histogram for the first column in the index, one density value for the first column, and all_density values for each prefix combination of columns (including the first column alone).The fact that density information is kept for all columns helps you decide how useful the index is for joins


    Frank

    http://www.insidesql.de

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • quote:


    Any way to clear this "parameter sniffing"?


    Yes, recompile the stored procedure. It will then create a plan based on the parameters you next use.

    --Jonathan



    --Jonathan

  • quote:


    I think you've already cleared this 'parameter sniffing' by updating the statistics.


    I don't believe that manually updating statistics will force recompiles, although the AutoStat process can:

    http://support.microsoft.com/?id=195565

    quote:


    FWIW, here's a part from Inside SQL Server 7 that still is true


    I like this better because of the examples:

    http://msdn.microsoft.com/library/techart/statquery.htm

    I have consulted at sites where clients have deleted the "spurious" WA_* indexes because "they're taking up room." I explained to them that each uses only one row in sysindexes (no actual "index" exists), and then came up with this:

    Think of indexes as roads, statistics as a roadmap, and you've got a

    compass. You want to reach a point five kilometers north of where you

    are now. You see several roads (if you have indexes), and, if you have

    no map, guess which one to use based on its general direction. With a

    roadmap (statistics), it's easy to determine which road(s) would get you

    to your destination most efficiently, or even if you'd be best off

    hiking cross-country (not using an index).

    --Jonathan



    --Jonathan

  • Nice references!

    Frank

    http://www.insidesql.de

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply