ad hoc query using "between two dates" acting weird

  • I'm running a query in the Query Analyzer, and it's behaving very strangely. The query is a select, with a several joins (a few inner joins and a few left outer joins), and several pieces in the where clause. All the join fields have 1-column indexes (one is usually the PK clustered index and the other is usually a nonclustered FK index), and all the expressions in the where clause have 1-column nonclustered indexes.

    One of the where clause expressions is "InvoiceDate between '8/1/2005' and '8/12/2005'" (12 days). When I run the query with this specific date range, the index on InvoiceDate is NOT used, rather the table's clustered PK index is used in a clustered index scan, and the query takes over 2.5 minutes to run. If I back down the end date to '8/11/2005' (11 days), the query uses the InvoiceDate index in an index seek, and it runs in about 2 seconds.

    Also, if I move the date range up a bit, and use '8/11/2005' to '8/22/2005' (12 days), I get the horrible performance (running right now...just passed the 5 minute mark), but if I change the end date to '8/21/2005' (11 days), the query runs great.

    I have run sp_updatestats, and I have rebuilt the InvoiceDate index, but this date range problem still exists.

    These date ranges each return less than 100 rows.

    Any suggestions?

    Thanks!

    Gary

  • Anyway you can make the InvoiceDate the clustered index?

  • If you query the table using the BETWEEN without the other tables, is the index used?

  • Great question, Steve! 🙂

    Looks like a simple select * from Table1 where InvoiceDate between '8/1/2005' and '8/12/2005' does NOT use the InvoiceDate index (it uses the PK clustered index, as the full query does). Using a date range of 8/1-8/11 DOES use the InvoiceDate index seek.

    Weird.

    Gary

  • how if you put the date range in your join and make your call to Table1 become a derived table?

    is it still using PK or InvoiceDate index?

    Thanks.

    Leo

  • How many rows are being returned by the query? Have you tried forcing the Index? It sounds as though SQL Server has decided that it is faster to use the clustered index than to continue optimizing the query plan. I have seen this happen and forcing th eindex has solved the problem (Select data from Table1 (index=InvoiceDateIndex) wher InvoiceDate between x and y).

  • I instinctively avoid using optimizer hints for indexes, as the optimizer generally gets things right (there are a few rare occasions when it doesn't, but on the whole I'd trust it).

    However, I agree with Jack that this sounds like a straightforward incorrect assumption by the optimizer regarding the efficiency of the NC index (assuming that your statement about the distribution of dates being roughly equal at 100 rows per date is accurate).  First instinct is to update the statistics, but you've already done that.  Maybe, then, it is the histogram sample size that is not giving an accurate enough representation of the data distribution, making the NC index appear more expensive than it actually is.

    I'd try updating the statistics on this particular table using the FULLSCAN option, so that there is no chance of the sampling currently used by sp_updatestats giving a distorted picture:

    update statistics <tablename>, <index_name> WITH FULLSCAN

    Hopefully this will allow the optimizer to now use the index without you having to modify the code (and leave a hardcoded index hint in there as a result)

  • I typically avoid optimizer hints as well, but in one particular application I support the optimzer regularly excludes a necessary index and by using the hint performance goes from minutes to seconds.  I also thoroughly test in a development system before deploying.

    I have not tried updating the stats with the FULLSCAN option though, so I would also try this first.

  • I had a similar problem.  For a "schedule" table I was querying the last two weeks of data.  I used two datetime variables one set to the current date and the other set to current date minus 2 weeks. 

    PROBLEM:   The query optimizer queried from the NONCLUSTERED primary key.  I think the query optimizer used the NONCLUSTERED PK because that column was used in subsequent joins.  Unfortunately the query execution was unexpected slow.  It did not matter whether I used Between or less than / greater than. 

    SOLUTION:  I modified the stored procedure to accept two datetime parameters and then called the stored procedure with the two dates.... much to my surprise the query optimizer decided to use the clustered index (the datetime column).

    Give it a try it ... try to avoid query hints.  Query hints can cause major diseconomies you up if the distribution of your data changes over time.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply