SELECT stmt with LIKE returns fewer estimated rows than with =.

  • Can someone out there explain why the execution plan for a statement containing a LIKE returns fewer estimated rows than the same statement containing an equal sign? I'm running SQL 2000 Enterprise. I've already run UPDATE STATISTICS on the index the plan is using (IX_Person_1).

    Here are the statements:

    Statement without LIKE:

    SELECT * FROM Person WHERE lastname = 'JONES' AND firstname = 'WALTER'

    Execution Plan Generated:

    OBJECT: ([Incidents].[dbo].[Person].[IX_Person_1]), SEEK: ([Person].[lastname]='JONES' AND [Person].[firstname]='WALTER') ORDERED FORWARD

    Estimated Rows = 1503 - Actual Rows Returned = 8

    Statement with LIKE:

    SELECT * FROM Person WHERE lastname = 'JONES' AND firstname LIKE 'WALTER%'

    Execution Plan Generated:

    OBJECT: ([Incidents].[dbo].[Person].[IX_Person_1]), SEEK: ([Person].[lastname]='JONES' AND [Person].[firstname] >= 'WALTEQþ' AND [Person].[firstname] < 'WALTES'), WHERE: (like([Person].[firstname], 'WALTER%', NULL)) ORDERED FORWARD

    Estimated Rows = 9 - Actual Rows Returned = 8

    The IX_Person_1 index contains the following columns: lastname, firstname, middlename, generation. It is not the Primary Key, nor is it Clustered.

    Thanks in advance for any help you all can give me.

  • Is it possible that you only have 8 WALTER JONES rows in the data, meaning that no more than 8 rows can be returned?

  • Thanks for taking the time to look at my problem. The problem is with the difference in the Estimated Rows returned by the Execution Plan.

    The Execution Plan calculates wildly different estimated row counts for the two SELECT statements. The actual rows returned are 8, so why are the estimated rows 1503 for the = statement and 9 for the LIKE statement? You would think it would be the other way around.

    Art

  • That's a very good question... one that I never got around to asking, so I have no idea what to answer. I'll let the real gurus take this one.

  • My guess is that the optimiser cannot get accurate distribution statistics for the firstname column when a "like" is used.

    Use "DBCC SHOW_STATISTICS ( table , target )" to look at the stats for the column.
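
    With the table and index names from your first post, that would be something like:

    -- Show the distribution statistics kept for the index the plan is using
    DBCC SHOW_STATISTICS ('Person', 'IX_Person_1')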

  • Thanks! I ran the SHOW_STATISTICS and now I need to figure out what it's telling me.

    The distribution statistics are much more accurate with the LIKE for some reason. Hmmmm.

  • I have a guess.  It would appear that the second query is planned to have a predicate (see the 'WHERE'?) on the seek, which makes it seem more selective to the optimizer.  So, it may estimate approximately the same number of rows for the seek, and then reduce it due to the filter.

    Or, it may be due to the fact that it is using three comparisons in the seek (lastname =, firstname >=, firstname <).

     It's just a guess, however.

    jg

  • I thought about it some more, and I think this...

    Try getting the rowcount estimate from this select:

    Select * from Person where
    [Person].[lastname] = 'JONES'
    AND [Person].[firstname] >= 'WALTEQþ'
    AND [Person].[firstname] < 'WALTES'

    That will tell you if the difference is related to the predicate or the triple comparison.

    (I think)

    jg

  • Jeff, I tried your SELECT and got an estimated row count of 10 - which is what I would expect. I also ran the following statement and got an estimated row count of 9:

    SELECT * from person where lastname > 'JONER' and lastname < 'JONESA' and firstname = 'WALTER'

    I must have something wrong, because I can't believe that a select statement with a range selection would be estimated to return fewer rows than an exact match.

    The reason this is so important to me is that this select statement is just the 1st in a series of joins that recurse through several very large tables. When the initial estimated row count is too high, all the subsequent joins go from SEEKs to TABLE SCANs, which makes the thing run dog slow. (At least my dog!)

    Art

  • I've run into that kind of problem before. Sometimes the answer is to break the query into smaller parts, storing intermediate results in a #Table or @Table variable.
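
    For example, a rough sketch (the personid column and the Incident table are made up for illustration; only Person is from this thread):

    -- Materialize the small first step; the later joins then start from a
    -- table whose row count the optimizer knows accurately
    SELECT personid, lastname, firstname
    INTO #matches
    FROM Person
    WHERE lastname = 'JONES' AND firstname LIKE 'WALTER%'

    SELECT m.*, i.*
    FROM #matches m
    INNER JOIN Incident i ON i.personid = m.personid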

    There is one thing that I've had decent luck with in situations where a blown row count estimate switches things from seeks and loop joins to scans and hashes: specifying a query hint at the query level using the "Fast X" option.

    You could force loop joins where appropriate using "Inner loop join" instead of the plain old "Inner join", but specifying that type of hint also forces the join order in the resulting plan, which is usually a pain to get right.

    Using "Option (Fast X)" at the end of the query can nudge the optimizer back to a loop join without forcing the join order. Just replace X with whatever you expect the (final) row count to be.
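
    For example (the join and the row count here are illustrative, not from your actual query):

    SELECT p.*, i.*
    FROM Person p
    INNER JOIN Incident i ON i.personid = p.personid  -- hypothetical large table
    WHERE p.lastname = 'JONES' AND p.firstname LIKE 'WALTER%'
    OPTION (FAST 10)  -- plan for roughly 10 rows coming out of the query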

    hth jg

  • Jeff,

    The OPTION (FAST x) worked like a charm! Thank you soooo much. I appreciate everyone's comments on this.

    Art
