May 10, 2006 at 7:40 am
Can someone out there explain why the execution plan for a statement containing a LIKE returns fewer estimated rows than the same statement containing an equal sign? I'm running SQL 2000 Enterprise. I've already run UPDATE STATISTICS on the index the plan is using (IX_Person_1).
Here are the statements:
Statement without LIKE:
SELECT * FROM Person WHERE lastname = 'JONES' AND firstname = 'WALTER'
Execution Plan Generated:
OBJECT: ([Incidents].[dbo].[Person].[IX_Person_1]), SEEK: ([Person].[lastname]='JONES' AND [Person].[firstname]='WALTER') ORDERED FORWARD
Estimated Rows = 1503 - Actual Rows Returned = 8
Statement with LIKE:
SELECT * FROM Person WHERE lastname = 'JONES' AND firstname LIKE 'WALTER%'
Execution Plan Generated:
OBJECT: ([Incidents].[dbo].[Person].[IX_Person_1]), SEEK: ([Person].[lastname]='JONES' AND [Person].[firstname] >= 'WALTEQþ' AND [Person].[firstname] < 'WALTES'), WHERE: (like([Person].[firstname], 'WALTER%', NULL)) ORDERED FORWARD
Estimated Rows = 9 - Actual Rows Returned = 8
The IX_Person_1 index contains the following columns: lastname, firstname, middlename, generation. It is not the Primary Key nor is it Clustered.
Thanks in advance for any help you all can give me.
May 11, 2006 at 6:43 am
Is it possible that you only have 8 rows for Walter Jones in the data, meaning that no more than 8 rows can be returned?
May 11, 2006 at 7:33 am
Thanks for taking the time to look at my problem. The problem is with the difference in the Estimated Rows returned by the Execution Plan.
The Execution Plan calculates wildly different estimated row counts for the two SELECT statements. The actual rows returned are 8, so why is the estimate 1503 for the = statement and 9 for the LIKE statement? You would think it would be the other way around.
Art
May 11, 2006 at 7:37 am
That's a very good question... One that I never got around to asking, so I have no idea what to answer. I'll let the real gurus take this one.
May 11, 2006 at 7:45 am
My guess is that the optimiser cannot get accurate distribution statistics for the firstname column when a "like" is used.
Use "DBCC SHOW_STATISTICS (table, target)" to look at the stats for the column.
May 11, 2006 at 8:13 am
Thanks! I ran the SHOW_STATISTICS and now I need to figure out what it's telling me.
The row estimate is much more accurate with the LIKE for some reason. Hmmmm.
May 11, 2006 at 8:35 am
I have a guess. It would appear that the second query is planned with a residual predicate (see the 'WHERE'?) on the seek, which makes it seem more selective to the optimizer. So, it may estimate approximately the same number of rows for the seek, and then reduce the estimate to account for the filter.
Or, it may be due to the fact that it is using three comparisons in the seek (lastname =, firstname >=, firstname <).
It's just a guess, however.
jg
May 11, 2006 at 8:44 am
I thought about it some more, and here's what I think...
Try getting the rowcount estimate from this select:
Select * from Person where
[Person].[lastname]='JONES'
AND [Person].[firstname] >= 'WALTEQþ'
AND [Person].[firstname] < 'WALTES'
That will tell you if the difference is related to the predicate or the triple comparison.
(I think)
jg
May 11, 2006 at 10:01 am
Jeff, I tried your SELECT and got an estimated row count of 10, which is what I would expect. I also ran the following statement and got an estimated row count of 9:
SELECT * from person where lastname > 'JONER' and lastname < 'JONESA' and firstname = 'WALTER'
I must have something wrong, because I can't believe that a select statement with a range comparison would get a lower row estimate than an exact match.
The reason this is so important to me is that this select statement is just the first in a series of joins that recurse through several very large tables. When the initial estimated row count is too high, all the subsequent joins go from SEEKs to TABLE SCANs, which makes the thing run dog slow. (At least my dog!)
Art
May 11, 2006 at 10:48 am
I've run into that kind of problem before. Sometimes, the answer is to break up the query into smaller parts, storing intermediate results into a #Table or @Table variable.
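A minimal sketch of that approach, reusing the Person query from this thread (the #Walters name is just illustrative):

-- Materialize the small result set first
SELECT lastname, firstname, middlename, generation
INTO #Walters
FROM Person
WHERE lastname = 'JONES'
AND firstname LIKE 'WALTER%'

-- Later joins then start from #Walters, whose row count the
-- optimizer can size accurately, instead of inheriting the
-- inflated estimate from the original predicate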
There is one thing that I've had decent luck with in situations where a blown row count estimate switches things from seeks and loop joins into scans and hashes: Specifying a query hint at the query level using the "Fast X" option.
You could force loop joins where appropriate using "Inner loop join" instead of the plain old "Inner join", but specifying that type of hint forces the join order in the resulting plan. That is usually a pain to get right.
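A sketch of that kind of hint, using a hypothetical Incident table and join column (neither is named anywhere in this thread):

SELECT p.lastname, p.firstname
FROM Person p
INNER LOOP JOIN Incident i    -- LOOP forces a nested loop join here
ON i.person_id = p.person_id  -- hypothetical join column
WHERE p.lastname = 'JONES'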
Using "Option (Fast X)" at the end of the query can nudge the optimizer back to a loop join without forcing the join order. Just supply the X with whatever you would expect the (final) row count to be.
hth jg
May 11, 2006 at 11:22 am
Jeff,
The OPTION (FAST x) worked like a charm! Thank you soooo much. I appreciate everyone's comments on this.
Art