optimizer "LIKE" vs "="

Question

Post reply

optimizer "LIKE" vs "="

joerg.kannengiesser

Old Hand

Points: 322
More actions
November 30, 2010 at 3:16 am

#231755

Hi,
we have a big issue with the response time from our DMS.
Spending a lot of time we could find out the following things:
in MSSQL 2005R1:
creating a high-selective index has the results,
that the optimizer uses this index when using a "=" predicate
but does not use ANY index (full table scan) then using a "LIKE" predicate.
After deleting the new index both selects are using another (not so good) index ...
in MSSQL 2008R2:
creating a high-selective index has the results,
that the optimizer uses this index when using a "=" predicate
but does not use this index then using a "LIKE" predicate -
instead he uses the old (not so good) index ...
Brrr ...
- any hints why there are differences between "= xxx" and "LIKE xxx%" ?
- any hints that this changed from 2005R1 to 2008R2 (nothing found about that) ?

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

Brigadur SSCrazy Points: 2097 More actions · Answer 1

Hi,

Interesting, so far one observation:

There don't seem to be any clustered index on the table. Is this table created only for experiment?

How is the "specialindex" created? Source would be nice.

Does the index include both columns in the where clause?

Cheers

joerg.kannengiesser Old Hand Points: 322 More actions · Answer 2

Yes, the index covers both columns - so it SHOULD be used for selection:

CREATE NONCLUSTERED INDEX [pw_test] ON [dbo].[objkeys]

(

[okeyname] ASC,

[okeydata] ASC,

[parentid] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

The database is a production system, the DMS was installed by a consultant.

Well, it's not so easy to optimize, because:

there are 17.839.195 records in the table

there are 1.039.993 records with okeyname = 'MANDANT'

there are 646.593 records with okeydata = '31'

there are 4 (!) record with okeyname = 'MANDANT' AND okeydata = '31'

Brigadur SSCrazy Points: 2097 More actions · Answer 3

Hi,

How many rows are there in the table?

Could you post the execution plans also (not the pictures)?

Regards

joerg.kannengiesser Old Hand Points: 322 More actions · Answer 4

sql-plans in attachment ...

How can I delete this post ... ?

joerg.kannengiesser Old Hand Points: 322 More actions · Answer 5

sql-plans in attachment ...

How can I delete this post ... ?

joerg.kannengiesser Old Hand Points: 322 More actions · Answer 6

sql-plans attached.

the story is going on:

after inactivating the pw_index in the 2005R1,

the SELECT with "=" uses the still remaining index (with "included columns"),

after deleting the pw_index in the 2005R1,

the SELECT with "=" does not use any index and makes a full table scan

whereas the LIKE uses the still remaining index (with "included columns") !?!

Seems that "=" likes indices wheres "LIKE" likes "included columns" ... this is more than strange !

Brigadur SSCrazy Points: 2097 More actions · Answer 7

You have a SELECT *

Do you really need all the columns returned? You have a RID lookup on the heap because there is no clustered index on the table. Good design practice would be to have a primary key also.

I would modify the index definition:

CREATE NONCLUSTERED INDEX [pw_test] ON [dbo].[objkeys]

(

[okeyname] ASC,

[okeydata] ASC

)

INCLUDE (

[parentid],

<any more columns to be returned>

)...

Example:

SELECT parentid, okeyno, okeyname, okeydata

FROM [objkeys]

WHERE [okeyname] like 'MANDANT' AND [okeydata] like '31'

would need: INCLUDE([parentid], [okeyno])

Regards

joerg.kannengiesser Old Hand Points: 322 More actions · Answer 8

Of course it doesn't appear if there is a full-covered-index.

The problem was broken down, so the DMS-software joins this

table with other tables where some columns can't be included into the index

(they finally need about 10 columns of a joined table).

So this select is for showing the problem and of course only occurs

when selecting a column which is not included ...

Changing the Select to "SELECT parentid FROM [objkeys] ..." this happens in 2005R1:

WHERE [okeyname] = 'MANDANT' AND [okeydata] = '31'

=> index seek on pw_test (all columns in index)

WHERE [okeyname] like 'MANDANT' AND [okeydata] like '31'

=> index seek on ixobjdata (parantid and okeyname are "included columns)

Still strange, isn't it ?

Brigadur SSCrazy Points: 2097 More actions · Answer 9

Yes I agree with you, apparently the optimizer treats the "LIKE" and "=" variants differently, although in this case they are same and are interchangeable.

I was hoping that there is an index configuration where the optimizer detects this and produce same plan. My only guess here is that "LIKE" triggers the optimizer to prefer scan as a safe option as LIKE usually are used with wild-cards. But I am not convinced. Maybe someone else can give a better explanation.

Regards

Istvan

joerg.kannengiesser Old Hand Points: 322 More actions · Answer 10

It seems, that MS changed something in the optimization,

maybe it's a bug in 2005R1 - I will check this again after installation of SP3.

Does anyone know if there's a detailed description how the optimizer works / should work ?

I also expected, that the optimizer will optimize "on the fly":

whatever the original sql-was: why does it not recognize

that the intermediate result is only 4 records instead of the estimates 200.000

and then changes the plan from "full table scan" to "index seek" on foreign key ?

Brigadur SSCrazy Points: 2097 More actions · Answer 11

Hi,

I have found an interesting discussion about the same topic: LIKE vs Equals

I would also recommend the following blog: SQL in the Wild[/url]. There seems to be a lot about query plans.

Regards

Grant Fritchey SSC Guru Points: 398882 More actions · Answer 12

joerg.kannengiesser (12/1/2010)
It seems, that MS changed something in the optimization,
maybe it's a bug in 2005R1 - I will check this again after installation of SP3.
Does anyone know if there's a detailed description how the optimizer works / should work ?
I also expected, that the optimizer will optimize "on the fly":
whatever the original sql-was: why does it not recognize
that the intermediate result is only 4 records instead of the estimates 200.000
and then changes the plan from "full table scan" to "index seek" on foreign key ?

Something like this is usually indicative of issues with the statistics. In your case, I'd say the stats are likely out of date. The estimated number of rows is 37k and the actual is 4. That's very much a stats issue. That's on the LIKE query and on the = query it's 14k to 4. Same thing. It looks like a stats issue.

As was already noted, you're running SELECT * and you don't have a cluster so you're getting a RID lookup operation.

LIKE and = are not the same, despite the lack of wildcards, the optimizer must treat them differently, because they are different operations. LIKE and LIKE% are usually sargeable, which means they are search argument able, meaning, they can reference indexes, like =, but the optimizer will still tend towards range scans within the seek operation in a LIKE statement where it's not going to use the scan technique for an =. To see the difference in application, go to the last two execution plans, 2008 equal & 2008 like. In the execution plan go to the index seek operator on the pw_test index. Right click the operator and select the properties menu choice. In the properties, scroll down to the Seek Predicates property. You can then see how the SEEK operation uses >= and < as the mechanism for searching values while the = operations simply uses =. It's details like this inside the execution plan that tell you what's happening in SQL Server.

For details on how the optimizer works, I'd suggest getting Kalen Delaney's SQL Server Internals book or Benjamin Nevarez's book Inside the SQL Server Query Optimizer[/url] (which isn't quite available, but here's a sample[/url]). Full Disclosure, I was the technical editor on Benjamin's book.

If you want to drill down on execution plans, I'd suggest my book (it's a free download) on execution plans.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Brigadur SSCrazy Points: 2097 More actions · Answer 13

Brigadur

SSCrazy

Points: 2097

December 1, 2010 at 11:13 am

#1257524

Grant, thank you very much!

Regards

Istvan Kathi

pspeagle Old Hand Points: 315 More actions · Answer 14

If a typical query specifies both of the indexed columns, then you might want to try creating another index with the opposite key order to see whether a query that uses = for the first key and like for the second can efficiently range scan the non-clustered index to find just the rows that meet both conditions before doing a bookmark lookup on the table to get the other columns that are returned by the select.

This would be more likely to help in scenarios where few rows are returned.