optimizer "LIKE" vs "="

  • Hi,

    we have a big issue with the response time from our DMS.

    Spending a lot of time we could find out the following things:

    in MSSQL 2005R1:

    creating a high-selective index has the results,

    that the optimizer uses this index when using a "=" predicate

    but does not use ANY index (full table scan) then using a "LIKE" predicate.

    After deleting the new index both selects are using another (not so good) index ...

    in MSSQL 2008R2:

    creating a high-selective index has the results,

    that the optimizer uses this index when using a "=" predicate

    but does not use this index then using a "LIKE" predicate -

    instead he uses the old (not so good) index ...

    Brrr ...

    - any hints why there are differences between "= xxx" and "LIKE xxx%" ?

    - any hints that this changed from 2005R1 to 2008R2 (nothing found about that) ?

  • Hi,

    Interesting, so far one observation:

    There don't seem to be any clustered index on the table. Is this table created only for experiment?

    How is the "specialindex" created? Source would be nice.

    Does the index include both columns in the where clause?

    Cheers

  • Yes, the index covers both columns - so it SHOULD be used for selection:

    CREATE NONCLUSTERED INDEX [pw_test] ON [dbo].[objkeys]

    (

    [okeyname] ASC,

    [okeydata] ASC,

    [parentid] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    The database is a production system, the DMS was installed by a consultant.

    Well, it's not so easy to optimize, because:

    there are 17.839.195 records in the table

    there are 1.039.993 records with okeyname = 'MANDANT'

    there are 646.593 records with okeydata = '31'

    there are 4 (!) record with okeyname = 'MANDANT' AND okeydata = '31'

  • Hi,

    How many rows are there in the table?

    Could you post the execution plans also (not the pictures)?

    Regards

  • sql-plans in attachment ...

    How can I delete this post ... ?

  • sql-plans in attachment ...

    How can I delete this post ... ?

  • sql-plans attached.

    the story is going on:

    after inactivating the pw_index in the 2005R1,

    the SELECT with "=" uses the still remaining index (with "included columns"),

    after deleting the pw_index in the 2005R1,

    the SELECT with "=" does not use any index and makes a full table scan

    whereas the LIKE uses the still remaining index (with "included columns") !?!

    Seems that "=" likes indices wheres "LIKE" likes "included columns" ... this is more than strange !

  • You have a SELECT *

    Do you really need all the columns returned? You have a RID lookup on the heap because there is no clustered index on the table. Good design practice would be to have a primary key also.

    I would modify the index definition:

    CREATE NONCLUSTERED INDEX [pw_test] ON [dbo].[objkeys]

    (

    [okeyname] ASC,

    [okeydata] ASC

    )

    INCLUDE (

    [parentid],

    <any more columns to be returned>

    )...

    Example:

    SELECT parentid, okeyno, okeyname, okeydata

    FROM [objkeys]

    WHERE [okeyname] like 'MANDANT' AND [okeydata] like '31'

    would need: INCLUDE([parentid], [okeyno])

    Regards

  • Of course it doesn't appear if there is a full-covered-index.

    The problem was broken down, so the DMS-software joins this

    table with other tables where some columns can't be included into the index

    (they finally need about 10 columns of a joined table).

    So this select is for showing the problem and of course only occurs

    when selecting a column which is not included ...

    Changing the Select to "SELECT parentid FROM [objkeys] ..." this happens in 2005R1:

    WHERE [okeyname] = 'MANDANT' AND [okeydata] = '31'

    => index seek on pw_test (all columns in index)

    WHERE [okeyname] like 'MANDANT' AND [okeydata] like '31'

    => index seek on ixobjdata (parantid and okeyname are "included columns)

    Still strange, isn't it ?

  • Yes I agree with you, apparently the optimizer treats the "LIKE" and "=" variants differently, although in this case they are same and are interchangeable.

    I was hoping that there is an index configuration where the optimizer detects this and produce same plan. My only guess here is that "LIKE" triggers the optimizer to prefer scan as a safe option as LIKE usually are used with wild-cards. But I am not convinced. Maybe someone else can give a better explanation.

    Regards

    Istvan

  • It seems, that MS changed something in the optimization,

    maybe it's a bug in 2005R1 - I will check this again after installation of SP3.

    Does anyone know if there's a detailed description how the optimizer works / should work ?

    I also expected, that the optimizer will optimize "on the fly":

    whatever the original sql-was: why does it not recognize

    that the intermediate result is only 4 records instead of the estimates 200.000

    and then changes the plan from "full table scan" to "index seek" on foreign key ?

  • Hi,

    I have found an interesting discussion about the same topic: LIKE vs Equals

    I would also recommend the following blog: SQL in the Wild[/url]. There seems to be a lot about query plans.

    Regards

  • joerg.kannengiesser (12/1/2010)


    It seems, that MS changed something in the optimization,

    maybe it's a bug in 2005R1 - I will check this again after installation of SP3.

    Does anyone know if there's a detailed description how the optimizer works / should work ?

    I also expected, that the optimizer will optimize "on the fly":

    whatever the original sql-was: why does it not recognize

    that the intermediate result is only 4 records instead of the estimates 200.000

    and then changes the plan from "full table scan" to "index seek" on foreign key ?

    Something like this is usually indicative of issues with the statistics. In your case, I'd say the stats are likely out of date. The estimated number of rows is 37k and the actual is 4. That's very much a stats issue. That's on the LIKE query and on the = query it's 14k to 4. Same thing. It looks like a stats issue.

    As was already noted, you're running SELECT * and you don't have a cluster so you're getting a RID lookup operation.

    LIKE and = are not the same, despite the lack of wildcards, the optimizer must treat them differently, because they are different operations. LIKE and LIKE% are usually sargeable, which means they are search argument able, meaning, they can reference indexes, like =, but the optimizer will still tend towards range scans within the seek operation in a LIKE statement where it's not going to use the scan technique for an =. To see the difference in application, go to the last two execution plans, 2008 equal & 2008 like. In the execution plan go to the index seek operator on the pw_test index. Right click the operator and select the properties menu choice. In the properties, scroll down to the Seek Predicates property. You can then see how the SEEK operation uses >= and < as the mechanism for searching values while the = operations simply uses =. It's details like this inside the execution plan that tell you what's happening in SQL Server.

    For details on how the optimizer works, I'd suggest getting Kalen Delaney's SQL Server Internals book or Benjamin Nevarez's book Inside the SQL Server Query Optimizer[/url] (which isn't quite available, but here's a sample[/url]). Full Disclosure, I was the technical editor on Benjamin's book.

    If you want to drill down on execution plans, I'd suggest my book (it's a free download) on execution plans.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant, thank you very much!

    Regards

    Istvan Kathi

  • If a typical query specifies both of the indexed columns, then you might want to try creating another index with the opposite key order to see whether a query that uses = for the first key and like for the second can efficiently range scan the non-clustered index to find just the rows that meet both conditions before doing a bookmark lookup on the table to get the other columns that are returned by the select.

    This would be more likely to help in scenarios where few rows are returned.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply