Optimize a LIKE '%text%' query

  • Hello!

    Is there a way of optimizing(/workaround) the performance of a LIKE '%text%' query, on a non-unique column (so, no full text index possible)?  Both wildcards are required.

    Thank you for any input!

  • Well the leading wildcard in that LIKE clause will guarantee it will have to do full scans.  Something LIKE 'text%' is fine though.  I guess it comes down to your requirements.


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • Unfortunately, both wildcards are required. I will update my initial post with this info, as well.

  • Alternatives are:
    select * from myTable x
    where x.Description like '%text%'
    select * from myTable x
    where PATINDEX('%text%',x.Description)>0
    select * from myTable x
    where CHARINDEX('text',x.Description)>0

    I doubt there is much difference between them though you would think that CHARINDEX has less logic in so it should be faster, but I don't think that's the case.

  • Thank you for your reply!
    Interestingly enough, if I run the second option (PATINDEX) I see a 3sec improvement, which is huge, because the initial run was 4-5sec and this is about 1-2sec.
    The problem is, if I put just this query (PATINDEX) ,  with absolutely no other statements, into a sp and send the %text% as a parameter to the sp, it goes back to 5sec

  • Capture and compare the actual execution plans between the various options.

  • ralu_k_17 - Friday, November 9, 2018 6:52 AM

    Hello!

    Is there a way of optimizing(/workaround) the performance of a LIKE '%text%' query, on a non-unique column (so, no full text index possible)?  Both wildcards are required.

    Thank you for any input!

    It depends... There is a method that works but it requires that you maintain an "N-grams" table of your values. 

    Check out the following two articles...
    http://www.sqlservercentral.com/articles/Tally+Table/142316/
    https://sqlperformance.com/2017/02/sql-indexes/seek-leading-wildcard-sql-server

  • Create a nonclustered index using the clustering column(s) as key(s) and INCLUDE the search column.  That's about the best you can do with this type of requirement.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • I don't know your database, but it could be argued that if users must routinely perform LIKE '%keyword%' queries on text columns just to locate the rows they are looking for, then the data model is not appropriately normalized. For example, if users are querying WHERE CustomerName LIKE '%Smith%', then there should definitely instead have FirstName and LastName columns. Or, if they are querying WHERE ProductName LIKE '%laptop%' then that suggests a ProductCategoryID column and ProductCategory table should be introduced. So, the best solution depends on the context of why they are needing to perform this text searching.

    As someone suggested earlier, one approach is to parse the contents of the text name/description column into a keyword lookup table similar to the example below. Another benefit to implementing a search keyword lookup table is that the Keywords table can be extended to contain other attributes like a count of how times the keyword was searched and additional columns for categorization and semantic type searching. For example, all of the laptop, TV, phone, etc. related keywords could be assigned to a category called 'electronics'.


    CREATE TABLE Products (
    PRIMARY KEY CLUSTERED ( ProductID )
    , ProductID INT NOT NULL
    , ProductName VARCHAR(500) NOT NULL );

    CREATE TABLE ProductKeywords (
    PRIMARY KEY CLUSTERED ( KeywordID, ProductID )
    , KeywordID INT NOT NULL
    , ProductID INT NOT NULL );

    CREATE TABLE Keywords (
    PRIMARY KEY CLUSTERED ( Keyword )
    , KeywordName VARCHAR(200) NOT NULL
    , KeywordID INT NOT NULL
    , CategoryID INT NOT NULL
    );

    -- Query Products by keyword:
    SELECT P.ProductID, P.ProductName
    FROM Keywords AS K
    JOIN ProductKeywords AS PK ON PK.KeywordID = K.KeywordID
    JOIN Products AS P ON P.ProductID = PK.ProductID
    WHERE K.KeywordName = 'laptop';

    -- Query Products by category:
    SELECT P.ProductID, P.ProductName
    FROM Keywords AS K
    JOIN ProductKeywords AS PK ON PK.KeywordID = K.KeywordID
    JOIN Products AS P ON P.ProductID = PK.ProductID
    WHERE K.CategoryID = 12;

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • ralu_k_17 - Friday, November 9, 2018 6:52 AM

    Hello!

    Is there a way of optimizing(/workaround) the performance of a LIKE '%text%' query, on a non-unique column (so, no full text index possible)?  Both wildcards are required.

    Thank you for any input!

    1.  Does the table have a PK?
    2.  How wide is the text data and can you provide an example so we can see what you're actually having to work with?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ralu_k_17 - Friday, November 9, 2018 6:52 AM

    Hello!

    Is there a way of optimizing(/workaround) the performance of a LIKE '%text%' query, on a non-unique column (so, no full text index possible)?  Both wildcards are required.

    Thank you for any input!

    You may start from normalizing the data.

    _____________
    Code for TallyGenerator

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply