[Column] LIKE 'VALUE%' indexing?!

  • Ninja's_RGR'us (1/9/2012)


    Bumping is always frowned upon here (unless the next day or something like that).

    I didn't think this was done neither did Paul.

    Very sorry then. First time I ever bumped a thread here.

    SQL Kiwi (1/9/2012)


    aurato (1/9/2012)


    This post cleared up basically all of my remaining confusion. Thank you very much.

    You're welcome.

    aurato (1/9/2012)


    The LIKE query without the '-' works fine. Thanks again. I'm going to retry my procedure and omit the hyphen to see if that's what was causing the whole thing.

    I'm thinking the issue with the hyphen (or dash) character has to do with interpreting it in a culture-aware fashion. I'll post back if I get something more concrete on this. By the way, unless you really need Unicode with full language and culture aware comparisons, I would consider using VARCHAR instead of NVARCHAR - SQL collations typically compare and sort several times faster than Unicode collations. Faster still are the binary collations, but these are not suitable unless your LIKE comparisons can tolerate a strict byte-by-byte comparison (e.g. 'q' = 'Q'). Obviously, changing data type and/or collation would be a big job, given the size of the data, but if it is appropriate for you, it could be a big performance win. Another idea would be to hash words to integers, and seek on integer hash values.

    I probably don't need Unicode. But I have a boss who insists upon it. One day it might be useful, who knows. The other suggestions are very interesting and I'll look into them if for nothing else educational purposes.

    Full-text indexing on Azure would be great though. It'd save me about 30 gigs of lookup tables to make website searches reasonable.

  • aurato (1/9/2012)


    I probably don't need Unicode. But I have a boss who insists upon it. One day it might be useful, who knows. The other suggestions are very interesting and I'll look into them if for nothing else educational purposes.

    I should be clear about it: collation is (somewhat) independent of Unicode versus ANSI encoding. It is possible to use binary, SQL, and Windows collations with both, and the difference can be astonishing (20x or so).

    See http://connect.microsoft.com/SQLServer/feedback/details/512459/replace-function-extremely-slow-with-non-binary-windows-collation

  • Aaron Aardvark (1/9/2012)


    aurato (1/9/2012)


    I probably don't need Unicode. But I have a boss who insists upon it. One day it might be useful, who knows. The other suggestions are very interesting and I'll look into them if for nothing else educational purposes.

    I should be clear about it: collation is (somewhat) independent of Unicode versus ANSI encoding. It is possible to use binary, SQL, and Windows collations with both, and the difference can be astonishing (20x or so).

    See http://connect.microsoft.com/SQLServer/feedback/details/512459/replace-function-extremely-slow-with-non-binary-windows-collation

    I appear to have a SQL collation (SQL_Latin1_General_CP1_CI_AS) which I guess is the default.

  • This is the procedure I'm trying to use. I attempted this to join to a field that already had a '%' for the LIKE in it. It didn't improve performance very much.

    Note: DelimitedSplit8K is a copy-paste from Jeff Moden's article on it[/url].

    CREATE PROCEDURE [dbo].[DescriptionSearchLike]

    (

    @Description nvarchar(4000)

    )

    AS

    BEGIN

    DECLARE @Keywords TABLE

    (

    Id int not null,

    Keyword nvarchar(4000) not null

    )

    INSERT INTO @Keywords (Id, Keyword)

    SELECT Id, Keyword + '%' FROM DelimitedSplit8K(@Description, ' ')

    SELECT Manufacturer, PartNumber, Description

    FROM Items

    WHERE Id IN

    (

    SELECT TOP 1000 L.ItemId

    FROM @Keywords K

    INNER JOIN DescriptionKeywordLookup L

    ON L.Keyword LIKE K.Keyword

    GROUP BY L.ItemId

    ORDER BY COUNT(*) DESC

    )

    END

    GO

    EDIT: I feel the need to point out the Count(*) and grouping etc happens very quickly with an =

    EDIT2: DescriptionSearch, which is this procedure with the LIKE replaced by an =, finishes this query in about 2 seconds, as opposed to about 50 seconds in DescriptionSearchLike

    EDIT3: Replacing the table variable with a #temptable seemed to hurt performance by quite a lot.

    EDIT4: I forgot query plans again. I'll add them soon.

Viewing 4 posts - 16 through 18 (of 18 total)

You must be logged in to reply to this topic. Login to reply