Fuzzy Search

  • Depending on your situation the length of the search string vs. the length of the target string may be important so you should implement scoring for closer length matches.  Your thinking is probably on a good path: adapt depending on length of search string.

    The "magic" of my function is that 26 letters * 26 letters = 676 combinations with vastly different numbers of occurrences.  The letters QU may be common but any other combination of Q? is highly selective with perhaps only a few rows per million matching.  If you score on three letters in a row the selection would be highly selective but you reduce your chance of finding words where errors have been introduced so it is a tradeoff.

    I have seen demonstrations of information theory where searching for two consecutive words is very predictive of the third word (one of three likely answers).  It appears speech is not so hard to learn.  Patterns just beyond our ken may eventually reveal how simple our thinking really is.

    I would like to put in a good word for full text search.  In my experience it consumes less than 2% of the database size in disk and less than 2% of background processing to keep updated yet it provides extremely powerful searching capabilities.

  • Hi Bill, that Fuzzy Search function works very well. Thank you for that. Please show me an example of using that function comparing more than 2 characters, for example 3 or more. It would be nice if once could specify in the function (in a generic way) the number of characters to compare.

  • This adds a parameter to the function for the string length to compare.  I get the feeling though that ngrams of length 2 are probably the most "intelligent".  Please let me know what you find out.

    CREATE FUNCTION dbo.FuzzyMatchStringN (@s1 varchar(100), @s2 varchar(100), @n int)
    RETURNS int
    AS
    BEGIN
    -- written by William Talada for SqlServerCentral
    DECLARE @i int, @j int;
    SET @i = 1;
    SET @j = 0;
    WHILE @i <= LEN(@s1) - @n
    BEGIN
    IF CHARINDEX(SUBSTRING(@s1,@i,@n), @s2) > 0 SET @j=@j+1;
    SET @i=@i+1;
    END
    RETURN @j;
    END
    GO

    DECLARE @s1 varchar(100) = '15 JAMAICA AVE';
    SELECT DISTINCT top (50) Address1, fms.score, SOUNDEX(Address1) AS SoundexCode
    from Addresses
    CROSS APPLY (select dbo.FuzzyMatchStringN(@s1, Address1, 5) AS score) AS fms
    ORDER by fms.score desc
    ;
  • Hi all,

    I try to increase the performance by using CLR (C#).

    The query result from CLR is more fast.

    Is there another technics to increase the function performance?

    Thanks

    • This reply was modified 1 year, 4 months ago by  Roman.
    • This reply was modified 1 year, 4 months ago by  Roman.
    • This reply was modified 1 year, 4 months ago by  Roman.
    • This reply was modified 1 year, 4 months ago by  Roman.
    • This reply was modified 1 year, 4 months ago by  Roman.

Viewing 4 posts - 31 through 33 (of 33 total)

You must be logged in to reply to this topic. Login to reply