Find Closest Match Data

  • Hi,

    I have a problem to find closest match data inputted with the one in Table.

    eg. In my Table, i have data like this.

    0010

    11242

    0011246

    0011264

    11284

    11345

    585

    5852

    then my script are

    declare @mcode varchar(10)

    set @mcode= '5857275'

    what i need that the script that return the number 585.

    if @mcode= '00112476'

    Then the script return the number 0011246

    How can i archived that ?

    Thx

  • jsnv (11/18/2008)


    Hi,

    I have a problem to find closest match data inputted with the one in Table.

    eg. In my Table, i have data like this.

    0010

    11242

    0011246

    0011264

    11284

    11345

    585

    5852

    then my script are

    declare @mcode varchar(10)

    set @mcode= '5857275'

    what i need that the script that return the number 585.

    if @mcode= '00112476'

    Then the script return the number 0011246

    How can i archived that ?

    Thx

    closest match from left, right or middle?? and how many characters???

    kshitij kumar
    kshitij@krayknot.com
    www.krayknot.com

  • thanks for response

    The closest match are from left, and the length the data for now is 9 Char, but in database set to 40 char.

    Thx

  • Where LEFT (col,4) like or equal to LEFT(@Var,4)

    You can substitute the value 4 by an appropriate value.

    "Keep Trying"

  • jsnv (11/18/2008)


    Hi,

    I have a problem to find closest match data inputted with the one in Table.

    eg. In my Table, i have data like this.

    0010

    11242

    0011246

    0011264

    11284

    11345

    585

    5852

    then my script are

    declare @mcode varchar(10)

    set @mcode= '5857275'

    what i need that the script that return the number 585.

    if @mcode= '00112476'

    Then the script return the number 0011246

    How can i archived that ?

    Thx

    You need to define your criteria more clear.

    As for me, 5857275 is closer to 5852 then to 585 - 4 matched digits against 3.

    You need to figure out the rules, build an algorithm and then start worry about SQL or other programming language implementation.

    _____________
    Code for TallyGenerator

  • Try this, you'll need a numbers/tally table

    http://www.sqlservercentral.com/articles/TSQL/62867/

    declare @mcode varchar(40)

    set @mcode= '5857275' -- 585

    set @mcode= '00112476' -- 0011246

    select top 1 t.mcode

    from mytable t

    inner join Numbers n on n.Number between 1 and len(t.mcode)

    and left(t.mcode,n.Number)= left(@mcode,n.Number)

    group by t.mcode

    order by max(n.Number) desc, len(t.mcode)

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Token matching works quite well...

    [font="Courier New"]DROP TABLE #Table

    CREATE TABLE #Table (TheValues VARCHAR (20))

    INSERT INTO #Table (TheValues)

    SELECT '0010' UNION ALL        

    SELECT '11242' UNION ALL      

    SELECT '0011246' UNION ALL  

    SELECT '0011264' UNION ALL    

    SELECT '11284' UNION ALL                                

    SELECT '11345' UNION ALL                                

    SELECT '585' UNION ALL

    SELECT '5852'

    DECLARE @mcode VARCHAR(20)

    SET @mcode= '5857275'

    SET @mcode= '00112476'

    SELECT TOP 1 c.TheValues, (COUNT(*) * 100) / (LEN(@mcode)-2.00) AS MatchLevel

    FROM [Numbers] n

    INNER JOIN #Table c ON CHARINDEX(SUBSTRING(c.TheValues, n.number, 3), @mcode) > 0

    WHERE n.number <= LEN(@mcode) AND LEN(SUBSTRING(c.TheValues, n.number, 3)) = 3

    GROUP BY c.TheValues

    ORDER BY COUNT(*) DESC

    [/font]

    Results:

    TheValues TokenMatches

    ---------- ------------

    585 2

    0011246 4

    Cheers

    ChrisM

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Wow, the tally concept works perfectly Thanks Mark ๐Ÿ˜€

    And the token match also can be used Thanks Chris ๐Ÿ˜€

    Thanks you all for the solution, many many thanks :D:D:D

  • Chris, not following the phrase 'token matching' - looks like you're calculating a percent match based on the values in the table versus your @mcode, so I can see four matches in the values for the string, but why are we dividing by len(@mcode)-2.00?

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • Hi

    It works like this. For two equal strings, say @mcode= '0011246';

    Split c.TheValues into chunks (tokens), and test each one to see if it appears anywhere in @mcode. The tokens in '0011246' are as follows:

    001

    011

    112

    124

    246

    ...because they must be three characters long. So the maximum number of distinct tokens which can match between two equal strings (a single token could match more than once) is @mcode-2.

    HTH

    ChrisM

    [font="Courier New"]SELECT @mcode, c.TheValues, SUBSTRING(c.TheValues, n.number, 3)

    FROM [Numbers] n

    INNER JOIN #Table c ON CHARINDEX(SUBSTRING(c.TheValues, n.number, 3), @mcode) > 0

    WHERE n.number <= LEN(@mcode) AND LEN(SUBSTRING(c.TheValues, n.number, 3)) = 3

    AND c.TheValues = '0011246'

    ORDER BY c.TheValues, SUBSTRING(c.TheValues, n.number, 3)[/font]

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris Morris (11/20/2008)


    Hi

    It works like this. For two equal strings, say @mcode= '0011246';

    Split c.TheValues into chunks (tokens), and test each one to see if it appears anywhere in @mcode. The tokens in '0011246' are as follows:

    001

    011

    112

    124

    246

    ...because they must be three characters long. So the maximum number of distinct tokens which can match between two equal strings (a single token could match more than once) is @mcode-2.

    HTH

    ChrisM

    [font="Courier New"]SELECT @mcode, c.TheValues, SUBSTRING(c.TheValues, n.number, 3)

    FROM [Numbers] n

    INNER JOIN #Table c ON CHARINDEX(SUBSTRING(c.TheValues, n.number, 3), @mcode) > 0

    WHERE n.number <= LEN(@mcode) AND LEN(SUBSTRING(c.TheValues, n.number, 3)) = 3

    AND c.TheValues = '0011246'

    ORDER BY c.TheValues, SUBSTRING(c.TheValues, n.number, 3)[/font]

    Excellent explanation, crystal clear. Thanks!

    Jon

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • Chris, thank you SO much for this -- years after your post, it is still helping people!

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply