SQL with Search Score

  • I have several fields for match, is possible to define a user function for compare and return Match score in a SQL statment ?

  • Not exactly sure what you mean by Score., But have you looked at functions Soundex, and Difference?

    They return a value based off comparison, but not a score per say.

    I have not worked with them much though.

     

  • For example, user input address, then list out the simular address with sorting by matching score.

     

  • Check these examples out:

    http://support.microsoft.com/kb/186133/en-us

    Andy

  • But there are not mentioned anything to generate Matching score,

    in mysql, there has Match() function, in MSSQL it seems only able to get the search ranking by Full text search with Index Server.

  • Try a position compare of the two strings where you start with a max score, like 100, and subtract based on the number and/or size of the difference. For example, Bryan and Brian are similar, and soundex will mess with you on these, but are legitimate names. You may calculate a score based only on the single character being different. You may also want to incorporate a factor for the actual length of the data being compared.

    If you measure the amount of difference between two characters, you may want to take into account keyboard location (for typos and such).

    Anyway, basaed on how you interpret your information, you may establish that a 100 score is an exact match, and then set a low end for what would be a possible match. Anything below that would be defaulted as not a possible match.

  • I made one as follows,

    CREATE FUNCTION match (@p1 varchar(100), @p2 varchar(100))

    RETURNS int as

    BEGIN

     declare @mark-3 int

     declare @tstr varchar(100)

     declare @f varchar(100)

     declare @i int

     

     select @mark-3 = 100

     select @tstr = rtrim(@p1)

     select @i = charindex(' ', @tstr)

     

     if len(rtrim(@p1)) <> len(rtrim(@p2))

       select @mark-3 = @mark-3 -5

     

     while len(@tstr)>0

     begin

     

      select @i = charindex(' ', @tstr)

     

      if @i > 0

       begin

                          select @f =upper( left(@tstr, @i))

                               select @tstr = substring(@tstr,@i+1,99) 

       end

      else

       begin

         select @f = upper(@tstr+' ')

       select @tstr=''

       end

     

      if   charindex(@f , upper(@p2+' ')) =0  

                     select @mark-3 = @mark-3 -5

     

      if  charindex(@f, upper(@p1+' ')) <>  charindex(@f , upper(@p2+' '))  

                     select @mark-3 = @mark-3 -1

     end

     

     return  @mark-3

  • You do not need to enable full text features to use SOUNDEX or DIFFERENCE...

    DECLARE @Str1 VARCHAR(100)

    DECLARE @Str2 VARCHAR(100)

    SET @Str1 = '129 Green Ave'

    SET @Str2 = 'Green St'

    SELECT DIFFERENCE(@Str1,@Str2)

     

    SET @Str2 = 'Green Ave'

    SELECT DIFFERENCE(@Str1,@Str2)

    SET @Str2 = '120 Green Ave'

    SELECT DIFFERENCE(@Str1,@Str2)

    The higher the number on DIFFERENCE, the better the match.  0 is absolutely no match... 4 is a nearly perfect match.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As the article warns, this method makes use of a correlated subquery that forms a triangular join which is half a cross join.  It will crush any hopes of performance on tables over 10-20k rows.  Might get a bit more out of it with properly indexed tables but this method of counting is definitely not scalable. 

    Also, I don't know why but the article warns "Some of these examples only work with Microsoft SQL Server 6.5 because they use derived tables in the FROM clause. "  Be advised that derived tables in the FROM clause work just fine in version 7 and 2000 and well as all versions of MSDE.  I can't speak for version 2005 but don't think it likely that they would destroy this important functionality.  I strongly suspect that the article was first written when version 6.5 came out, hence the warning.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply