October 22, 2018 at 6:58 pm
Hi all,
I have been working with some VBA code to determine the percentage difference between two strings. The Following works well in Access and works correctly, however, in Access it's very slow., e.g. "ABCDE", "ABDCE" is around 80% match. Note: this is not my code, but found looking about the net. seems to work well. So I was considering using a stored procedure in SQL server to perform a similar function. Would there be some genius out there who could offer an idea about how to make this work in an SQL SP? Or event better in a view that returns a result set.
I was following an idea on Fuzzy matching, but found that Levenshtein distance algorithm did not produce accurate results.
Public Function SimilTest()
'Purpose: Attempts to match two strings and determine the percentage match between them.
'Will detail an approximate match for a non matching client to an existing client
Dim strNametoTest As String
Dim rsClientTable As DAO.Recordset
Dim inReturnValue As Double
Dim iPercentReturn As Double
'inReturnValue = Round(Simil(LCase(strClientname), LCase(rsClientTable![Client Name])), 2) * 100
inReturnValue = Simil("ABCDE", "ABDCE") * 100
If inReturnValue > 50 Then
Debug.Print inReturnValue & "%"
End If
SimilTest = CStr(inReturnValue)
End Function
Public Function Simil(strTxt1 As String, strTxt2 As String) As Double
'Purpose: Determine match percentage between two strings.
'Result: Between 0 (no match) and 1 (identical)
Dim intTot As Integer
Dim strMatch As String
intTot = Len(strTxt1 & strTxt2) 'len(strtxt1) + len(strtxt2) 'Which is faster?
strMatch = GetBiggest(strTxt1, strTxt2)
Simil = CDbl(Len(strMatch) * 2) / CDbl(intTot)
End Function
Public Function GetBiggest(strTxt1 As String, strTxt2 As String) As String
Dim intLang As Integer
Dim intKort As Integer
Dim intPos As Integer
Dim intX As Integer
Dim strLangste As String
Dim strSearch As String
Dim strLang As String
Dim strKort As String
Dim strTotal1 As String
Dim strTotal2 As String
intKort = Len(strTxt1)
intLang = Len(strTxt2)
If intLang > intKort Then
strLang = strTxt2
strKort = strTxt1
ElseIf intKort = 0 Or intLang = 0 Then
Exit Function
Else
strLang = strTxt1
strKort = strTxt2
intX = intKort
intKort = intLang
intLang = intX
End If
For intPos = 1 To intKort 'Compare string based on the shortest.
intX = 0
Do
intX = intX + 1
strSearch = Mid$(strKort, intPos, intX) 'Determine part of string to search for
If Len(strSearch) <> intX Then
Exit Do 'end of string
End If
Loop While InStr(1, strLang, strSearch) > 0 'Part of string found in other string, increase size of partstring and try again.
intX = intX - 1
If intX > Len(strLangste) Then 'Longest substring found
strLangste = Mid$(strKort, intPos, intX)
End If
If intX = 0 Then intX = 1
intPos = intPos + intX - 1
Next intPos
If Len(strLangste) = 0 Then
GetBiggest = "" 'No matching substring found
Else 'Substring match found.
'Split substring in left and right part.
strTotal1 = Replace(strTxt1, strLangste, "|")
strTotal2 = Replace(strTxt2, strLangste, "|")
'Recursive part: Try again and paste result to returnvalue.
GetBiggest = strLangste & _
GetBiggest(CStr(Split(strTotal1, "|")(0)), CStr(Split(strTotal2, "|")(0))) & _
GetBiggest(CStr(Split(strTotal1, "|")(1)), CStr(Split(strTotal2, "|")(1)))
End If
End Function
Am a total noob in SQL stored procedures, but have a general idea.
any assistance valuable - Cheers!
October 22, 2018 at 10:15 pm
I suspect that this might help
Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply