January 20, 2019 at 1:52 pm
I have Table1.ColA that contains a Varchar(50) sting that I need to fuzzy-lookup against Table2.ColA.
And insert the fuzzily-matched value Into Table1.ColA_FuzzyMatched column that originally contains Nulls only. All 3 columns are Varchar(50).
I have never done this before, honestly. I am sure there are some better practices to do this that other practices.. There is a task to do this in SSIS but a lot of our desirable fuzzy lookups are not to be initiated via SSIS.
Is there a good T-SQL /SP out there that someone is aware of of that does this?
And/or is there may be a function(ality) in SQL 2016 and 2017 that had not existed before and now is providing additional help for achieving such fuzzy goals as fuzzy lookups?
Thank you!
January 21, 2019 at 11:36 pm
The "Levenshtein Distance Algorithm" seems to be one of the most accurate. Here's one link and, of course, Google is loaded for bear on this subject.
https://www.red-gate.com/simple-talk/sql/t-sql-programming/fuzzy-searches-sql-server/
https://www.google.com/search?client=firefox-b-1-ab&q=levenshtein+distance+sql
--Jeff Moden
Change is inevitable... Change for the better is not.
January 22, 2019 at 2:57 am
CREATE FUNCTION [dbo].[FuzzyMatch_iTVF2k5]
(@Reference VARCHAR(100) = NULL,
@Target VARCHAR(100) = NULL)
RETURNS table WITH SCHEMABINDING
AS
-- Chris Morris 2012
-- Fuzzy-matching using shifted character positions
-- See also http://research.microsoft.com/pubs/75996/bm_sigmod03.pdf
-----------------------------------------------------------------
CREATE FUNCTION [dbo].[FuzzyMatch_iTVFV2]
(@Reference VARCHAR(100) = NULL,
@Target VARCHAR(100) = NULL)
RETURNS table WITH SCHEMABINDING
AS
-- Chris Morris 2012
-- Fuzzy-matching
-- See also http://research.microsoft.com/pubs/75996/bm_sigmod03.pdf
RETURN
--------------------------------------------------------------------------------------------------------------------
Here's an example of usage:
;WITH
SampleData1 AS (SELECT * FROM (VALUES ('Victor Proteus'), ('Victor Protus'), ('Victor ProteEus')) d (String1)),
SampleData2 AS (SELECT * FROM (VALUES ('Victor Proteus'), ('Victor Protes'), ('Proteus Victor')) d (String2))
SELECT *
FROM SampleData1 d1
CROSS JOIN SampleData2 d2
CROSS APPLY dbo.FuzzyMatch_iTVFV2 (d1.String1, d2.String2) m
WHERE d1.String1 > d2.String2
AND m.MatchRatio > 0.4
ORDER BY d1.String1, d2.String2
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply