How to add Percentage of similar compare two string ?

Question

Post reply

How to add Percentage of similar compare two string ?

ahmed_elbarbary.2010

Hall of Fame

Points: 3118
More actions
November 2, 2022 at 3:39 am

#4107576
I work on sql server 2019 i have issue i can't get percentage of two string from table
so i will compare between two string columns (PartText,MaskText)
and i will display similar percentage of two column
so as sample 'TR00123907FG','KB00123907FG' two string different only on first two charachters
so difference will be 80 percent
```
create table #compareTextPercentage
 (
 PartText varchar(50),
 MaskText varchar(50)
 )
 insert into #compareTextPercentage(PartText,MaskText)
 values
 ('TR00123907FG','KB00123907FG'),
 ('TR00123907FG','TR00123907FG'),
 ('klmhedf43122','50ghlpnkyzhy')
```

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

Jeffrey Williams SSC Guru Points: 90351 More actions · Answer 1

Lookup Levenshtein Distance:

https://www.red-gate.com/simple-talk/blogs/string-comparisons-in-sql-edit-distance-and-the-levenshtein-algorithm/

https://www.kodyaz.com/articles/fuzzy-string-matching-using-levenshtein-distance-sql-server.aspx

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

Phil Parkin SSC Guru Points: 247203 More actions · Answer 2

ahmed_elbarbary.2010 wrote:

I work on sql server 2019 i have issue i can't get percentage of two string from table
so i will compare between two string columns (PartText,MaskText) and i will display similar percentage of two column
so as sample 'TR00123907FG','KB00123907FG' two string different only on first two charachters
so difference will be 80 percent

You want to perform this match based on exact character position?

In which case

'123456789x' and 'x123456789'

would produce a 0% match.

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 3

Quick thought, algorithms like Levenshtein Distance are overkill for such a naive comparison.

😎

Here is one way of doing this:

USE TEEST;
GO
SET NOCOUNT ON;
GO
---------------------------------------------------------------------
-- SAMPLE DATA
---------------------------------------------------------------------
DECLARE @COMPTEXT TABLE
 (
    PartText VARCHAR(50)   NOT NULL
   ,MaskText VARCHAR(50)
 );
 INSERT INTO @COMPTEXT(PartText,MaskText)
 VALUES
     ('TR00123907FG','KB00123907FG'),
     ('TR00123907FG','TR00123907FG'),
     ('klmhedf43122','50ghlpnkyzhy')
;
---------------------------------------------------------------------
-- INLINE TALLY TABLE BASE
---------------------------------------------------------------------
;WITH T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
---------------------------------------------------------------------
-- MAXIMUM TEXT LENGTH
---------------------------------------------------------------------
,DCOMP(MXTXLEN) AS
(
    SELECT 
        MAX(LEN(CT.PartText)) AS MXTXLEN
    FROM @COMPTEXT CT
)
---------------------------------------------------------------------
-- INLINE TALLY TABLE
---------------------------------------------------------------------
,  NUMS(N) AS (SELECT TOP((SELECT DC.MXTXLEN FROM DCOMP DC)) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
               FROM  T T1,T T2)
---------------------------------------------------------------------
-- ADDING GROUP ID TO THE SAMPLE DATA
---------------------------------------------------------------------
,TXNUM(RID,PartText,MaskText) AS 
(
    SELECT 
        ROW_NUMBER() OVER (ORDER BY @@VERSION) AS RID
       ,CT.PartText
       ,CT.MaskText
    FROM @COMPTEXT  CT
)
---------------------------------------------------------------------
-- CHARACTER SEGMENTATION
---------------------------------------------------------------------
,TCHAR(RID,PTCH,MTCH) AS
(
    SELECT
        TXN.RID
       ,SUBSTRING(TXN.PartText,NM.N,1)  AS PTCH
       ,SUBSTRING(TXN.MaskText,NM.N,1)  AS MTCH
    FROM TXNUM  TXN
    CROSS APPLY NUMS NM
)
---------------------------------------------------------------------
-- COMPARE AND AGGRGATE THE RESULTS AND REPORT
-- NOTE: IMPLICIT CONVERSION FROM INT!
---------------------------------------------------------------------
SELECT
    TC.RID
   ,((SUM(CASE 
        WHEN TC.PTCH = TC.MTCH THEN 1
        ELSE 0
    END) / MAX(DC.MXTXLEN + 0.0)) * 100 ) AS LIKENESS
FROM TCHAR  TC
CROSS APPLY DCOMP DC
GROUP BY TC.RID;

Result set

RID  LIKENESS
---- ------------------
1    83.3333333333300
2    100.0000000000000
3    8.3333333333300

This reply was modified 3 years, 2 months ago by Eirikur Eiriksson.