November 2, 2022 at 3:39 am
I work on sql server 2019 i have issue i can't get percentage of two string from table
so i will compare between two string columns (PartText,MaskText)
and i will display similar percentage of two column
so as sample 'TR00123907FG','KB00123907FG' two string different only on first two charachters
so difference will be 80 percent
create table #compareTextPercentage
(
PartText varchar(50),
MaskText varchar(50)
)
insert into #compareTextPercentage(PartText,MaskText)
values
('TR00123907FG','KB00123907FG'),
('TR00123907FG','TR00123907FG'),
('klmhedf43122','50ghlpnkyzhy')
November 2, 2022 at 3:59 pm
Lookup Levenshtein Distance:
https://www.kodyaz.com/articles/fuzzy-string-matching-using-levenshtein-distance-sql-server.aspx
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
November 3, 2022 at 9:52 am
I work on sql server 2019 i have issue i can't get percentage of two string from table
so i will compare between two string columns (PartText,MaskText) and i will display similar percentage of two column
so as sample 'TR00123907FG','KB00123907FG' two string different only on first two charachters
so difference will be 80 percent
You want to perform this match based on exact character position?
In which case
'123456789x' and 'x123456789'
would produce a 0% match.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
November 3, 2022 at 12:43 pm
Quick thought, algorithms like Levenshtein Distance are overkill for such a naive comparison.
😎
Here is one way of doing this:
USE TEEST;
GO
SET NOCOUNT ON;
GO
---------------------------------------------------------------------
-- SAMPLE DATA
---------------------------------------------------------------------
DECLARE @COMPTEXT TABLE
(
PartText VARCHAR(50) NOT NULL
,MaskText VARCHAR(50)
);
INSERT INTO @COMPTEXT(PartText,MaskText)
VALUES
('TR00123907FG','KB00123907FG'),
('TR00123907FG','TR00123907FG'),
('klmhedf43122','50ghlpnkyzhy')
;
---------------------------------------------------------------------
-- INLINE TALLY TABLE BASE
---------------------------------------------------------------------
;WITH T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
---------------------------------------------------------------------
-- MAXIMUM TEXT LENGTH
---------------------------------------------------------------------
,DCOMP(MXTXLEN) AS
(
SELECT
MAX(LEN(CT.PartText)) AS MXTXLEN
FROM @COMPTEXT CT
)
---------------------------------------------------------------------
-- INLINE TALLY TABLE
---------------------------------------------------------------------
, NUMS(N) AS (SELECT TOP((SELECT DC.MXTXLEN FROM DCOMP DC)) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
FROM T T1,T T2)
---------------------------------------------------------------------
-- ADDING GROUP ID TO THE SAMPLE DATA
---------------------------------------------------------------------
,TXNUM(RID,PartText,MaskText) AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY @@VERSION) AS RID
,CT.PartText
,CT.MaskText
FROM @COMPTEXT CT
)
---------------------------------------------------------------------
-- CHARACTER SEGMENTATION
---------------------------------------------------------------------
,TCHAR(RID,PTCH,MTCH) AS
(
SELECT
TXN.RID
,SUBSTRING(TXN.PartText,NM.N,1) AS PTCH
,SUBSTRING(TXN.MaskText,NM.N,1) AS MTCH
FROM TXNUM TXN
CROSS APPLY NUMS NM
)
---------------------------------------------------------------------
-- COMPARE AND AGGRGATE THE RESULTS AND REPORT
-- NOTE: IMPLICIT CONVERSION FROM INT!
---------------------------------------------------------------------
SELECT
TC.RID
,((SUM(CASE
WHEN TC.PTCH = TC.MTCH THEN 1
ELSE 0
END) / MAX(DC.MXTXLEN + 0.0)) * 100 ) AS LIKENESS
FROM TCHAR TC
CROSS APPLY DCOMP DC
GROUP BY TC.RID;
Result set
RID LIKENESS
---- ------------------
1 83.3333333333300
2 100.0000000000000
3 8.3333333333300
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply