How to add Percentage of similar compare two string ?

  • I work on sql server 2019 i have issue i can't get percentage of two string from table

    so i will compare between two string columns (PartText,MaskText)

    and i will display similar percentage of two column

    so as sample 'TR00123907FG','KB00123907FG' two string different only on first two charachters

    so difference will be 80 percent

    create table #compareTextPercentage
    (
    PartText varchar(50),
    MaskText varchar(50)
    )
    insert into #compareTextPercentage(PartText,MaskText)
    values
    ('TR00123907FG','KB00123907FG'),
    ('TR00123907FG','TR00123907FG'),
    ('klmhedf43122','50ghlpnkyzhy')

  • Lookup Levenshtein Distance:

    https://www.red-gate.com/simple-talk/blogs/string-comparisons-in-sql-edit-distance-and-the-levenshtein-algorithm/

    https://www.kodyaz.com/articles/fuzzy-string-matching-using-levenshtein-distance-sql-server.aspx

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • ahmed_elbarbary.2010 wrote:

    I work on sql server 2019 i have issue i can't get percentage of two string from table

    so i will compare between two string columns (PartText,MaskText) and i will display similar percentage of two column

    so as sample 'TR00123907FG','KB00123907FG' two string different only on first two charachters

    so difference will be 80 percent

    You want to perform this match based on exact character position?

    In which case

    '123456789x' and 'x123456789'

    would produce a 0% match.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Quick thought, algorithms like Levenshtein Distance are overkill for such a naive comparison.

    😎

    Here is one way of doing this:

    USE TEEST;
    GO
    SET NOCOUNT ON;
    GO
    ---------------------------------------------------------------------
    -- SAMPLE DATA
    ---------------------------------------------------------------------
    DECLARE @COMPTEXT TABLE
    (
    PartText VARCHAR(50) NOT NULL
    ,MaskText VARCHAR(50)
    );
    INSERT INTO @COMPTEXT(PartText,MaskText)
    VALUES
    ('TR00123907FG','KB00123907FG'),
    ('TR00123907FG','TR00123907FG'),
    ('klmhedf43122','50ghlpnkyzhy')
    ;
    ---------------------------------------------------------------------
    -- INLINE TALLY TABLE BASE
    ---------------------------------------------------------------------
    ;WITH T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    ---------------------------------------------------------------------
    -- MAXIMUM TEXT LENGTH
    ---------------------------------------------------------------------
    ,DCOMP(MXTXLEN) AS
    (
    SELECT
    MAX(LEN(CT.PartText)) AS MXTXLEN
    FROM @COMPTEXT CT
    )
    ---------------------------------------------------------------------
    -- INLINE TALLY TABLE
    ---------------------------------------------------------------------
    , NUMS(N) AS (SELECT TOP((SELECT DC.MXTXLEN FROM DCOMP DC)) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
    FROM T T1,T T2)
    ---------------------------------------------------------------------
    -- ADDING GROUP ID TO THE SAMPLE DATA
    ---------------------------------------------------------------------
    ,TXNUM(RID,PartText,MaskText) AS
    (
    SELECT
    ROW_NUMBER() OVER (ORDER BY @@VERSION) AS RID
    ,CT.PartText
    ,CT.MaskText
    FROM @COMPTEXT CT
    )
    ---------------------------------------------------------------------
    -- CHARACTER SEGMENTATION
    ---------------------------------------------------------------------
    ,TCHAR(RID,PTCH,MTCH) AS
    (
    SELECT
    TXN.RID
    ,SUBSTRING(TXN.PartText,NM.N,1) AS PTCH
    ,SUBSTRING(TXN.MaskText,NM.N,1) AS MTCH
    FROM TXNUM TXN
    CROSS APPLY NUMS NM
    )
    ---------------------------------------------------------------------
    -- COMPARE AND AGGRGATE THE RESULTS AND REPORT
    -- NOTE: IMPLICIT CONVERSION FROM INT!
    ---------------------------------------------------------------------
    SELECT
    TC.RID
    ,((SUM(CASE
    WHEN TC.PTCH = TC.MTCH THEN 1
    ELSE 0
    END) / MAX(DC.MXTXLEN + 0.0)) * 100 ) AS LIKENESS
    FROM TCHAR TC
    CROSS APPLY DCOMP DC
    GROUP BY TC.RID;

    Result set

    RID  LIKENESS
    ---- ------------------
    1 83.3333333333300
    2 100.0000000000000
    3 8.3333333333300

     

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply