Fuzzy logic between two columns

  • Hello,

    I have tried various methods to accomplish this using functions like PATINDEX, DIFFERENCE, and the Levenshtein Distance function, but I haven't been successful.

    I have two columns of type VARCHAR, and I need to determine if a specific word exists in both columns. For example, in the rows below, the string "603260" matches. The input do not follow any specific pattern.

    "START SOFT SERIES B A/B 150-F201NBD 603260"

    "JHFLETCH=51462,U/C 603260"

    with sm as
    (
    select 'CYLINDER SHEAR INVERTED JOY 100180008' Ellips, 'JOY=100180008,ICG=4030' CONL
    UNION
    select 'JOY=100180008,ICG=40303', 'ADAPT BULKHEAD TNC F/F ST FLT 33090'
    UNION
    select 'SHAFT SLEEVE SHORT TONS PER HIOUR 014S 0F1AC11 T006', 'STAMLER=98-A5S1608000,JOY=98-A5S1608000'
    UNION
    select 'START SOFT SERIES B A/B 150-F201NBD 603260 ','JHFLETCH=51462,U/C 603260'
    )

    SELECT * FROM SM
    Attachments:
    You must be logged in to view attached files.
  • getsaby wrote:

    Hello,

    I have tried various methods to accomplish this using functions like PATINDEX, DIFFERENCE, and the Levenshtein Distance function, but I haven't been successful.

    I have two columns of type VARCHAR, and I need to determine if a specific word exists in both columns. For example, in the rows below, the string "603260" matches. The input do not follow any specific pattern.

    "START SOFT SERIES B A/B 150-F201NBD 603260"

    "JHFLETCH=51462,U/C 603260"

    with sm as
    (
    select 'CYLINDER SHEAR INVERTED JOY 100180008' Ellips, 'JOY=100180008,ICG=4030' CONL
    UNION
    select 'JOY=100180008,ICG=40303', 'ADAPT BULKHEAD TNC F/F ST FLT 33090'
    UNION
    select 'SHAFT SLEEVE SHORT TONS PER HIOUR 014S 0F1AC11 T006', 'STAMLER=98-A5S1608000,JOY=98-A5S1608000'
    UNION
    select 'START SOFT SERIES B A/B 150-F201NBD 603260 ','JHFLETCH=51462,U/C 603260'
    )

    SELECT * FROM SM

    You say a 'specific' word, but in your image there are two different matches highlighted. Rather than 'specific', did you mean 'any'?


  • Have a look at this. I used a temp table & added a PK for fun.

    DROP TABLE IF EXISTS #temptable

    CREATE TABLE #temptable
    (
    SomePK INT IDENTITY(1, 1) NOT NULL PRIMARY KEY
    ,Ellips VARCHAR(51)
    ,CONL VARCHAR(39)
    );

    INSERT #temptable
    (
    Ellips
    ,CONL
    )
    VALUES
    ('CYLINDER SHEAR INVERTED JOY 100180008', 'JOY=100180008,ICG=4030')
    ,('JOY=100180008,ICG=40303', 'ADAPT BULKHEAD TNC F/F ST FLT 33090')
    ,('SHAFT SLEEVE SHORT TONS PER HIOUR 014S 0F1AC11 T006', 'STAMLER=98-A5S1608000,JOY=98-A5S1608000')
    ,('START SOFT SERIES B A/B 150-F201NBD 603260 ', 'JHFLETCH=51462,U/C 603260');

    SELECT t.SomePK
    ,t.Ellips
    ,t.CONL FROM #temptable t
    CROSS APPLY STRING_SPLIT(t.Ellips,' ') ss1
    CROSS APPLY STRING_SPLIT(t.CONL, ' ') ss2
    WHERE ss1.value = ss2.value

  • Thanks Phil,

    Sorry , I think you are correct it should be "any" word, so for the above 4 rows, there is another row, where "100180008" exists between them.

    ('CYLINDER SHEAR INVERTED JOY 100180008', 'JOY=100180008,ICG=4030')

  • SELECT DISTINCT t.SomePK, t.Ellips, t.CONL, ss1.value AS MatchingWord
    FROM #temptable t
    CROSS APPLY STRING_SPLIT(TRANSLATE(t.Ellips, '=,/()', ' '), ' ') ss1
    CROSS APPLY STRING_SPLIT(TRANSLATE(t.CONL, '=,/()', ' '), ' ') ss2
    WHERE ss1.value = ss2.value
    AND ss1.value <> ''; -- Exclude empty values
  • Nice refinement, Jonathan.


  • Thank you, @Phil Parkin and @jonathan-2 AC Roberts! This is exactly what I needed. It works perfectly.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply