String Comparison function

  • Thanks Jason for your reply.

    It solves the issue of trying to get the number on differences in one step but still the sort issue like in example i sent before is not solved.

    Regards

    Nader

  • I don't see where you mentioned a "sort issue" before... What exactly is the sort issue?

  • nadersam (8/11/2015)


    It will return success also, there are 3 different characters they don't have to follow each other.

    I was able to write the attached function but it doesn't work if the string is reversed

    Panadol and lodanap (This should be success but my function will return failure).

    I am sorry the logic is so weird ๐Ÿ™‚ but it's actually needed to assess naming a new drug product in market and avoid name duplication or similarities, currently the users do it manually but we need to automate it.

    Regards

    Nader

    The logic isn't weird at all, it's poorly described because you haven't yet been exposed to the very many permutations which have to be considered. Wayne's made a start. You have to consider dropped letters, added letters, and whether or not those letters appear elsewhere in the strings. It can quickly become impossible to deal with.

    Have you looked at word matching instead of word "unmatching"? You could use a similarity value as the opposite of its intended use. Some of the JaroWinkler stuff is very good. Folks, including myself, have posted up full and partial JW algorithms which work very well in practice.

    As a taster of the problems you might find whilst pursuing this from the opposite direction of others, have a play with this code:

    DECLARE @Reference VARCHAR(50) = 'Panadol', @test-2 VARCHAR(50) = 'Panidal';

    WITH

    a1 AS (SELECT n = 0 FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n)),

    a2 AS (SELECT n = 0 FROM a1, a1 a),

    iTally AS (SELECT n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM a2, a2 a),

    ReferenceWord AS (SELECT n, c = SUBSTRING(@Reference,n,1) FROM iTally i WHERE i.n <= LEN(@Reference)),

    TestWord AS (SELECT n, c = SUBSTRING(@Test,n,1) FROM iTally i WHERE i.n <= LEN(@Test))

    SELECT LenR = LEN(@Reference), LenT = LEN(@Test), *

    FROM ReferenceWord l

    left JOIN TestWord r

    ON r.c = l.c AND r.n BETWEEN l.n-1 AND l.n+1

    ORDER BY l.n;

    -------------------------------------------------------------------

    SELECT @Reference = 'Panadol', @test-2 = 'Pandol';

    WITH

    a1 AS (SELECT n = 0 FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n)),

    a2 AS (SELECT n = 0 FROM a1, a1 a),

    iTally AS (SELECT n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM a2, a2 a),

    ReferenceWord AS (SELECT n, c = SUBSTRING(@Reference,n,1) FROM iTally i WHERE i.n <= LEN(@Reference)),

    TestWord AS (SELECT n, c = SUBSTRING(@Test,n,1) FROM iTally i WHERE i.n <= LEN(@Test))

    SELECT LenR = LEN(@Reference), LenT = LEN(@Test), *

    FROM ReferenceWord l

    left JOIN TestWord r

    ON r.c = l.c AND r.n BETWEEN l.n-1 AND l.n+1

    ORDER BY l.n;

    -------------------------------------------------------------------

    SELECT @Reference = 'Panado', @test-2 = 'Loanap';

    WITH

    a1 AS (SELECT n = 0 FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n)),

    a2 AS (SELECT n = 0 FROM a1, a1 a),

    iTally AS (SELECT n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM a2, a2 a),

    ReferenceWord AS (SELECT n, c = SUBSTRING(@Reference,n,1) FROM iTally i WHERE i.n <= LEN(@Reference)),

    TestWord AS (SELECT n, c = SUBSTRING(REVERSE(@Test),n,1) FROM iTally i WHERE i.n <= LEN(@Test)) -- REVERSE

    SELECT LenR = LEN(@Reference), LenT = LEN(@Test), *

    FROM ReferenceWord l

    left JOIN TestWord r

    ON r.c = l.c AND r.n BETWEEN l.n-1 AND l.n+1

    ORDER BY l.n;

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Some links here.

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • nadersam (8/11/2015)


    It will return success also, there are 3 different characters they don't have to follow each other.

    I was able to write the attached function but it doesn't work if the string is reversed

    Panadol and lodanap (This should be success but my function will return failure).

    I am sorry the logic is so weird ๐Ÿ™‚ but it's actually needed to assess naming a new drug product in market and avoid name duplication or similarities, currently the users do it manually but we need to automate it.

    Regards

    Nader

    How does this work for you?

    DECLARE @CompareString VARCHAR(100) = 'Panadol';

    DECLARE @TestStrings TABLE (RowID INTEGER IDENTITY, SearchString VARCHAR(100));

    INSERT INTO @TestStrings

    VALUES ('XYZadol'), ('xyadolz'), ('xyadol'), ('lodanap'), ('XaYdZol');

    -- create a dynamic tally table

    WITH Tens (N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),

    Hundreds(N) AS (SELECT 1 FROM Tens t1, Tens t2),

    Millions(N) AS (SELECT 1 FROM Hundreds t1, Hundreds t2, Hundreds t3),

    Tally (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM Millions),

    -- get all of the characters for each test string

    TestStrings2 AS (SELECT t.RowID, t.SearchString, ca.N, ca.SearchChar

    FROM @TestStrings t

    CROSS APPLY (SELECT TOP (DATALENGTH(SearchString)) N, SUBSTRING(t.SearchString, N, 1) AS SearchChar FROM Tally) ca),

    -- test for the existing characters between the two strings, and the number of matching / not matching characters

    TestExist AS (SELECT t2.RowID, t2.SearchString, t2.N, t2.SearchChar, ca1.InCompareString,

    COUNT(*) OVER (PARTITION BY t2.SearchString, ca1.InCompareString

    ORDER BY t2.N

    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS CompareCount

    FROM TestStrings2 t2

    CROSS APPLY (SELECT InCompareString = CONVERT(BIT, CHARINDEX(t2.SearchChar, @CompareString)))ca1

    ) ,

    -- take the characters that are in the compare string, and concatenate them together into a string

    Compare AS (SELECT t2.RowID, t2.SearchString, t2.N, t2.SearchChar, t2.InCompareString, t2.CompareCount, ca1.NewCompareStr

    FROM TestExist t2

    CROSS APPLY (SELECT (SELECT '' + t3.SearchChar

    FROM TestExist t3

    WHERE t3.RowID = t2.RowID

    AND t3.InCompareString = 1

    ORDER BY t3.N

    FOR XML PATH(''), TYPE).value('.','varchar(max)')

    ) ca1(NewCompareStr))

    -- final results: the strings that have a least three characters not in the compare string,

    -- and where the test string without the characters not in the compare string are in the compare string.

    SELECT DISTINCT SearchString

    FROM Compare

    WHERE (InCompareString = 0 AND CompareCount >= 3)

    OR CHARINDEX(NewCompareStr, @CompareString) = 0

    ORDER BY SearchString;

    SELECT *

    FROM (VALUES ('XYZadol'), ('xyadolz'), ('lodanap'), ('XaYdZol')) dt(ExpectedResults)

    ORDER BY dt.ExpectedResults;

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Jason A. Long (8/11/2015)


    I don't see where you mentioned a "sort issue" before... What exactly is the sort issue?

    Please check my previous reply above

    "No it will return success as order is different."

  • Again i would like to thank you all.

    Jason,

    Thanks for the full outer join it reduced the duplication in code and having to insert into table variables, i have applied that.

    Chris,

    To Handle the permutations and characters like space, dashes, dots etc. , i have incorporated the difference and soundex functions as well and use it together with the string comparison function but still trying to grab the idea you mentioned about using matching instead of non matching.

    Wayne,

    I will try the code you sent and send you feedback.

    Regards

    Nader

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply