String Comparison function

Question

Post reply

String Comparison function

nadersam

SSCertifiable

Points: 6542
More actions
October 13, 2015 at 3:29 pm

#399178

Hi All,
I am trying to write a function to compare the characters between 2 strings and eliminate the similarities to be able to return at the end the number of differences between them.
Having in mind i need the bigger number of differences to be returned also if a character is repeated in one of the 2 words it will be eliminated once because it exist only one time in other string.
I will give an example below to be more clear
--Start
declare @string1 as varchar(50)='imos'
declare @string2 as varchar(50)='nasos';
WITH n (n) AS (
SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)
), Tally (n) AS (
SELECT TOP (SELECT MAX(LEN(x.String)) FROM (VALUES (@String1), (@String2)) x (String))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
n n1, n n2, n n3
), Alphas AS (
SELECT a.Letter FROM (VALUES ('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),
('l'),('m'),('n'),('o'),('p'),('q'),('r'),('s'),('t'),('u'),('v'),('w'),('x'),('y'),('z')) a (Letter)
), Split1 (Letter) AS (
SELECT
SUBSTRING(@String1, t.n, 1)
FROM
Tally t
WHERE
SUBSTRING(@String1, t.n, 1) IN (SELECT a.Letter FROM Alphas a)
), split2 (Letter) AS (
SELECT
SUBSTRING(@String2, t.n, 1)
FROM
Tally t
WHERE
SUBSTRING(@String2, t.n, 1) IN (SELECT a.Letter FROM Alphas a))
select Letter,1 from Split1 st1 union all select letter,2 from split2 st2
--End
The differences in first string from second one are 2 (i,m) while the differences in second string from first one are 3(nas).
So the function should return 3 in previous example.
Thanks in advance
Nader
Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

mister.magoo SSC-Forever Points: 47174 More actions · Answer 1

Here's one, but it's not quick and I can't think of a way to make it quick, but maybe someone else can.

CREATE FUNCTION FindDifferenceCount(@string1 AS VARCHAR(50), @string2 AS VARCHAR(50))

RETURNS TABLE

WITH SCHEMABINDING

AS

RETURN(

WITH

N(N) AS (

SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1)) N (N)

) -- 8 rows

, TALLY (N) AS (

SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM N N1, N N2

) -- 64 rows (input strings are 50, so 64 is more than enough)

, LETTERS1(L,P) AS (

SELECT SUBSTRING(@string1, T.N, 1), ROW_NUMBER() OVER(PARTITION BY SUBSTRING(@string1, T.N, 1) ORDER BY (SELECT NULL))

FROM TALLY T

WHERE SUBSTRING(@string1, T.N, 1) LIKE '[a-zA-Z]' COLLATE Latin1_General_BIN

) -- Select only the letters and an ordinal from @string1

, LETTERS2(L,P) AS (

SELECT SUBSTRING(@string2, T.N, 1), ROW_NUMBER() OVER(PARTITION BY SUBSTRING(@string2, T.N, 1) ORDER BY (SELECT NULL))

FROM TALLY T

WHERE SUBSTRING(@string2, T.N, 1) LIKE '[a-zA-Z]' COLLATE Latin1_General_BIN

)

SELECT DISTINCT

CASE -- Find the largest number of differences

WHEN COUNT(R1.L) OVER() > COUNT(R2.L) OVER() THEN COUNT(R1.L) OVER()

ELSE COUNT(R2.L) OVER()

END AS DIFFERENCES

FROM LETTERS1 R1

FULL OUTER JOIN LETTERS2 R2

ON R2.L = R1.L

AND R2.P = R1.P

WHERE R1.L IS NULL OR R2.L IS NULL -- We only care if there is no match

);

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

nadersam SSCertifiable Points: 6542 More actions · Answer 2

Thank you very much mister.magoo.

Your solution is great, the problem i had in mine is when a letter was duplicated in one of the words which is S in my example, it gets eliminated because it existed once in other string, while what i needed is to count it.

Not sure how did you solve that in your solution.

Thanks Nader

mister.magoo SSC-Forever Points: 47174 More actions · Answer 3

To account for repeated letters, I use row_number() partitioned by letter to assign a "positional" value.

So, for the word "test", the results of CTE "LETTERS1" would be

L P

e 1

s 1

t 1

t 2 <-- this is the second "t"

Then when I match to the other string, I do the join on "L" and "P".

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

nadersam SSCertifiable Points: 6542 More actions · Answer 4

mister.magoo (10/14/2015)
To account for repeated letters, I use row_number() partitioned by letter to assign a "positional" value.
So, for the word "test", the results of CTE "LETTERS1" would be
L P
e 1
s 1
t 1
t 2 <-- this is the second "t"
Then when I match to the other string, I do the join on "L" and "P".

Great thank you very much