October 13, 2015 at 3:29 pm
Hi All,
I am trying to write a function to compare the characters between 2 strings and eliminate the similarities to be able to return at the end the number of differences between them.
Having in mind i need the bigger number of differences to be returned also if a character is repeated in one of the 2 words it will be eliminated once because it exist only one time in other string.
I will give an example below to be more clear
--Start
declare @string1 as varchar(50)='imos'
declare @string2 as varchar(50)='nasos';
WITH n (n) AS (
SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)
), Tally (n) AS (
SELECT TOP (SELECT MAX(LEN(x.String)) FROM (VALUES (@String1), (@String2)) x (String))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
n n1, n n2, n n3
), Alphas AS (
SELECT a.Letter FROM (VALUES ('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),
('l'),('m'),('n'),('o'),('p'),('q'),('r'),('s'),('t'),('u'),('v'),('w'),('x'),('y'),('z')) a (Letter)
), Split1 (Letter) AS (
SELECT
SUBSTRING(@String1, t.n, 1)
FROM
Tally t
WHERE
SUBSTRING(@String1, t.n, 1) IN (SELECT a.Letter FROM Alphas a)
), split2 (Letter) AS (
SELECT
SUBSTRING(@String2, t.n, 1)
FROM
Tally t
WHERE
SUBSTRING(@String2, t.n, 1) IN (SELECT a.Letter FROM Alphas a))
select Letter,1 from Split1 st1 union all select letter,2 from split2 st2
--End
The differences in first string from second one are 2 (i,m) while the differences in second string from first one are 3(nas).
So the function should return 3 in previous example.
Thanks in advance
Nader
October 13, 2015 at 6:03 pm
Here's one, but it's not quick and I can't think of a way to make it quick, but maybe someone else can.
CREATE FUNCTION FindDifferenceCount(@string1 AS VARCHAR(50), @string2 AS VARCHAR(50))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN(
WITH
N(N) AS (
SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1)) N (N)
) -- 8 rows
, TALLY (N) AS (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM N N1, N N2
) -- 64 rows (input strings are 50, so 64 is more than enough)
, LETTERS1(L,P) AS (
SELECT SUBSTRING(@string1, T.N, 1), ROW_NUMBER() OVER(PARTITION BY SUBSTRING(@string1, T.N, 1) ORDER BY (SELECT NULL))
FROM TALLY T
WHERE SUBSTRING(@string1, T.N, 1) LIKE '[a-zA-Z]' COLLATE Latin1_General_BIN
) -- Select only the letters and an ordinal from @string1
, LETTERS2(L,P) AS (
SELECT SUBSTRING(@string2, T.N, 1), ROW_NUMBER() OVER(PARTITION BY SUBSTRING(@string2, T.N, 1) ORDER BY (SELECT NULL))
FROM TALLY T
WHERE SUBSTRING(@string2, T.N, 1) LIKE '[a-zA-Z]' COLLATE Latin1_General_BIN
)
SELECT DISTINCT
CASE -- Find the largest number of differences
WHEN COUNT(R1.L) OVER() > COUNT(R2.L) OVER() THEN COUNT(R1.L) OVER()
ELSE COUNT(R2.L) OVER()
END AS DIFFERENCES
FROM LETTERS1 R1
FULL OUTER JOIN LETTERS2 R2
ON R2.L = R1.L
AND R2.P = R1.P
WHERE R1.L IS NULL OR R2.L IS NULL -- We only care if there is no match
);
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
October 14, 2015 at 3:34 am
Thank you very much mister.magoo.
Your solution is great, the problem i had in mine is when a letter was duplicated in one of the words which is S in my example, it gets eliminated because it existed once in other string, while what i needed is to count it.
Not sure how did you solve that in your solution.
Thanks Nader
October 14, 2015 at 5:13 am
To account for repeated letters, I use row_number() partitioned by letter to assign a "positional" value.
So, for the word "test", the results of CTE "LETTERS1" would be
L P
e 1
s 1
t 1
t 2 <-- this is the second "t"
Then when I match to the other string, I do the join on "L" and "P".
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
October 14, 2015 at 6:02 am
mister.magoo (10/14/2015)
To account for repeated letters, I use row_number() partitioned by letter to assign a "positional" value.So, for the word "test", the results of CTE "LETTERS1" would be
L P
e 1
s 1
t 1
t 2 <-- this is the second "t"
Then when I match to the other string, I do the join on "L" and "P".
Great thank you very much
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply