August 11, 2015 at 7:28 am
Thanks Jason for your reply.
It solves the issue of trying to get the number on differences in one step but still the sort issue like in example i sent before is not solved.
Regards
Nader
August 11, 2015 at 7:48 am
I don't see where you mentioned a "sort issue" before... What exactly is the sort issue?
August 11, 2015 at 9:48 am
nadersam (8/11/2015)
It will return success also, there are 3 different characters they don't have to follow each other.I was able to write the attached function but it doesn't work if the string is reversed
Panadol and lodanap (This should be success but my function will return failure).
I am sorry the logic is so weird ๐ but it's actually needed to assess naming a new drug product in market and avoid name duplication or similarities, currently the users do it manually but we need to automate it.
Regards
Nader
The logic isn't weird at all, it's poorly described because you haven't yet been exposed to the very many permutations which have to be considered. Wayne's made a start. You have to consider dropped letters, added letters, and whether or not those letters appear elsewhere in the strings. It can quickly become impossible to deal with.
Have you looked at word matching instead of word "unmatching"? You could use a similarity value as the opposite of its intended use. Some of the JaroWinkler stuff is very good. Folks, including myself, have posted up full and partial JW algorithms which work very well in practice.
As a taster of the problems you might find whilst pursuing this from the opposite direction of others, have a play with this code:
DECLARE @Reference VARCHAR(50) = 'Panadol', @test-2 VARCHAR(50) = 'Panidal';
WITH
a1 AS (SELECT n = 0 FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n)),
a2 AS (SELECT n = 0 FROM a1, a1 a),
iTally AS (SELECT n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM a2, a2 a),
ReferenceWord AS (SELECT n, c = SUBSTRING(@Reference,n,1) FROM iTally i WHERE i.n <= LEN(@Reference)),
TestWord AS (SELECT n, c = SUBSTRING(@Test,n,1) FROM iTally i WHERE i.n <= LEN(@Test))
SELECT LenR = LEN(@Reference), LenT = LEN(@Test), *
FROM ReferenceWord l
left JOIN TestWord r
ON r.c = l.c AND r.n BETWEEN l.n-1 AND l.n+1
ORDER BY l.n;
-------------------------------------------------------------------
SELECT @Reference = 'Panadol', @test-2 = 'Pandol';
WITH
a1 AS (SELECT n = 0 FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n)),
a2 AS (SELECT n = 0 FROM a1, a1 a),
iTally AS (SELECT n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM a2, a2 a),
ReferenceWord AS (SELECT n, c = SUBSTRING(@Reference,n,1) FROM iTally i WHERE i.n <= LEN(@Reference)),
TestWord AS (SELECT n, c = SUBSTRING(@Test,n,1) FROM iTally i WHERE i.n <= LEN(@Test))
SELECT LenR = LEN(@Reference), LenT = LEN(@Test), *
FROM ReferenceWord l
left JOIN TestWord r
ON r.c = l.c AND r.n BETWEEN l.n-1 AND l.n+1
ORDER BY l.n;
-------------------------------------------------------------------
SELECT @Reference = 'Panado', @test-2 = 'Loanap';
WITH
a1 AS (SELECT n = 0 FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n)),
a2 AS (SELECT n = 0 FROM a1, a1 a),
iTally AS (SELECT n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM a2, a2 a),
ReferenceWord AS (SELECT n, c = SUBSTRING(@Reference,n,1) FROM iTally i WHERE i.n <= LEN(@Reference)),
TestWord AS (SELECT n, c = SUBSTRING(REVERSE(@Test),n,1) FROM iTally i WHERE i.n <= LEN(@Test)) -- REVERSE
SELECT LenR = LEN(@Reference), LenT = LEN(@Test), *
FROM ReferenceWord l
left JOIN TestWord r
ON r.c = l.c AND r.n BETWEEN l.n-1 AND l.n+1
ORDER BY l.n;
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 11, 2015 at 9:52 am
Some links here.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 11, 2015 at 2:42 pm
nadersam (8/11/2015)
It will return success also, there are 3 different characters they don't have to follow each other.I was able to write the attached function but it doesn't work if the string is reversed
Panadol and lodanap (This should be success but my function will return failure).
I am sorry the logic is so weird ๐ but it's actually needed to assess naming a new drug product in market and avoid name duplication or similarities, currently the users do it manually but we need to automate it.
Regards
Nader
How does this work for you?
DECLARE @CompareString VARCHAR(100) = 'Panadol';
DECLARE @TestStrings TABLE (RowID INTEGER IDENTITY, SearchString VARCHAR(100));
INSERT INTO @TestStrings
VALUES ('XYZadol'), ('xyadolz'), ('xyadol'), ('lodanap'), ('XaYdZol');
-- create a dynamic tally table
WITH Tens (N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
Hundreds(N) AS (SELECT 1 FROM Tens t1, Tens t2),
Millions(N) AS (SELECT 1 FROM Hundreds t1, Hundreds t2, Hundreds t3),
Tally (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM Millions),
-- get all of the characters for each test string
TestStrings2 AS (SELECT t.RowID, t.SearchString, ca.N, ca.SearchChar
FROM @TestStrings t
CROSS APPLY (SELECT TOP (DATALENGTH(SearchString)) N, SUBSTRING(t.SearchString, N, 1) AS SearchChar FROM Tally) ca),
-- test for the existing characters between the two strings, and the number of matching / not matching characters
TestExist AS (SELECT t2.RowID, t2.SearchString, t2.N, t2.SearchChar, ca1.InCompareString,
COUNT(*) OVER (PARTITION BY t2.SearchString, ca1.InCompareString
ORDER BY t2.N
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS CompareCount
FROM TestStrings2 t2
CROSS APPLY (SELECT InCompareString = CONVERT(BIT, CHARINDEX(t2.SearchChar, @CompareString)))ca1
) ,
-- take the characters that are in the compare string, and concatenate them together into a string
Compare AS (SELECT t2.RowID, t2.SearchString, t2.N, t2.SearchChar, t2.InCompareString, t2.CompareCount, ca1.NewCompareStr
FROM TestExist t2
CROSS APPLY (SELECT (SELECT '' + t3.SearchChar
FROM TestExist t3
WHERE t3.RowID = t2.RowID
AND t3.InCompareString = 1
ORDER BY t3.N
FOR XML PATH(''), TYPE).value('.','varchar(max)')
) ca1(NewCompareStr))
-- final results: the strings that have a least three characters not in the compare string,
-- and where the test string without the characters not in the compare string are in the compare string.
SELECT DISTINCT SearchString
FROM Compare
WHERE (InCompareString = 0 AND CompareCount >= 3)
OR CHARINDEX(NewCompareStr, @CompareString) = 0
ORDER BY SearchString;
SELECT *
FROM (VALUES ('XYZadol'), ('xyadolz'), ('lodanap'), ('XaYdZol')) dt(ExpectedResults)
ORDER BY dt.ExpectedResults;
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
August 12, 2015 at 12:55 am
Jason A. Long (8/11/2015)
I don't see where you mentioned a "sort issue" before... What exactly is the sort issue?
Please check my previous reply above
"No it will return success as order is different."
August 12, 2015 at 1:06 am
Again i would like to thank you all.
Jason,
Thanks for the full outer join it reduced the duplication in code and having to insert into table variables, i have applied that.
Chris,
To Handle the permutations and characters like space, dashes, dots etc. , i have incorporated the difference and soundex functions as well and use it together with the string comparison function but still trying to grab the idea you mentioned about using matching instead of non matching.
Wayne,
I will try the code you sent and send you feedback.
Regards
Nader
Viewing 7 posts - 16 through 21 (of 21 total)
You must be logged in to reply to this topic. Login to reply