November 5, 2011 at 1:49 pm
how to compare a column from one table with two columns in other table keeping the primary key same in both the tables?
After comparing if the two columns in the table are 70% like the column in the first table, the function must return success else false should be returned.
Please give any suggestions.
Thanks
November 5, 2011 at 2:07 pm
It's hard to help based on the information provided so far.
Please read the first link in my signature and provide some sample data (together with table def and expected results) in a ready to use format.
November 5, 2011 at 2:22 pm
See table 1 has
id,
firstname,
lastname,
transactionamount
table2 has
id,
fullname,
transactionamount
How to compare the firstname,lastname with fullname
And after comaprision check whether the full name is 70% of the the both firrstname and lastname.
Example:
firstname: William
lastname: John
Fullname: Wil John
After comaprision, full name is not 70% of the firstname+lastname,
When this triggers, alert should be fired.
Please suggest.
Thanks.
November 5, 2011 at 2:25 pm
Based on what formula should the percentage value be calculated?
November 5, 2011 at 3:46 pm
Beginner_2008 (11/5/2011)
See table 1 hasid,
firstname,
lastname,
transactionamount
table2 has
id,
fullname,
transactionamount
How to compare the firstname,lastname with fullname
And after comaprision check whether the full name is 70% of the the both firrstname and lastname.
Example:
firstname: William
lastname: John
Fullname: Wil John
After comaprision, full name is not 70% of the firstname+lastname,
When this triggers, alert should be fired.
Please suggest.
Thanks.
Ok... so is Wil John supposed to be a match for William John or not? Including spaces, there are 8 characters in Wil John and 12 characters in William John and that's only a 66.6666% match.
As already stated, we don't know what you mean by 70%. What are the full set of rules for determining the match?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 5, 2011 at 4:08 pm
Beginner_2008 (11/5/2011)
how to compare a column from one table with two columns in other table keeping the primary key same in both the tables?After comparing if the two columns in the table are 70% like the column in the first table, the function must return success else false should be returned.
Please give any suggestions.
Thanks
70% is a figure plucked out of the air, correct?
This so-called "fuzzy matching" is a mix of art and science. It's complicated and difficult. To do it with any degree of accuracy requires a number of algorithms. Each will have its own match percentage.
Your best bet is to buy an off the shelf package. Failing that, SSIS has a fuzzy matching block, Google it. There are a number of algorithms which you can pick up as CLR-ready code such as Jaro-Winkler. In practice I've found that custom building yields better and often faster results but it does take a while to write. If you really wish to go down this route then token-matching is a good place to start: http://www.sqlservercentral.com/Forums/FindPost605062.aspx.
Edit: You will need more than the name alone if you are working with larger data sets. Zip code is good, matched address better. The names table I'm currently working with has several thousand Smiths, of which perhaps two hundred are W Smith. The full address, matched to a certified standard, is required for this.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
November 6, 2011 at 4:48 am
This should get you started:
-- Create some sample data to play with
DROP TABLE #Fullname
CREATE TABLE #Fullname (ID int identity(1,1), Fullname VARCHAR(50))
INSERT INTO #Fullname (fullname)
SELECT 'Phill R Williams' UNION ALL
SELECT 'P Smith' UNION ALL
SELECT 'William Johns' UNION ALL
SELECT 'Sam Brown'
DROP TABLE #Splitname
CREATE TABLE #Splitname (ID int identity(1,1), Firstname VARCHAR(50), Lastname VARCHAR(50))
INSERT INTO #Splitname (Firstname, Lastname)
SELECT 'Phil', 'Williams' UNION ALL
SELECT 'Peter', 'Smith' UNION ALL
SELECT 'Will', 'Johns' UNION ALL
SELECT 'Sam', 'Browne'
SELECT
r.*,
t.Fullname,
m2.*,
[Match%] = CAST((m2.FirstnameMatchPC + m2.LastnameMatchPC)/2.00 AS DECIMAL(5,2))
FROM #Splitname r
INNER JOIN #Fullname t ON t.ID = r.ID
-- find the position of the rightmost space in the surname
-- this is assumed to be the space between forename(s) and surname
-- use it to split the fullname string into firstname and lastname
CROSS APPLY(
SELECT
Firstname = LEFT(fullname,Split.Position-1),
Lastname = SUBSTRING(fullname,Split.Position+1,(DATALENGTH(fullname)-Split.Position+1))
FROM (
SELECT
Position = MAX(n)
FROM ( -- create sequential list of numbers where largest = length of t.fullname
SELECT n = t1.n+t2.n
FROM (VALUES (0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) t1(n)
CROSS JOIN (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) t2(n)
WHERE t1.n+t2.n <= (ISNULL(DATALENGTH(t.fullname),0))
) tally
WHERE SUBSTRING(t.fullname,n,1) = ' '
) Split
) t2
-- attempt to match at the simplest level
CROSS APPLY (
SELECT
FirstnameMatchPC = CASE
WHEN t2.Firstname = r.Firstname THEN 100
WHEN (len(t2.Firstname) = 1 OR len(r.Firstname) = 1) AND LEFT(t2.Firstname,1) = LEFT(r.Firstname,1) THEN 20
WHEN (t2.Firstname LIKE r.Firstname + '%' OR r.Firstname LIKE t2.Firstname + '%') THEN 50
ELSE 0 END,
LastnameMatchPC = CASE WHEN t2.Lastname = r.Lastname THEN 100 ELSE 0 END
) m1
-- when simple matching fails, get the hammer out
CROSS APPLY (
SELECT
FirstnameMatchPC = CASE
WHEN m1.FirstnameMatchPC > 0 THEN m1.FirstnameMatchPC -- carry forward the result from previous step / only use token matching where necessary
ELSE (
SELECT MatchPC = CAST(100*(COUNT(*) / (LEN(r.Firstname)-2.00)) AS DECIMAL(5,2))
FROM (
SELECT n = t1.n+t2.n
FROM (VALUES (0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) t1(n)
CROSS JOIN (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) t2(n)
WHERE t1.n+t2.n <= (ISNULL(DATALENGTH(r.Firstname),0))
) Tally
WHERE CHARINDEX(SUBSTRING(r.Firstname, Tally.n, 3), t2.Firstname) > 0
AND LEN(SUBSTRING(r.Firstname, Tally.n, 3)) = 3
)
END,
LastnameMatchPC = CASE
WHEN m1.LastnameMatchPC > 0 THEN m1.LastnameMatchPC
ELSE (
SELECT MatchPC = CAST(100*(COUNT(*) / (LEN(r.Lastname)-2.00)) AS DECIMAL(5,2))
FROM (
SELECT n = t1.n+t2.n
FROM (VALUES (0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) t1(n)
CROSS JOIN (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) t2(n)
WHERE t1.n+t2.n <= (ISNULL(DATALENGTH(r.Lastname),0))
) Tally
WHERE CHARINDEX(SUBSTRING(r.Lastname, Tally.n, 3), t2.Lastname) > 0
AND LEN(SUBSTRING(r.Lastname, Tally.n, 3)) = 3
)
END
) m2
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply