September 4, 2008 at 5:45 am
Hi,
I have two table TB1 and TB2
TB1 has these fields - it has 546 records :
ID - INT - has data
FName - VARCHAR(20) - has data
SName - VARCHAR(20) - has data
PayID - null (no data)
TB2 has these fields - it has 514 records:
PayID INT - has data
FName - VARCHAR(20) - has data
SName - VARCHAR(20) - has data
FSName - is combination of FName and SName
I want to update TB1.PayID with this criteria TB2.FSName = TB1.FName + ' ' + SName as FS
another info when I join these two table the T-SQL retrieves me just 398 records based on TB2.FSName = TB.FS
any help to solv this problem the key that I use for joining is concatenating key with FName and SName!!!
:w00t::w00t::w00t:
September 4, 2008 at 6:53 am
Looks like a case where you'll need to manually look at the data and clean it. The first names will not be writtent he same in both tables and you'll need to have a look at that.
Also doing a join on first name only seems a bit risky to me. Even with only 536 records, it's very likely that you'll have a few Johns or James that'll repeat themselves... and the join won't be returning the correct data.
September 5, 2008 at 12:52 am
I know the risk if I work with only Fname for that reason I'm looking the solution to work with combination of FName and SName as FSName in both tables, finally the joining key should be the FSName from TAB1 and FSName from TAB2!
On the other side I know that this is not in rule with T-SQL but the solution must find couz I don't have another chance to do it, simply I want to copy the PAYID from TAB2 in TAB1 - that's all!
Thnx anyway!
:w00t::w00t::w00t:
September 5, 2008 at 5:43 am
Something like this will do it. You may have to fiddle with the token size and the match level cutoff to match your data.
[font="Courier New"]-- Create some sample data
CREATE TABLE #TB2 (PayID INT, FName VARCHAR(20), SName VARCHAR(20), FSName VARCHAR(40))
INSERT INTO #TB2 (PayID, FName, SName, FSName) SELECT REPLACE(au_id, '-', ''), au_fname, au_lname, au_fname + ' ' + au_lname
FROM pubs.dbo.authors
SELECT * FROM #TB2 -- 23 rows
CREATE TABLE #TB1 ([ID] [int] IDENTITY (1, 1) NOT NULL , PayID INT, FName VARCHAR(20), SName VARCHAR(20))
INSERT INTO #TB1 (FName, SName) SELECT FName, SName FROM #TB2
UPDATE #TB1 SET FName = 'Sherryl' WHERE FName = 'Cheryl'
UPDATE #TB1 SET FName = 'Morn' WHERE FName = 'Morningstar'
UPDATE #TB1 SET FName = 'Olivia' WHERE FName = 'Livia'
UPDATE #TB1 SET FName = 'Stuart' WHERE FName = 'Stearns'
SELECT * FROM #TB1 -- 23 rows
-- test the sample data. Only 19 rows match because we've changed 4 of the 23.
SELECT *
FROM #TB2 a
INNER JOIN #TB1 b ON b.FName + ' ' + b.SName = a.FSName -- 19 rows
-- Fuzzy match - 23 rows match.
SELECT r.FSName, t.FSName, t.PayID,
COUNT(*) / CAST(LEN(r.FSName) AS NUMERIC (5,2)) AS MatchLevel
FROM (SELECT FName + ' ' + SName AS FSName FROM #TB1) r
INNER JOIN [Numbers] n ON n.number < LEN(r.FSName)
INNER JOIN #TB2 t ON CHARINDEX(SUBSTRING(r.FSName, n.number, 3), t.FSName) > 0 -- 3 is token size, tune to data
WHERE n.number < 41 -- column length
GROUP BY r.FSName, t.FSName, t.PayID
HAVING COUNT(*) / CAST(LEN(r.FSName) AS NUMERIC (5,2)) > 0.55 -- Cutoff: tune to data
ORDER BY r.FSName, t.FSName
[/font]
Cheers
ChrisM
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply