Accepting fuzzy lookup changes into actual dataset

Question

Accepting fuzzy lookup changes into actual dataset

rsteckly

Say Hey Kid

Points: 664
More actions
December 21, 2009 at 7:40 pm

#156522

Hi,
I have 802K records in a dataset and a reference table. I have two fields: schoolcode and schoolname. I have a reference table with schoolcode and schoolname in it with some other data I need in my original table.
The trouble is, they just don't match up exactly in all cases. Some of the differences are predictable while others aren't.
I ran fuzzy lookup and sent the results to another table to look at how they match up.
There doesn't appear to be a similarity threshold that would automatically just get me the dataset with the automatic matches.
I created a view with all of the distinct code and name matches. I'd like to add a column such as 'isAccpeted' or something like that. This way, I can look over each distinct match and put a 1 for the ones I want and a 0 for the one's I don't.
What I'm not sure about, however, is how to send those changes BACK to the original table. Because my intermediate view only shows distinct values, I would only be marking each match, not each row.
What kind of thing would I need to do to send those changes back to the original table? I've thought of perhaps using the Merge task or maybe the Join task. Any ideas?
Thanks,
Ron

Viewing 2 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply

albertarun SSC Veteran Points: 231 More actions · Answer 1

I worked on some of the large scale fuzzy matching projects. Based on my experience you are in the right track.

You already have a table to which you load your fuzzy results. Add a derived column named IsMatchFlg and default the value to 0. Once you finish loading the data into the table run a SQL script to flag you aceptable record as IsMatchFlg=1.

This flag should be applied to 1 record per batch which is the record having maximum similarity score.

Then a select from this table where IsMatchFlg=1 will give your matching records.

I guess this helped. Let me know