SSIS Fuzzy Lookup Logic

  • What logic/algorithms does the fuzzy lookup object use to reconcile matches? I can't really find anything about it on the web anywhere.

    I accept the possibility that there just isn't an answer outside of Microsoft source code, so please tell me to forget about it if you think this isn't something that is available to the public.

    Thanks,

    Mark

  • I recently worked face to face with Microsoft SSIS consultants to improve a fuzzy matching process I had developed. When it came to explaining the logic used to derive the _Similarity score from the individual similarity scores, though, they didn't have much to offer. The only takeaway from the meeting was that the _Similarity score is based on the frequency of tokens appearing in the Error Tolerance Index table.

    But if you are looking for surface-level knowledge of how it works, see this article.

    http://msdn.microsoft.com/en-us/library/ms345128(SQL.90).aspx

  • If I was a betting man, I would say they used the Jaro-Winkler distance metric

    http://en.wikipedia.org/wiki/Jaro-Winkler

    ...if I was a betting man that is 😎

  • or maybe even Levenshtein:

    http://en.wikipedia.org/wiki/Levenshtein_distance

    One of those pretty much gets the job done and is easy to implement as a class 😉
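
For anyone who wants to try the two metrics mentioned above, here is a minimal sketch of both wrapped in a single class. To be clear, this is not the SSIS Fuzzy Lookup algorithm, which nobody in this thread has been able to confirm; it only illustrates the textbook Levenshtein and Jaro-Winkler definitions. The class name `StringSimilarity`, its method names, and the `prefix_scale` default of 0.1 are choices made for this example.

```python
class StringSimilarity:
    """Textbook string metrics. Illustrative only, not SSIS's internal logic."""

    @staticmethod
    def levenshtein(a: str, b: str) -> int:
        """Minimum number of single-character inserts, deletes, or substitutions."""
        if len(a) < len(b):
            a, b = b, a  # keep the inner row as short as possible
        previous = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            current = [i]
            for j, cb in enumerate(b, start=1):
                cost = 0 if ca == cb else 1
                current.append(min(previous[j] + 1,          # deletion
                                   current[j - 1] + 1,       # insertion
                                   previous[j - 1] + cost))  # substitution
            previous = current
        return previous[-1]

    @staticmethod
    def jaro(a: str, b: str) -> float:
        """Jaro similarity in [0, 1]; 1.0 means identical strings."""
        if a == b:
            return 1.0
        if not a or not b:
            return 0.0
        # Characters only count as matching within this sliding window.
        window = max(max(len(a), len(b)) // 2 - 1, 0)
        a_matched = [False] * len(a)
        b_matched = [False] * len(b)
        matches = 0
        for i, ca in enumerate(a):
            lo = max(0, i - window)
            hi = min(len(b), i + window + 1)
            for j in range(lo, hi):
                if not b_matched[j] and b[j] == ca:
                    a_matched[i] = b_matched[j] = True
                    matches += 1
                    break
        if matches == 0:
            return 0.0
        # Matched characters that appear in a different order are transpositions.
        a_seq = [c for c, m in zip(a, a_matched) if m]
        b_seq = [c for c, m in zip(b, b_matched) if m]
        transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
        return (matches / len(a)
                + matches / len(b)
                + (matches - transpositions) / matches) / 3

    @classmethod
    def jaro_winkler(cls, a: str, b: str, prefix_scale: float = 0.1) -> float:
        """Jaro similarity boosted for a shared prefix of up to 4 characters."""
        score = cls.jaro(a, b)
        prefix = 0
        for ca, cb in zip(a[:4], b[:4]):
            if ca != cb:
                break
            prefix += 1
        return score + prefix * prefix_scale * (1 - score)


if __name__ == "__main__":
    print(StringSimilarity.levenshtein("kitten", "sitting"))
    print(round(StringSimilarity.jaro_winkler("MARTHA", "MARHTA"), 4))
```

Note that Levenshtein returns a distance (lower is closer) while Jaro-Winkler returns a similarity (higher is closer), so the two are not directly comparable without normalizing one of them.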
