August 29, 2013 at 9:16 am
Hello,
I am running a very simple SSIS package with a fuzzy lookup that is not working. I am RDP'ing to the server (Windows Server 2008 R2 Ent) with SQL Server 2008 R2.
I simply want to create a listing of all customer names and indicate if they are similar to other names (for example, ABC Inc is similar to ABC Inc.). I am comparing on a single column (Name).
When I run the package, it returns _Similarity = 1 for almost all names, even though most of them are only similar, not identical. I have the similarity threshold set to 0.7, and the delimiters are space, carriage return, tab, and line feed. My data source is a table accessed through OLE DB.
Here are partial results to illustrate what is happening:
Name                                      _Similarity   _Confidence
ABC CORPORATION                           1             1
ABC CORPORATION                           1             1
ABC CORPORATION OF AMERICA - CEJ502453    1             1
ABC CORPORATION OF AMERICA - CEJ502870    1             1
ABC CORPORATION OF AMERICA,               1             1
ABC CORPORATION OF AMERICA,               1             1
ABC CORPORATION OF AMERICA,               0.9875        0.5
ABC CORPORATION/ MORVEN PARK              1             1
As you can see, _Similarity is 1 for all but a single name. Based on previous uses of this tool, I would expect ABC CORPORATION to be 1 and the remainder to be < 1, depending on how similar each name is.
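To illustrate the kind of result I expect, here is a rough sanity check done outside SSIS. It is only a sketch: it uses Python's difflib, which is not the algorithm Fuzzy Lookup uses, so the exact scores will differ from SSIS. The names are taken from the sample rows above, and the helper name `similarity` is my own.

```python
from difflib import SequenceMatcher

# A few names from the sample results; the first is used as the reference.
names = [
    "ABC CORPORATION",
    "ABC CORPORATION OF AMERICA,",
    "ABC CORPORATION/ MORVEN PARK",
]


def similarity(a: str, b: str) -> float:
    """Character-based ratio in [0, 1]; 1.0 only when the strings are identical.

    This is difflib's measure, NOT the SSIS Fuzzy Lookup score.
    """
    return SequenceMatcher(None, a, b).ratio()


reference = names[0]
scores = {name: similarity(reference, name) for name in names}

for name, score in scores.items():
    print(f"{name:<32} {score:.4f}")
```

With a measure like this, only the exact match scores 1 and the longer variants score below 1, which is the pattern I was expecting from the package.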
I have found some articles online reporting similar behaviour, but they all seem to involve an Excel file as the data source, where the problem is related to 32-bit Office and 64-bit SSIS. I'm using OLE DB for my data source, not Excel.
I've experimented with different reference table settings, etc. -- all to no avail.
What am I missing?
Any help would be appreciated.
Let me know if I left any pertinent info out.
Thanks,
Brett