Get Similarity Values is Same Table on Same Row

  • Is there a way to get how similar two values are that are on the same row in the same table? I have a table that has address info that was joined from two tables. This table now has both addresses from the two tables and I would like to know how similar those addresses are. Here's an example of what I want to get

    ID AddrFromTable1 AddrFromTable2 SimilarityForAddrFromTable1AndAddrFromTable2

    1 123 Main St 123 Main St 1.00

    2 1 Fake St 1 Fake Street .92

    Is there a way to do this in SSIs using the Fuzzy Logic?

  • for addresses, there are a number of services that will standardize your address info for a fee;

    if your cheap or just like a challenge (like me) what I've done is add a column which I update with the "CleanAddress"

    i get that by updating the address through a collection of replaces so that any abbreviations I've encountered so var are converted to a standardized description; [st],[st.][street] and [str] all are replaced to a standard keyword.

    from experience, the replaces sometimes need to occur in a specific order, as Saint / [St] / [St.] needs to be taken into consideration as well.

    for example, for me, i stretch abbreviations out to their full names; i know i posted an example int he forums here, let me see if my google fu is up to the task of finding that example again.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks Lowell, I didn't need it to do any clean up so i was just curious on how do get the fuzzy number between the two addresses. I actually got it to work but had to use the two separate tables. In the Fuzzy Lookup section Link the two tables on ID as well as the address fields I wanted to have a fuzzy value. Then have a conditional split after the Fuzzy Lookup section where it returns only records where ID similarity = 1.

    The result only returned where the ID similarity = 1 which was 215 records out of 215 records from original table, but now I wonder if it could potentially fuzzy match on an ID where it's not equal on both tables because the better address match was for a different ID on the lookup table. If this happen how to i explicit say in the Fuzzy Matching to link on exact ID (similarity = 1 all the time for this lookup column) and then give similarity values for the address lookup columns.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply