join on beginning of field

  • A project I'm working on has a requirement to join 2 tables based on the beginning of the customer name.

    For example the main table would contain the master customer name such as "Jones Hardware".  The other table may contain "Jones Hardware Main St." , "Jones Hardware Location 1", or just "Jones Hardware",  all of which should be joined to the first table record of "Jones Hardware", because they begin the same.

    Unfortunately I'm dealing with legacy systems so there isn't an opportunity to redesign this in a relational structure.

    Please advise the best way to accomplish this join.  Thanks for you help,

    Andrew

  • Try this:

    Select Table1.Col1,Table1.Col2...,Table2.Col1,Table1.Col2..

    from table1 INNER JOIN table2 on ltrim(rtrim(table1.Col1)) = left(table2.Col1 ,len(ltrim(rtrim(table1.Col1))))

    Thanks

    Sreejith

  • Check out the soundex and difference functions.  They were designed for such cases.

  • Thanks, that's perfect!

  • Hi Sreejith,

    I am working with your solution and came across a situation that isn't handled the way I would like it to be.  If the table1 list of customers contains 2 customer names that start the same way, (ie. Jones Hardware & Jones Hardware Specialty),  then all table2 examples from my original sample would link up to both of these.

    I'm looking for this: 

    Jones Hardware -->  Jones Hardware, Jones Hardware Main St., Jones Hardware Location 1 etc.

    Jones Hardware Specialty --> Jones Hardware Specialty, Jones Hardware Specialty Loc 1 etc.

    I realize that the selected items in table 2 will depend on the entire list of items in table 1 - we need to match the longest one in table one that still matches the table2 item.

    Any ideas?

    (revision - actually the original example items wouldn't be a problem, but any table2 items starting with Jones Hardware Specialty would link up with both Jones Hardware and Jones Hardware Specialty)

     

  • I wouldn't try to reinvent the wheel ifI were you.

     

    You can use this :

    ORDER BY DIFFERENCE(T1.col, T2.Col) DESC

     

    You can also set on filter on matches that are 3 and 4 only.  I can't tell you exactly what to write because I don't know your data.  You'll have to do quite a few tests before having the most appropriate solution on this problem.

  • AMacKay,

    You really need to redesign your tables so that the <Customers> table has a PRIMARY KEY called something like customerID int. Then you will need to add customerID as a FOREIGN KEY to your <Locations> table. You will then be able to do a join on customerID.

    Something like the following can be used to help sort out the data:

    -- ******************************

    -- Test Data

    CREATE TABLE #Customers

    (

     customer varchar(50) COLLATE DATABASE_DEFAULT NOT NULL PRIMARY KEY

    )

    GO

    CREATE TABLE #Locations

    (

     location varchar(50) COLLATE DATABASE_DEFAULT NOT NULL PRIMARY KEY

    )

    GO

    INSERT INTO #Customers

    SELECT 'Jones Hardware' UNION ALL

    SELECT 'Jones Hardware Speciality' UNION ALL

    SELECT 'Jo Public' UNION ALL

    SELECT 'Jones Hardware Speciality Stores'

    INSERT INTO #Locations

    SELECT 'Jo Public Location 1' UNION ALL

    SELECT 'Jo Public Location 2' UNION ALL

    SELECT 'Jones Hardware Location 1' UNION ALL

    SELECT 'Jones Hardware Location 2' UNION ALL

    SELECT 'Jones Hardware Speciality Location 1' UNION ALL

    SELECT 'Jones Hardware Speciality Location 2' UNION ALL

    SELECT 'Jones Hardware Speciality Stores Location' UNION ALL

    SELECT 'Jones Hardware' UNION ALL

    SELECT 'Jones Hardware Speciality' UNION ALL

    SELECT 'Jo Public' UNION ALL

    SELECT 'Jones Hardware Speciality Stores'

    -- End of Test Data

    -- ******************************

    -- The query

    SELECT C.customer, L.location

    FROM #Customers C

     JOIN #Locations L

      ON L.location like C.customer + '%'

    WHERE NOT EXISTS (

     SELECT *

     FROM #Customers C1

      JOIN #Customers C2

       ON C2.Customer LIKE C1.customer + '%'

        AND C1.customer <> C2.customer

     WHERE C1.customer = C.customer

      AND L.location LIKE C2.customer + '%')

     

  • Hi Ninja, I have tried out the DIFFERENCE function, but it's not going to be accurate enough.  For example, "Joes Hardware" vs. "Joes Hardware 123" is a 4, and "Joes Hardware" vs. "Joes Hardware Store 1234" is also a 4.

    As for your suggestion Ken, I would like to be able to make this a proper relational design but one side of it is data coming from an existing system that I can't make any changes to.  I already have a primary key on the customer list table (table1 in the examples), but all I have on the other side to match to is the customer name.  The other issue is that it isn't necessarily location information that defines the various variations in customer name.  It could be just different accounts that were set up for that customer etc.  The requirement for the project is just that the customer name that is entered in my new table (table1) match up with the beginning of the customer name in the existing system.

    That's why Sreejith's solution is the closest solution so far.  The only issue is the false matches when there is a 2nd customer that starts exactly the same as another customer (such as "Jones Hardware" and "Jones Hardware Specialty")

    I'm still working on it. If I come up with a fix I'll let you know.  Any other ideas, I'm all ears!

  • To update, I have decided to approach this problem from a slightly different angle.  Instead of utilizing an on-going direct link between the new system I am creating, and the legacy database, I am going to set up a process to extract the required data for an individual customer when it is being updated through the system, and then store the info in a proper relational form.  That way if the underlying data should change over time in the legacy database, historical records in the new system won't be compromised.  To get around the issue of customer names that are similar, when a record is entered in the new system for a customer that hasn't been entered before, all the customer names that start with the same single word will be pulled into a drop-down menu for the user to select from.  From that point forward, if the same customer name from the legacy system is pulled in for a new ticket in the system, it will automatically be matched to the same one in the drop-down menu again, based on the selection made the previous time.

    Thanks for the suggestions while I figured this all out.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply