November 8, 2006 at 1:54 pm
A project I'm working on has a requirement to join 2 tables based on the beginning of the customer name.
For example the main table would contain the master customer name such as "Jones Hardware". The other table may contain "Jones Hardware Main St." , "Jones Hardware Location 1", or just "Jones Hardware", all of which should be joined to the first table record of "Jones Hardware", because they begin the same.
Unfortunately I'm dealing with legacy systems so there isn't an opportunity to redesign this in a relational structure.
Please advise the best way to accomplish this join. Thanks for you help,
Andrew
November 8, 2006 at 2:00 pm
Try this:
Select Table1.Col1,Table1.Col2...,Table2.Col1,Table1.Col2..
from table1 INNER JOIN table2 on ltrim(rtrim(table1.Col1)) = left(table2.Col1 ,len(ltrim(rtrim(table1.Col1))))
Thanks
Sreejith
November 8, 2006 at 2:06 pm
Check out the soundex and difference functions. They were designed for such cases.
November 8, 2006 at 2:10 pm
Thanks, that's perfect!
November 9, 2006 at 10:00 am
Hi Sreejith,
I am working with your solution and came across a situation that isn't handled the way I would like it to be. If the table1 list of customers contains 2 customer names that start the same way, (ie. Jones Hardware & Jones Hardware Specialty), then all table2 examples from my original sample would link up to both of these.
I'm looking for this:
Jones Hardware --> Jones Hardware, Jones Hardware Main St., Jones Hardware Location 1 etc.
Jones Hardware Specialty --> Jones Hardware Specialty, Jones Hardware Specialty Loc 1 etc.
I realize that the selected items in table 2 will depend on the entire list of items in table 1 - we need to match the longest one in table one that still matches the table2 item.
Any ideas?
(revision - actually the original example items wouldn't be a problem, but any table2 items starting with Jones Hardware Specialty would link up with both Jones Hardware and Jones Hardware Specialty)
November 9, 2006 at 11:03 am
I wouldn't try to reinvent the wheel ifI were you.
You can use this :
ORDER BY DIFFERENCE(T1.col, T2.Col) DESC
You can also set on filter on matches that are 3 and 4 only. I can't tell you exactly what to write because I don't know your data. You'll have to do quite a few tests before having the most appropriate solution on this problem.
November 9, 2006 at 11:36 am
AMacKay,
You really need to redesign your tables so that the <Customers> table has a PRIMARY KEY called something like customerID int. Then you will need to add customerID as a FOREIGN KEY to your <Locations> table. You will then be able to do a join on customerID.
Something like the following can be used to help sort out the data:
-- ******************************
-- Test Data
CREATE TABLE #Customers
(
customer varchar(50) COLLATE DATABASE_DEFAULT NOT NULL PRIMARY KEY
)
GO
CREATE TABLE #Locations
(
location varchar(50) COLLATE DATABASE_DEFAULT NOT NULL PRIMARY KEY
)
GO
INSERT INTO #Customers
SELECT 'Jones Hardware' UNION ALL
SELECT 'Jones Hardware Speciality' UNION ALL
SELECT 'Jo Public' UNION ALL
SELECT 'Jones Hardware Speciality Stores'
INSERT INTO #Locations
SELECT 'Jo Public Location 1' UNION ALL
SELECT 'Jo Public Location 2' UNION ALL
SELECT 'Jones Hardware Location 1' UNION ALL
SELECT 'Jones Hardware Location 2' UNION ALL
SELECT 'Jones Hardware Speciality Location 1' UNION ALL
SELECT 'Jones Hardware Speciality Location 2' UNION ALL
SELECT 'Jones Hardware Speciality Stores Location' UNION ALL
SELECT 'Jones Hardware' UNION ALL
SELECT 'Jones Hardware Speciality' UNION ALL
SELECT 'Jo Public' UNION ALL
SELECT 'Jones Hardware Speciality Stores'
-- End of Test Data
-- ******************************
-- The query
SELECT C.customer, L.location
FROM #Customers C
JOIN #Locations L
ON L.location like C.customer + '%'
WHERE NOT EXISTS (
SELECT *
FROM #Customers C1
JOIN #Customers C2
ON C2.Customer LIKE C1.customer + '%'
AND C1.customer <> C2.customer
WHERE C1.customer = C.customer
AND L.location LIKE C2.customer + '%')
November 9, 2006 at 12:08 pm
Hi Ninja, I have tried out the DIFFERENCE function, but it's not going to be accurate enough. For example, "Joes Hardware" vs. "Joes Hardware 123" is a 4, and "Joes Hardware" vs. "Joes Hardware Store 1234" is also a 4.
As for your suggestion Ken, I would like to be able to make this a proper relational design but one side of it is data coming from an existing system that I can't make any changes to. I already have a primary key on the customer list table (table1 in the examples), but all I have on the other side to match to is the customer name. The other issue is that it isn't necessarily location information that defines the various variations in customer name. It could be just different accounts that were set up for that customer etc. The requirement for the project is just that the customer name that is entered in my new table (table1) match up with the beginning of the customer name in the existing system.
That's why Sreejith's solution is the closest solution so far. The only issue is the false matches when there is a 2nd customer that starts exactly the same as another customer (such as "Jones Hardware" and "Jones Hardware Specialty")
I'm still working on it. If I come up with a fix I'll let you know. Any other ideas, I'm all ears!
November 9, 2006 at 5:17 pm
To update, I have decided to approach this problem from a slightly different angle. Instead of utilizing an on-going direct link between the new system I am creating, and the legacy database, I am going to set up a process to extract the required data for an individual customer when it is being updated through the system, and then store the info in a proper relational form. That way if the underlying data should change over time in the legacy database, historical records in the new system won't be compromised. To get around the issue of customer names that are similar, when a record is entered in the new system for a customer that hasn't been entered before, all the customer names that start with the same single word will be pulled into a drop-down menu for the user to select from. From that point forward, if the same customer name from the legacy system is pulled in for a new ticket in the system, it will automatically be matched to the same one in the drop-down menu again, based on the selection made the previous time.
Thanks for the suggestions while I figured this all out.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply