join on beginning of field

Question

Post reply

join on beginning of field

amackay_1

Ten Centuries

Points: 1240
More actions
November 8, 2006 at 1:54 pm

#117600

A project I'm working on has a requirement to join 2 tables based on the beginning of the customer name.
For example the main table would contain the master customer name such as "Jones Hardware". The other table may contain "Jones Hardware Main St." , "Jones Hardware Location 1", or just "Jones Hardware", all of which should be joined to the first table record of "Jones Hardware", because they begin the same.
Unfortunately I'm dealing with legacy systems so there isn't an opportunity to redesign this in a relational structure.
Please advise the best way to accomplish this join. Thanks for you help,
Andrew

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Sreejith Sreedharan SSCertifiable Points: 5403 More actions · Answer 1

Try this:

Select Table1.Col1,Table1.Col2...,Table2.Col1,Table1.Col2..

from table1 INNER JOIN table2 on ltrim(rtrim(table1.Col1)) = left(table2.Col1 ,len(ltrim(rtrim(table1.Col1))))

Thanks

Sreejith

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 2

Check out the soundex and difference functions. They were designed for such cases.

amackay_1 Ten Centuries Points: 1240 More actions · Answer 3

amackay_1

Ten Centuries

Points: 1240

November 8, 2006 at 2:10 pm

#670713

Thanks, that's perfect!

amackay_1 Ten Centuries Points: 1240 More actions · Answer 4

Hi Sreejith,

I am working with your solution and came across a situation that isn't handled the way I would like it to be. If the table1 list of customers contains 2 customer names that start the same way, (ie. Jones Hardware & Jones Hardware Specialty), then all table2 examples from my original sample would link up to both of these.

I'm looking for this:

Jones Hardware --> Jones Hardware, Jones Hardware Main St., Jones Hardware Location 1 etc.

Jones Hardware Specialty --> Jones Hardware Specialty, Jones Hardware Specialty Loc 1 etc.

I realize that the selected items in table 2 will depend on the entire list of items in table 1 - we need to match the longest one in table one that still matches the table2 item.

Any ideas?

(revision - actually the original example items wouldn't be a problem, but any table2 items starting with Jones Hardware Specialty would link up with both Jones Hardware and Jones Hardware Specialty)

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 5

I wouldn't try to reinvent the wheel ifI were you.

You can use this :

ORDER BY DIFFERENCE(T1.col, T2.Col) DESC

You can also set on filter on matches that are 3 and 4 only. I can't tell you exactly what to write because I don't know your data. You'll have to do quite a few tests before having the most appropriate solution on this problem.

Ken McKelvey SSCoach Points: 18842 More actions · Answer 6

AMacKay,

You really need to redesign your tables so that the <Customers> table has a PRIMARY KEY called something like customerID int. Then you will need to add customerID as a FOREIGN KEY to your <Locations> table. You will then be able to do a join on customerID.

Something like the following can be used to help sort out the data:

-- ******************************

-- Test Data

CREATE TABLE #Customers

(

customer varchar(50) COLLATE DATABASE_DEFAULT NOT NULL PRIMARY KEY

)

GO

CREATE TABLE #Locations

(

location varchar(50) COLLATE DATABASE_DEFAULT NOT NULL PRIMARY KEY

)

GO

INSERT INTO #Customers

SELECT 'Jones Hardware' UNION ALL

SELECT 'Jones Hardware Speciality' UNION ALL

SELECT 'Jo Public' UNION ALL

SELECT 'Jones Hardware Speciality Stores'

INSERT INTO #Locations

SELECT 'Jo Public Location 1' UNION ALL

SELECT 'Jo Public Location 2' UNION ALL

SELECT 'Jones Hardware Location 1' UNION ALL

SELECT 'Jones Hardware Location 2' UNION ALL

SELECT 'Jones Hardware Speciality Location 1' UNION ALL

SELECT 'Jones Hardware Speciality Location 2' UNION ALL

SELECT 'Jones Hardware Speciality Stores Location' UNION ALL

SELECT 'Jones Hardware' UNION ALL

SELECT 'Jones Hardware Speciality' UNION ALL

SELECT 'Jo Public' UNION ALL

SELECT 'Jones Hardware Speciality Stores'

-- End of Test Data

-- ******************************

-- The query

SELECT C.customer, L.location

FROM #Customers C

JOIN #Locations L

ON L.location like C.customer + '%'

WHERE NOT EXISTS (

SELECT *

FROM #Customers C1

JOIN #Customers C2

ON C2.Customer LIKE C1.customer + '%'

AND C1.customer <> C2.customer

WHERE C1.customer = C.customer

AND L.location LIKE C2.customer + '%')

amackay_1 Ten Centuries Points: 1240 More actions · Answer 7

Hi Ninja, I have tried out the DIFFERENCE function, but it's not going to be accurate enough. For example, "Joes Hardware" vs. "Joes Hardware 123" is a 4, and "Joes Hardware" vs. "Joes Hardware Store 1234" is also a 4.

As for your suggestion Ken, I would like to be able to make this a proper relational design but one side of it is data coming from an existing system that I can't make any changes to. I already have a primary key on the customer list table (table1 in the examples), but all I have on the other side to match to is the customer name. The other issue is that it isn't necessarily location information that defines the various variations in customer name. It could be just different accounts that were set up for that customer etc. The requirement for the project is just that the customer name that is entered in my new table (table1) match up with the beginning of the customer name in the existing system.

That's why Sreejith's solution is the closest solution so far. The only issue is the false matches when there is a 2nd customer that starts exactly the same as another customer (such as "Jones Hardware" and "Jones Hardware Specialty")

I'm still working on it. If I come up with a fix I'll let you know. Any other ideas, I'm all ears!

amackay_1 Ten Centuries Points: 1240 More actions · Answer 8

To update, I have decided to approach this problem from a slightly different angle. Instead of utilizing an on-going direct link between the new system I am creating, and the legacy database, I am going to set up a process to extract the required data for an individual customer when it is being updated through the system, and then store the info in a proper relational form. That way if the underlying data should change over time in the legacy database, historical records in the new system won't be compromised. To get around the issue of customer names that are similar, when a record is entered in the new system for a customer that hasn't been entered before, all the customer names that start with the same single word will be pulled into a drop-down menu for the user to select from. From that point forward, if the same customer name from the legacy system is pulled in for a new ticket in the system, it will automatically be matched to the same one in the drop-down menu again, based on the selection made the previous time.

Thanks for the suggestions while I figured this all out.