February 3, 2008 at 12:02 pm
The problem I have is matching redundant data between two tables. Both tables MyTable_A and MyTable_B are completely un-normalized. They contain a lot of redundant data and some unique data. I believe the files were originally large spread sheets that were imported into SQL and passed on to me.
In order to bring order to this ciaos I created a primary key in each table. Then created an AddressID to group and there for uniquely identify duplicate addresses with in each table. I did this with Phone and name also. These are not keys relating the two files. These ID’s only let me identify the unique addresses, names and phone numbers within the respective tables.
The first thing I wanted to do is determine the addresses that match between the two files. In order to do this I need to consider only unique AddressID’s in the respective files and match the actual address between the two files.
CREATE TABLE #MyTable_A (MyTable_AID INT PRIMARY KEY,
AddressID INT, Address Varchar(50),
Zip Varchar(10),
PhoneID INT, Phone Varchar(10),
NameID INT, Namex Varchar(50))
INSERT INTO #MyTable_A VALUES (1, 1, '123 Main', '12345', 1, '1231111111', 1, 'John Doe')
INSERT INTO #MyTable_A VALUES (2, 2, '222 North', '22222', 1, '2222222222', 2, 'Sue Kent')
INSERT INTO #MyTable_A VALUES (3, 1, '123 Main', '12345', 1, '1232222222', 3, 'Mary Doe')
INSERT INTO #MyTable_A VALUES (4, 3, '333 South', '33333', 1, '3333333333', 4, 'Frank Black')
INSERT INTO #MyTable_A VALUES (5, 1, '123 Main', '12345', 1, '1234444444', 5, 'John Smith')
CREATE TABLE #MyTable_B (MyTable_BID INT PRIMARY KEY,
AddressID INT, Address Varchar(50),
Zip Varchar(10),
PhoneID INT, Phone Varchar(10),
NameID INT, Namex Varchar(50))
INSERT INTO #MyTable_B VALUES (1, 5, '123 Main', '12345', 1, '1231111111', 1, 'Larry Franks')
INSERT INTO #MyTable_B VALUES (2, 2, '444 East', '44444', 2, '4444444444', 2, 'Frank Black')
INSERT INTO #MyTable_B VALUES (3, 5, '123 Main', '12345', 3, '1232222222', 3, 'Joe Little')
INSERT INTO #MyTable_B VALUES (4, 7, '333 South', '33333', 4, '3333333333', 2, 'Frank Black')
INSERT INTO #MyTable_B VALUES (5, 7, '333 South', '33333', 4, '3333333333', 4, 'Phil Johnson')
I want a result set that looks like this.
MyAddresses
MyAddresses_ID.....MTA_AID.....MTB_AID.....MTA_Address.....MTB_Address.....MTA_Zip.....MTB_Zip
1.......................1........5............123 Main........123 Main.......12345.......12345
2.......................3........7............333 South......333 South.......33333.......33333
Now that I know the addresses that match I can bring in the nonmatching addresses from both tables. This will give me an Address table containing unique addresses. I will do the same thing using the Phone and NameID to create Name and phone tables that junction to the address table. This will give me a normalized DB to work with.
I am certainly open to better ways to do this.
I have tried
CREATE TABLE #MyAddresses
(
MyAddresses_ID int IDENTITY(1,1),
MTA_AID int,
MTB_AID int,
MTA_Address Varchar(50),
MTB_Address Varchar(50),
MTA_Zip Varchar(10),
MTB_Zip Varchar(10)
)
INSERT INTO #MyAddresses (MTA_AID,MTB_AID,
MTA_Address,MTB_Address,
MTA_Zip,MTB_Zip)
SELECT d1.AddressID,d2.AddressID,
d1.Address.d2.Address,
d1.Zip,d2.Zip
FROM
(
SELECT DISTINCT AddressID,Address,Zip
FROM #MyTable_A
)d1
INNER JOIN
(
SELECT DISTINCT AddressID,Address,Zip
FROM #MyTable_B
)d2
ON d2.Address=d1.Address
AND d2.Zip=d1.Zip
select * from #MyAddresses
But I get this result set:
1 1 1 123 Main 123 Main 12345 12345
2 3 3 333 South 333 South 33333 33333
The addresses are as expected, But the MTA_AID and MTB_AID are not. Based on the data they should be 1, 5 and 3, 7? There is no AddressID 1 for 123 Main or 3 for '333 South' in #MyTable_B.
How do i correct this or should I usr a differant approach?
February 3, 2008 at 7:26 pm
I ran your query and I got the result
115123 Main123 Main1234512345
237333 South333 South3333333333
Also I hoped this was a typo
SELECT d1.AddressID,d2.AddressID,
d1.Address,d2.Address,
d1.Zip,d2.Zip
You had d1.Address.d2.Address
February 3, 2008 at 8:15 pm
Thanks for catching that typo. I also found another fat finger mistake that was causing my result set ont to be correct.
February 3, 2008 at 8:16 pm
Thanks for catching that typo. I also found another fat finger mistake that was causing my result set ont to be correct.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply