April 28, 2004 at 8:11 am
I have a set of emp records that contain duplicate entries. The problem with this duplicate is that they may point to information that is not contained in the other duplicated entry. For example, we keep track of United Fund donations and one record may have three donations, while the other record may have one donation. One record may contain the spouse information, where the other doesnt.
I have flagged the duplicate entries, but I don't want to delete the duplicate, I want to merge into one record, then purge the entries that were duplicated. I want to add the spouse info, if available, and all the donation information from all records.
I need help with the SQL that can do this without extensive parsing and processing since we have 100,000 records that fit into this category.
April 28, 2004 at 8:17 am
Can you post some example data?
April 28, 2004 at 8:23 am
table 1 employee contains empID, prefix, fname, mname, lname, addr1, addr2, addr3, city, state, zip, highDonation, totaldonation, first date, last date
table 2 contains empID, donation, date, amount
table 3 contains empID, letterkeycode, datesent
I have replaceID as a field in table 1 which is null (will contain the new id of the merged record)
duplicate is in table 1 on lname, addr1, city
fname may contain duplicate, or contain a spouse name
trick is to merge table 1, then merge table 2, and table3
April 28, 2004 at 10:32 pm
I think the first thing you need to do is decide the order of precedence.
For example, if you have a record in Table 1, Table 2 and Table 3 and the value for spouse name is different in each, which one should you choose???
I would then update your records based on the order of precedence. Eg. If your order is like Table 1 = 1st Choice, Table 2 = 2nd Choice, Table 3 = 3rd Choice, I would then write a series of UPDATE's like:
UPDATE Table1
SET Table1.[Name] = (SELECT Table3.[Name] FROM Table3
INNER JOIN Table1 ON Table3.[ID] = Table1.[ID])
WHERE Table1.[Name] is NULL
Then repeat the process replacing Table3 in the update with Table2. After doing this you should be able to delete the tables with your duplicates.
Someone else may have a more elegant way of doing this.
Cheers,
Angela
April 29, 2004 at 5:04 am
Actually, this is missing the question. The duplicate records are in table 1. Table 2 and 3 contain suplimentary information pointed to by Table 1. The order of precident is to select the record in Table 1 and then merge the records pointed to by table 1 into records pointed to by the newly merged record.
April 29, 2004 at 11:35 pm
Okay what about this:
-- First we get the all the records and any duplicates are given an id to tie them to the first record found with a matching lname, addr1 and city.
SELECT DISTINCT(MAX(ta1.empid)) AS BadEmpid, t1.empid, t1.prefix, t1.fname, t1.mname, t1.lname, t1.addr1, t1.addr2, t1.addr3, t1.city, t1.state, t1.zip,t1.highdonation, t1.totaldonation, t1.firstdate, t1.lastdate
INTO #TEMP from table1 ta1
INNER JOIN Table1 T1 ON Ta1.lname = T1.Lname
AND Ta1.addr1 = T1.addr1 AND Ta1.city = T1.city
GROUP BY t1.empid,t1.prefix, t1.fname, t1.mname, t1.lname, t1.addr1, t1.addr2, t1.addr3, t1.city, t1.state, t1.zip,t1.highdonation, t1.totaldonation, t1.firstdate, t1.lastdate
HAVING COUNT (t1.lname) >1
ORDER BY t1.lname
-- Then we update the original table. Adding the details from the duplicate records to the first one found as we go. This is based on a match on lname, addr1 and city. This will not match simalar records, they must be a complete match.
UPDATE table1
SET table1.prefix = #temp.prefix,
table1.fname = #temp.fname,
table1.mname = #temp.mname,
table1.lname = #temp.lname,
table1.addr1 = #temp.addr1,
table1.addr2 = #temp.addr2,
table1.addr3 = #temp.addr3,
table1.city = #temp.city,
table1.state = #temp.state,
table1.zip = #temp.zip,
table1.highdonation = #temp.highdonation,
table1.totaldonation = #temp.totaldonation,
table1.firstdate = #temp.firstdate,
table1.lastdate = #temp.lastdate
FROM #temp
INNER JOIN table1 ON (table1.lname = #temp.lname
AND table1.addr1 = #temp.addr1
AND table1.city = #temp.city )
WHERE table1.empid <> #temp.badempid
Is this more useful?
Cheers,
Angela
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply