March 25, 2013 at 5:39 am
I'm trying to get a list of duplicate records and suggest one to keep.
The first column returns the unique client numbers (custno) and the second column (keepno) should be the Max value of that group.
I'm struggling to get the keepno value.
The following code I've done so far finds a lists the duplicates okay (based on name and zip code) but I don't know how to get that keepno. It's probably a two step procedure?
SELECT a.custno, RTRIM(a.firstname)+' '+RTRIM(a.lastname) AS name,
company, RTRIM(address1) as address1, add1.zip
FROM cusfil a
JOIN address add1 ON a.custno=add1.custno
WHERE RTRIM(a.firstname)+RTRIM(a.lastname)+RTRIM(add1.zip) IN
(
SELECT RTRIM(b.firstname)+RTRIM(b.lastname)+RTRIM(add2.zip)
FROM cusfil b
JOIN address add2 ON b.custno=add2.custno
WHERE add2.curraddr=1 AND
RTRIM(b.lastname)<>''
GROUP BY b.firstname,b.lastname,add2.zip
HAVING COUNT(b.firstname+b.lastname+add2.zip)>1
)
AND add1.curraddr=1 AND
RTRIM(a.lastname)<>''
ORDER BY a.lastname,a.firstname,add1.zip
example results:
custnokeepnonamecompanyaddress1zip
"8001""8001""John Doe""""10 Happy St""1114"
"15""8001""John Doe""""10 Happy St""1114"
"7456""7456""Fred Finstone""""121 Bedrock Pl""203"
"147""7456""Fred Finstone""""121 Bedrock Pl""203"
March 25, 2013 at 5:53 am
i've done this at my previous place and found assignment of a scoring algorithm the best method.
you'll need to build the score based on what you think is most important and once a total score is applied to each row you can assign a row number based on the order of the score largest to smallest
Hope this helps
March 25, 2013 at 6:05 am
here's an example :
SELECT a.custno
, RTRIM(a.firstname) + ' ' + RTRIM(a.lastname) AS name
, company
, RTRIM(address1) AS address1
, add1.zip
, ROW_NUMBER() OVER( PARTITION BY a.CustNo ORDER BY
--just a simple sum based on amount of data populated
(CASE WHEN RTRIM(a.firstname) + ' ' + RTRIM(a.lastname) IS NOT NULL THEN 1 ELSE 0 END
+ CASE WHEN company IS NOT NULL THEN 1 ELSE 0 END
+ CASE WHEN RTRIM(address1) IS NOT NULL THEN 1 ELSE 0 END
+ CASE WHEN add1.zip IS NOT NULL THEN 1 ELSE 0 END ) Desc
) AS DupeOrder
FROM cusfil a
JOIN address add1 ON a.custno = add1.custno
WHERE RTRIM(a.firstname) + RTRIM(a.lastname) + RTRIM(add1.zip) IN (
SELECT RTRIM(b.firstname) + RTRIM(b.lastname) + RTRIM(add2.zip)
FROM cusfil b
JOIN address add2 ON b.custno = add2.custno
WHERE add2.curraddr = 1
AND RTRIM(b.lastname) <> ''
GROUP BY b.firstname
, b.lastname
, add2.zip
HAVING COUNT(b.firstname + b.lastname + add2.zip) > 1)
AND add1.curraddr = 1
AND RTRIM(a.lastname) <> ''
ORDER BY a.lastname
, a.firstname
, add1.zip
March 25, 2013 at 10:06 am
Brilliant!! never thought to use PARTITION BY 🙂
Here's what it looks like ...
A big thank you for your time.
SELECT a.custno
, RTRIM(a.firstname) + ' ' + RTRIM(a.lastname) AS name
, company
, RTRIM(address1) AS address1
, add1.zip
, MAX(a.custno) OVER( PARTITION BY RTRIM(a.firstname) + RTRIM(a.lastname) + RTRIM(add1.zip)
) AS keepno
FROM cusfil a
JOIN address add1 ON a.custno = add1.custno
WHERE RTRIM(a.firstname) + RTRIM(a.lastname) + RTRIM(add1.zip) IN (
SELECT RTRIM(b.firstname) + RTRIM(b.lastname) + RTRIM(add2.zip)
FROM cusfil b
JOIN address add2 ON b.custno = add2.custno
WHERE add2.curraddr = 1
AND RTRIM(b.lastname) <> ''
GROUP BY b.firstname
, b.lastname
, add2.zip
HAVING COUNT(b.firstname + b.lastname + add2.zip) > 1)
AND add1.curraddr = 1
AND RTRIM(a.lastname) <> ''
ORDER BY a.lastname
, a.firstname
, add1.zip
March 25, 2013 at 10:18 am
Your very welcome.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply