June 14, 2010 at 3:39 am
Hi everyone,
I am hoping that someone can help me with a query based on the below:
I have two tables. One contains a list of potential duplicate records, grouped by a column, id_master
The second is a list of pairs of records that should not be merged as duplicates.
I am looking to delete that pairs of records in Table_1 where they match a pair in Table_2.
Here are some sample records:
Create table #duplicates
(id int,
First nvarchar (50),
Last nvarchar (50),
Person_id int,
Id_master int)
Create table #blacklist
(id_1 int,
Id_2 int)
insert into #duplicates (id, first, last, person_id, id_master)
select 1, 'Mark', 'Wonder', 10223, 7
union all
select 2, 'Simon', 'Waterhouse', 15226, 4
union all
select 3, 'Vanessa', 'Anderson',21321, 9
union all
select 4, 'Simon','Waterhouse', 45621, 4
union all
select 5, 'Raymond', 'Rooney', 63325, 10
union all
select 6, 'Vanessa', 'Anderson', 54123, 9
union all
select 7, 'Mark', 'Wonder', 15668, 7
union all
select 8, 'Layla', 'Danana', 42213, 11
union all
select 9, 'Vanessa', 'Anderson',75521, 9
union all
select 10, 'Raymond', 'Rooney', 63325, 10
union all
select 11, 'Layla', 'Danana', 96632, 11
insert into #blacklist (id_1, id_2)
select 21321, 75521
union all
select 96632, 42213
union all
select 15526, 63325
select * from #duplicates
select * from #blacklist
From the above I would like to delete the following ids from the duplicates table:
3 & 9, 8 & 11
I would not expect 2 & 10 to be deleted because they don't share the same id_master, meaning that they are not a part of a group in the duplicates table
I have puzzled over this and all I have managed to do so far is remove records if both pairs in the blacklist table EXIST in the duplicates table. I am not able to verify that they are both in a group of duplicates.
I hope someone can educate and assist!
Thanks,
Paul
June 14, 2010 at 4:34 am
Two different solutions.
-- query 1
SELECT o.id, o.first, o.last, o.person_id, o.id_master
FROM (
SELECT d.id, d.first, d.last, d.person_id, d.id_master,
x = COUNT(*) OVER (PARTITION BY d.person_id)
FROM #duplicates d
INNER JOIN #blacklist b
ON b.id_1 = d.Person_id OR b.id_2 = d.Person_id
) o
WHERE x = 1
-- query 2
SELECT a.*
FROM #duplicates a
INNER JOIN (
SELECT id1 = d1.id, id2 = d2.id
FROM #blacklist b
LEFT JOIN #duplicates d1 ON d1.Person_id = b.id_1
LEFT JOIN #duplicates d2 ON d2.Person_id = b.id_2
WHERE d1.Person_id <> d2.Person_id
) d ON d.id1 = a.id OR d.id2 = a.id
The results of either could be used in a DELETE FROM.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 16, 2010 at 2:09 am
Thanks very much for your assistance, Chris. That works great.
I prefer to use query 2, as I can understand it. I am not too sure on how partitioning works.
I appreciate your help.
Paul
June 16, 2010 at 2:27 am
Paul_Harvey (6/16/2010)
Thanks very much for your assistance, Chris. That works great.I prefer to use query 2, as I can understand it. I am not too sure on how partitioning works.
I appreciate your help.
Paul
You're welcome, Paul, thanks for the feedback.
PARTITION BY is explained pretty well in BOL. A really cool use is
COUNT(columnname or *) OVER (PARTITION BY columnname), which gives you the number of rows in a partition (a range of equal values in a column) without rolling them up - the same value appears in each row of a partition. SUM(columnname) works too.
SELECT PartitionElements = COUNT(*) OVER (PARTITION BY [last]),
UselessValue = SUM(id) OVER (PARTITION BY [last]),
id, [first], [last], person_id, id_master
FROM #duplicates
ORDER BY [Last], [first]
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 16, 2010 at 2:30 am
I'll check it out.
Thanks again.
🙂
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply