Cross Referencing Tables

Question

Post reply

Cross Referencing Tables

Paul_Harvey

Hall of Fame

Points: 3822
More actions
June 14, 2010 at 3:39 am

#219962

Hi everyone,
I am hoping that someone can help me with a query based on the below:
I have two tables. One contains a list of potential duplicate records, grouped by a column, id_master
The second is a list of pairs of records that should not be merged as duplicates.
I am looking to delete that pairs of records in Table_1 where they match a pair in Table_2.
Here are some sample records:
Create table #duplicates
(id int,
First nvarchar (50),
Last nvarchar (50),
Person_id int,
Id_master int)
Create table #blacklist
(id_1 int,
Id_2 int)
insert into #duplicates (id, first, last, person_id, id_master)
select 1, 'Mark', 'Wonder', 10223, 7
union all
select 2, 'Simon', 'Waterhouse', 15226, 4
union all
select 3, 'Vanessa', 'Anderson',21321, 9
union all
select 4, 'Simon','Waterhouse', 45621, 4
union all
select 5, 'Raymond', 'Rooney', 63325, 10
union all
select 6, 'Vanessa', 'Anderson', 54123, 9
union all
select 7, 'Mark', 'Wonder', 15668, 7
union all
select 8, 'Layla', 'Danana', 42213, 11
union all
select 9, 'Vanessa', 'Anderson',75521, 9
union all
select 10, 'Raymond', 'Rooney', 63325, 10
union all
select 11, 'Layla', 'Danana', 96632, 11
insert into #blacklist (id_1, id_2)
select 21321, 75521
union all
select 96632, 42213
union all
select 15526, 63325
select * from #duplicates
select * from #blacklist
From the above I would like to delete the following ids from the duplicates table:
3 & 9, 8 & 11
I would not expect 2 & 10 to be deleted because they don't share the same id_master, meaning that they are not a part of a group in the duplicates table
I have puzzled over this and all I have managed to do so far is remove records if both pairs in the blacklist table EXIST in the duplicates table. I am not able to verify that they are both in a group of duplicates.
I hope someone can educate and assist!
Thanks,
Paul

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 1

Two different solutions.

-- query 1

SELECT o.id, o.first, o.last, o.person_id, o.id_master

FROM (

SELECT d.id, d.first, d.last, d.person_id, d.id_master,

x = COUNT(*) OVER (PARTITION BY d.person_id)

FROM #duplicates d

INNER JOIN #blacklist b

ON b.id_1 = d.Person_id OR b.id_2 = d.Person_id

) o

WHERE x = 1

-- query 2

SELECT a.*

FROM #duplicates a

INNER JOIN (

SELECT id1 = d1.id, id2 = d2.id

FROM #blacklist b

LEFT JOIN #duplicates d1 ON d1.Person_id = b.id_1

LEFT JOIN #duplicates d2 ON d2.Person_id = b.id_2

WHERE d1.Person_id <> d2.Person_id

) d ON d.id1 = a.id OR d.id2 = a.id

The results of either could be used in a DELETE FROM.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Paul_Harvey Hall of Fame Points: 3822 More actions · Answer 2

Thanks very much for your assistance, Chris. That works great.

I prefer to use query 2, as I can understand it. I am not too sure on how partitioning works.

I appreciate your help.

Paul

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

Paul_Harvey (6/16/2010)
Thanks very much for your assistance, Chris. That works great.
I prefer to use query 2, as I can understand it. I am not too sure on how partitioning works.
I appreciate your help.
Paul

You're welcome, Paul, thanks for the feedback.

PARTITION BY is explained pretty well in BOL. A really cool use is

COUNT(columnname or *) OVER (PARTITION BY columnname), which gives you the number of rows in a partition (a range of equal values in a column) without rolling them up - the same value appears in each row of a partition. SUM(columnname) works too.

SELECT PartitionElements = COUNT(*) OVER (PARTITION BY [last]),

UselessValue = SUM(id) OVER (PARTITION BY [last]),

id, [first], [last], person_id, id_master

FROM #duplicates

ORDER BY [Last], [first]

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Paul_Harvey Hall of Fame Points: 3822 More actions · Answer 4

Paul_Harvey

Hall of Fame

Points: 3822

June 16, 2010 at 2:30 am

#1180813

I'll check it out.

Thanks again.

🙂