February 25, 2005 at 7:27 am
Hi all,
I'm having trouble with a query in my database which is driving me up the wall!
Im trying to create a query which removes the duplicates where necessary....
There are duplicate people in the database but with different person_id's and one record has the UPN against it.
Scenarios (THESE ARE SIMILAR!)....
SCENARIO 1.
---------------------------
Person_table
Person_id Surname Forename Dob Role Latest_Change_Date
1 Jones John 01/01/1980 1 (student) 01/01/2005 12:00
2 Jones John 01/01/1980 128 (applicant) 01/01/2005 12:00
Stud_student_table
Person_id Upn
1 NULL
Applicant_table
Person_id upn
2 111111
REQUIRED SOLUTION
I need to find the matching person using Surname, Forename and DOB if there is one, and if it's an applicant I need to remove
the UPN from the Applicant_table. This is the same person just duplicated in the database. In this case I don't care when the record
was last changed.
SCENARIO 2.
---------------------------
Person_table
Person_id Surname Forename Dob Role Latest_Change_Date
3 Smith Joe 01/01/1981 1 (student) 01/01/2005 12:00
4 Smith Joe 01/01/1981 1 (student) 12/12/2001 12:00
Stud_student_table
Person_id Upn
3 NULL
4 222222 -- NOTE: EARLIEST RECORD HAS THE UPN, NEEDS TO BE REMOVED.
REQUIRED SOLUTION
Again, I need to find the matching person using Surname, Forename and DOB if there is one, then I need to remove the the earliest
changed record (in this case it is person_id = 4)
SCENARIO 3.
---------------------------
Person_table
Person_id Surname Forename Dob Role Latest_Change_Date
5 Bloggs Fred 01/01/1981 1 (student) 01/01/2005 12:00
6 Bloggs Fred 01/01/1981 129 (student AND Applicant) 12/12/2001 12:00
Stud_student_table
Person_id Upn
5 NULL
6 222222
Aplicant_Table
Person_id Upn
6 222222 -- NOTE: EARLIEST RECORD HAS THE UPN, NEEDS TO BE REMOVED.
REQUIRED SOLUTION
Again, I need to find the matching person using Surname, Forename and DOB if there is one, then I need to remove the the earliest
changed record (in this case it is person_id = 6) from both applicant and person.
Thanks in advance! Tim
I've been trying something like below... but obviously it isn't working yet (may be of some use).
(
--upn_id int identity not null,
upn varchar(20) null,
table_name sysname null,
person_id int,
surname varchar(50) null,
forename varchar(50) null,
duplicate bit not null default 0
)
select a.upn, a.person_id, p.surname, p.forename from sims.ra_applicant a
join sims.sims_person p
on a.person_id = p.person_id
where upn is not null
-------------------------------------------------------------------------
-- Attempt to find all duplicate people
declare @Duplicate_People table
(
forename varchar(200) null,
gender varchar(200) null,
surname varchar(200) null,
DOB datetime null,
Number_of_Duplicates int
)
insert into @Duplicate_People
( forename, gender, surname, Number_of_Duplicates, DOB )
select p.forename, p.gender, p.surname, count(p.surname + p.forename + p.gender + cast(p.dob as varchar)) as person_counter, p.dob from sims.sims_person p
inner join sims.stud_student s
on p.person_id = s.person_id
group by p.forename, p.gender, p.surname, p.dob --, (p.surname + p.forename + p.gender)
having count(p.surname + p.forename + p.gender + cast(p.dob as varchar) ) > 1
inner join sims.sims_person p on
p.forename = dp.forename and p.gender = dp.gender and p.surname = dp.surname
-- Now remove the duplicate persons UPN
( person_id int,
aggregate_role int,
last_updated datetime null,
forename varchar(200) null,
gender varchar(200) null,
surname varchar(200) null,
DOB datetime null )
select p.forename, p.gender, p.surname, p.dob, p.person_id, p.aggregate_role, s.last_updated from @Duplicate_People dp
inner join sims.sims_person p on
p.forename = dp.forename and p.gender = dp.gender and p.surname = dp.surname
left outer join sims.stud_student s
on s.person_id = p.person_id
from @Duplicate_People_To_Remove
group by forename, surname, dob, gender
from sims.stud_student s
inner join @Duplicate_People dp
on dp.person_id = s.person_id
February 25, 2005 at 11:43 am
Tim,
This is a divide and conquer case. As a guidance you can take a look at the following code
Create table #TDups( Person_Id int,
Role varchar(50),
Dte datetime,
Surname varchar(50),
Forename varchar(50),
Dob datetime)
Insert into #TDups( Person_Id, Role , Dte, Surname, Forename, Dob)
Select P.Person_ID, P.Role, P.Dte, P.Surname, P.Forename, P.Dob
From
Person_table P
join
(select Surname, Forename, Dob
From Person_table
Group by Surname, Forename, Dob
Having Count(*) > 1 ) Dups
on
P.Surname = Dups.Surname
And P.Forename = Dups.Forename
And P.Dob = Dups.Dob
-- First Case
Delete From a
FROM Applicant_table a
join #TDups d on d.Person_Id = a.Person_ID and d.Role = '(applicant)'
Delete From p
From Person_table p
join #TDups d on d.Person_Id = p.Person_ID and d.Role = '(applicant)'
-- Second Case
Delete From s
From Student_table s join #TDups p on s.Person_ID = p.Person_ID
where exists(select * from #TDups d on d.Surname = p.Surname
and p.Forename = d.Forename
and p.Dob = d.Dob and d.Role = '(student)' and p.role = '(student)'
and p.Dte > d.Dte )
Delete From p
From Person_table p
where exists(select * from #TDups d on d.Surname = p.Surname
and p.Forename = d.Forename
and p.Dob = d.Dob and d.Role = '(student)' and p.role = '(student)'
and p.Dte > d.Dte )
-- third Case
Delete From a
From Applicant_table a join #TDups p on a.Person_ID = p.Person_ID
where exists(select * from #TDups d on d.Surname = p.Surname
and p.Forename = d.Forename
and p.Dob = d.Dob and d.Role = '(student And applicant)' and p.role = '(student)'
and p.Dte > d.Dte )
Delete From p
From Person_table p
where exists(select * from #TDups d on d.Surname = p.Surname
and p.Forename = d.Forename
and p.Dob = d.Dob and d.Role = '(student And applicant)' and p.role = '(student)'
and p.Dte > d.Dte )
Of course you have to make sure to handle everything is wrapped in tranaction handling code and that it actually perform what you need.
Good Luck
HTH
* Noel
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply