Getting stuck and feeling useless! please help!

  • Hi all,

    I'm having trouble with a query in my database which is driving me up the wall!

    Im trying to create a query which removes the duplicates where necessary....

    There are duplicate people in the database but with different person_id's and one record has the UPN against it.

    Scenarios (THESE ARE SIMILAR!)....

    SCENARIO 1.

    ---------------------------

    Person_table

    Person_id Surname Forename Dob Role Latest_Change_Date

    1 Jones John 01/01/1980 1 (student) 01/01/2005 12:00

    2 Jones John 01/01/1980 128 (applicant) 01/01/2005 12:00

    Stud_student_table

    Person_id Upn

    1 NULL

    Applicant_table

    Person_id upn

    2 111111

    REQUIRED SOLUTION

    I need to find the matching person using Surname, Forename and DOB if there is one, and if it's an applicant I need to remove

    the UPN from the Applicant_table. This is the same person just duplicated in the database. In this case I don't care when the record

    was last changed.

    SCENARIO 2.

    ---------------------------

    Person_table

    Person_id Surname Forename Dob Role Latest_Change_Date

    3 Smith Joe 01/01/1981 1 (student) 01/01/2005 12:00

    4 Smith Joe 01/01/1981 1 (student) 12/12/2001 12:00

    Stud_student_table

    Person_id Upn

    3 NULL

    4 222222 -- NOTE: EARLIEST RECORD HAS THE UPN, NEEDS TO BE REMOVED.

    REQUIRED SOLUTION

    Again, I need to find the matching person using Surname, Forename and DOB if there is one, then I need to remove the the earliest

    changed record (in this case it is person_id = 4)

    SCENARIO 3.

    ---------------------------

    Person_table

    Person_id Surname Forename Dob Role Latest_Change_Date

    5 Bloggs Fred 01/01/1981 1 (student) 01/01/2005 12:00

    6 Bloggs Fred 01/01/1981 129 (student AND Applicant) 12/12/2001 12:00

    Stud_student_table

    Person_id Upn

    5 NULL

    6 222222

    Aplicant_Table

    Person_id Upn

    6 222222 -- NOTE: EARLIEST RECORD HAS THE UPN, NEEDS TO BE REMOVED.

    REQUIRED SOLUTION

    Again, I need to find the matching person using Surname, Forename and DOB if there is one, then I need to remove the the earliest

    changed record (in this case it is person_id = 6) from both applicant and person.

    Thanks in advance! Tim 

     

    I've been trying something like below... but obviously it isn't working yet (may be of some use).

    ---------------------------------------------------------------------
    declare @Duplicate_upns table

    (

       --upn_id int identity not null,

       upn varchar(20) null,

       table_name sysname null,

       person_id int,

       surname varchar(50) null,

       forename varchar(50) null,

       duplicate bit not null default 0

    )

     
    insert into @Duplicate_upns (upn, person_id, surname, forename)

       select a.upn, a.person_id, p.surname, p.forename from sims.ra_applicant a

     join sims.sims_person p

      on a.person_id = p.person_id 

     where upn is not null

    -------------------------------------------------------------------------

    -- Attempt to find all duplicate people

     
    -- store the duplicate people here...

    declare @Duplicate_People table

    (

       forename varchar(200) null,

       gender varchar(200) null,

       surname varchar(200) null,

       DOB datetime null,

       Number_of_Duplicates int

    )

     
    -- This gets people with duplicate details

    insert into @Duplicate_People

    (  forename,  gender,   surname,    Number_of_Duplicates,           DOB )

    select  p.forename, p.gender,   p.surname,    count(p.surname + p.forename + p.gender + cast(p.dob as varchar)) as person_counter, p.dob from sims.sims_person p

     inner join sims.stud_student s

     on p.person_id = s.person_id

     group by p.forename, p.gender, p.surname, p.dob --, (p.surname + p.forename + p.gender)

     having count(p.surname + p.forename + p.gender + cast(p.dob as varchar) ) > 1

     
    select * from @Duplicate_People dp

     inner join sims.sims_person p on

     p.forename = dp.forename and p.gender = dp.gender and p.surname = dp.surname

    -- Now remove the duplicate persons UPN

     
    declare @Duplicate_People_To_Remove table

    (  person_id int,

       aggregate_role int,

       last_updated datetime null,

       forename varchar(200) null,

       gender varchar(200) null,

       surname varchar(200) null,

       DOB datetime null    )      

     
    insert into @Duplicate_People_To_Remove (forename,  gender,   surname, dob, person_id, aggregate_role, last_updated)

    select p.forename, p.gender,   p.surname, p.dob, p.person_id, p.aggregate_role, s.last_updated from @Duplicate_People dp

     inner join sims.sims_person p on

     p.forename = dp.forename and p.gender = dp.gender and p.surname = dp.surname

     left outer join sims.stud_student s

      on s.person_id = p.person_id

     
    select * from @Duplicate_People_To_Remove
     
    select max(last_updated) as max_last_updated, forename, surname, dob, gender

     from @Duplicate_People_To_Remove

     group by forename, surname, dob, gender

     
     
     
    -- Need to pick up the highest last_updated ??
     

    update sims.stud_student set unique_pupil_no = null

    from sims.stud_student s

    inner join @Duplicate_People dp

    on dp.person_id = s.person_id

  • Tim,

    This is a divide and conquer case. As a guidance you can take a look at the following code

     

    Create table #TDups( Person_Id int,

           Role varchar(50),

                         Dte datetime,

                         Surname varchar(50),

                         Forename varchar(50),

                         Dob datetime)

    Insert into #TDups( Person_Id, Role , Dte, Surname, Forename, Dob)

    Select P.Person_ID, P.Role, P.Dte, P.Surname, P.Forename, P.Dob

    From

     Person_table P

    join

     (select Surname, Forename, Dob

     From Person_table

     Group by Surname, Forename, Dob

     Having Count(*) > 1 ) Dups 

    on

     P.Surname = Dups.Surname

     And P.Forename = Dups.Forename

     And P.Dob = Dups.Dob

    -- First Case

     Delete From a

     FROM Applicant_table a

     join #TDups d on d.Person_Id = a.Person_ID and d.Role = '(applicant)'

     Delete From p

     From Person_table p

     join #TDups d on d.Person_Id = p.Person_ID and d.Role = '(applicant)'

    -- Second Case

     Delete From s

     From Student_table s join #TDups p on s.Person_ID = p.Person_ID

     where exists(select * from #TDups d on d.Surname = p.Surname

             and p.Forename = d.Forename

      and p.Dob = d.Dob and d.Role = '(student)' and p.role = '(student)'

        and p.Dte > d.Dte )

     Delete From p

     From Person_table p

     where exists(select * from #TDups d on d.Surname = p.Surname

             and p.Forename = d.Forename

      and p.Dob = d.Dob and d.Role = '(student)' and p.role = '(student)'

        and p.Dte > d.Dte )

    -- third Case

     Delete From a

     From Applicant_table a join #TDups p on a.Person_ID = p.Person_ID

     where exists(select * from #TDups d on d.Surname = p.Surname

             and p.Forename = d.Forename

      and p.Dob = d.Dob and d.Role = '(student And applicant)' and p.role = '(student)'

        and p.Dte > d.Dte )

     Delete From p

     From Person_table p

     where exists(select * from #TDups d on d.Surname = p.Surname

             and p.Forename = d.Forename

      and p.Dob = d.Dob and d.Role = '(student And applicant)' and p.role = '(student)'

        and p.Dte > d.Dte )

    Of course you have to make sure to handle everything is wrapped in tranaction handling code and that it actually perform what you need.

    Good Luck

    HTH

     


    * Noel

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply