Finding Dupes based on previous weeks input.

  • I am currently looking at identifying duplicate names and looking for behavior patters. A duplicate in this case is very high level and there is risk involved as you will see.

    Definition of a duplicate: The same first name, surname and date of birth. Obviously there are genuine different people that hit this criteria (this dataset is approx. 3 million records) but it does not matter at the moment.

    I am looking at a data set for the previous week and running a query to identify new people that have been entered into the database. I use a sub query to get this and there are about 757 from last week.

    I need to create a SQL script that can take there 757 and cross reference / compare it to the same table to see if any duplicates come up that match the criteria of same first name, surname and data of birth.

    I have tried to give it ago but it seems to just show the same records side by side so need to exclude people with matching ID's as this would be pointless.

    SELECT O.ow_id,

    O.ow_forenames,

    O.ow_surname,

    O.ow_birthdate,

    O2.dp_person,

    O2.ow_forenames,

    O2.ow_surname,

    O2.ow_birthdate

    FROM props.dp_owner O

    join (SELECT E.dp_person,

    O3.ow_forenames,

    O3.ow_surname,

    O3.ow_birthdate

    FROM props.dp_entry E

    join props.dp_owner O3

    ON E.dp_person = O3.ow_id

    WHERE E.dp_date_entered BETWEEN To_date('17-OCT-16') AND To_date(

    '23-OCT-16')

    AND E.dp_person IS NOT NULL

    GROUP BY E.dp_person,

    O3.ow_forenames,

    O3.ow_surname,

    O3.ow_birthdate) O2

    ON ( O.ow_forenames = O2.ow_forenames

    AND O.ow_surname = O2.ow_surname

    AND O.ow_birthdate = O2.ow_birthdate )

    Thanks in advance,

    Paul

  • Is this Oracle? :crazy:

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • dramaqueen (10/26/2016)


    I am currently looking at identifying duplicate names and looking for behavior patters. A duplicate in this case is very high level and there is risk involved as you will see.

    Definition of a duplicate: The same first name, surname and date of birth. Obviously there are genuine different people that hit this criteria (this dataset is approx. 3 million records) but it does not matter at the moment.

    I am looking at a data set for the previous week and running a query to identify new people that have been entered into the database. I use a sub query to get this and there are about 757 from last week.

    I need to create a SQL script that can take there 757 and cross reference / compare it to the same table to see if any duplicates come up that match the criteria of same first name, surname and data of birth.

    I have tried to give it ago but it seems to just show the same records side by side so need to exclude people with matching ID's as this would be pointless.

    SELECT O.ow_id,

    O.ow_forenames,

    O.ow_surname,

    O.ow_birthdate,

    O2.dp_person,

    O2.ow_forenames,

    O2.ow_surname,

    O2.ow_birthdate

    FROM props.dp_owner O

    join (SELECT E.dp_person,

    O3.ow_forenames,

    O3.ow_surname,

    O3.ow_birthdate

    FROM props.dp_entry E

    join props.dp_owner O3

    ON E.dp_person = O3.ow_id

    WHERE E.dp_date_entered BETWEEN To_date('17-OCT-16') AND To_date(

    '23-OCT-16')

    AND E.dp_person IS NOT NULL

    GROUP BY E.dp_person,

    O3.ow_forenames,

    O3.ow_surname,

    O3.ow_birthdate) O2

    ON ( O.ow_forenames = O2.ow_forenames

    AND O.ow_surname = O2.ow_surname

    AND O.ow_birthdate = O2.ow_birthdate )

    Thanks in advance,

    Paul

    So ... add a line to the end of the query to remove those:

    WHERE o.ow_id <> o2.ow_id

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Perfect thanks. It worked and was really simple.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply