Finding and SHOWING duplicates

  • I'm working to find a way to verify in my master student table that each student in a table has only one student ID number.  My table has studentnumber, which could be in the table more than once if the student is enrolled at more than one school ... so I'll need (I'm thinking) to use their lastname, firstname and birthdate fields to identify the duplicates, and then identify among the duplicates found within that set instances were the identical records (based on lastname, firstname, birthdate) have differing values in the student ID number.

    The overall goal is to identify a student who managed to get two or more student ID numbers assigned to his or her name.  I've tried adapting the ACCESS find duplicates query, which will find the duplicates based on lastname, firstname and birthdate, but what it won't do is identify the cases among this group of duplicates where a set of matching records has a different student ID.  

    Here's some sample data...   The records I need to identify are the first two, where the student ID is different, but the FN,LN and BD are the same.

    Student IDFirstNameLastNameBirthdate
    1221144Joe Blow2211/25/1990
    1221145Joe Blow2211/25/1990
    1221146JohnBrown2/2/1988
    1221147SeanWest2/1/1989
    1221148EliahWilliams4/5/1999

    A brain buster for me... I'm hoping you have a thought! 

    Thanks in advance for your help!

  • Try this. I have assumed that birthdate is stored as datetime. If it is a varchar, then be careful to check for differences in regional dates ( 1/2/1989 and 2/1/1989 )

    select *

    from student

    where firstname + ' ' + lastname + ' - ' + convert(varchar(10), birthdate, 103) in

     (select firstname + ' ' + lastname + ' - ' + convert(varchar(10), birthdate, 103)

     from student

     group by firstname, lastname, dob

     having count(firstname + ' ' + lastname + ' - ' + convert(varchar(10), birthdate, 103)) > 1)

  • I think dc's approach is the best one.

  • "best one" - from all these numerous posts that're vying with each other to provide the solution..?!?!







    **ASCII stupid question, get a stupid ANSI !!!**

  • I wouldn't do a string cocatenation for this. This will work for this as well.

    SET NOCOUNT ON

    DECLARE @Student TABLE

    (

    StudentID CHAR(7),

    FirstName VARCHAR(50),

    LastName VARCHAR(50),

    BirthDate DATETIME

    )

    INSERT @Student

    SELECT '1221144', 'Joe', 'Blow22', '11/25/1990' UNION

    SELECT '1221145', 'Joe', 'Blow22', '11/25/1990' UNION

    SELECT '1221146', 'John', 'Brown', '2/2/1988' UNION

    SELECT '1221147', 'Sean', 'West', '2/1/1989' UNION

    SELECT '1221148', 'Eliah', 'Williams', '4/5/1999'

    SELECT *

    FROM

     @Student A

    JOIN

    (

     SELECT FirstName, LastName, BirthDate

     FROM

      @Student

     GROUP BY FirstName, LastName, BirthDate

     HAVING COUNT(StudentID) > 1) B

    ON

     A.FirstName = B.FirstName

    AND A.LastName = B.LastName

    AND A.BirthDate = B.BirthDate

    Regards,
    gova

  • Very nice govinn - "I think govinn's approach is the best one." - since there's no concatenation involved.

    Hey - the weekend bug's gotten me...







    **ASCII stupid question, get a stupid ANSI !!!**

  • Can we say one of the best.

    Regards,
    gova

  • Nope - since there're only 2 responses - okay - I'll compromise & rephrase...yours is the better one since it avoids concatenation...







    **ASCII stupid question, get a stupid ANSI !!!**

  • The result this one produces is the same as I got with the derive Access find duplicates code...... the problem being, it is finding the duplicates based upon same FN,LN,BD,  but it's not limiting the result to those among this group who have two or more different student iD numbers, which is what I'm after... ...    What I'm seeing in the result are all instances of this student record in the table with multiple instances of the same student ID number included in the returned set. 

    Thanks!    

  • Thanks the the reply..  

    This query is returning all duplicates based on the FN,LN,Bd whether or not the records returned have the same PERMNUM (student ID) ...  the same as I get with my query... but what I need is only the group of records with duplicate FN,LN,BD with different Student ID numbers... so I can find those students who managed to get two different ID numbers...

     

    Thanks!

  • Replace table/column names with the appropriate names for your table.

    Select Distinct A.*

    From YourTable As A

    Where Exists (

      Select *

      From YourTable As B

      Where A.FN = B.FN

      And   A.LN = B.FN

      And   A.BD = B.BD

      And   A.StudentID <> B.StudentID

    )

  • Thanks for your (and everyone else here's) expertise and kindness in sharing your knowledge.  I am in your debt.

     

     

     

  • I was a little puzzled why Govinn's query didn't work. Then I realised that this could be because your table has more columns than you have stated above - a unique id, or something else that distinguishes records with the same FirstName, LastName, BirthDate and StudentID. Try this modification:

    SELECT *

    FROM

     @Student A

    JOIN

    (

     SELECT FirstName, LastName, BirthDate

     FROM

      @Student

     GROUP BY FirstName, LastName, BirthDate

    HAVING COUNT(distinct StudentID) > 1) B

    ON

     A.FirstName = B.FirstName

    AND A.LastName = B.LastName

    AND A.BirthDate = B.BirthDate

     

    You might want to run only a part of this query, since it is probably the easiest way to identify duplicates - although it is not quite what you have been asking for:

    SELECT FirstName, LastName, BirthDate

     FROM

      @Student

     GROUP BY FirstName, LastName, BirthDate

    HAVING COUNT(distinct StudentID) > 1

     

  • Thanks.. there are more colums, but none with unique features.   Your second query works, but won't fill my requirement due to their being duplicate entries in the table for instances where a student is enrolled in more than one site, either concurrently or due to a mid-year transfer.   

    My other question regarding your top query here,  

    SELECT *

    FROM

     @Student A

    JOIN

    (

     SELECT FirstName, LastName, BirthDate

     FROM

      @Student

     GROUP BY FirstName, LastName, BirthDate

    HAVING COUNT(distinct StudentID) > 1) B

    ON

     A.FirstName = B.FirstName

    AND A.LastName = B.LastName

    AND A.BirthDate = B.BirthDate

    In the "Select From @Student A" part ...are you building a temporary duplicate table of my original table, and doing joins between them to establish the links and counts here.. am i correct in thinking that?   What is the "@student" syntax 's purpose here?

     

    Thanks! 

  • The @Student table is from govinn's post. It is a table variable and not a "real" table in order to avoid changing the table structure of the database when testing the query (I guess). You should replace @Student by the name of your table when you're testing if the query meets your requirements.

     

     

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply