How to approach deleting duplicate data

  • I have a student_test table where some students have two rows identical with the exception of ONE field (one row the field is populated and the next, it's NULL). How do I approach deleting just the row where the field in question is NULL?

    I've successful dealt with situation where the entire row was duplicated, just not a case where a single field was different.

    Thanks

  • I think we need a little more information:

    -- what are column names?

    -- is there some sort of ID column?

    -- what data type is the field that can be null?

    -- are there students who have only ONE record but have a null value in that particular field?

    -- are there students with two records where NEITHER has a null value in that particular field?

    Rob Schripsema
    Propack, Inc.

  • -- is there some sort of ID column? --> No

    -- what data type is the field that can be null? --> NVARCHAR

    -- are there students who have only ONE record but have a null value in that particular field? --> Yes

    -- are there students with two records where NEITHER has a null value in that particular field? --> No

    Table:

    CREATE TABLE [dbo].[student_test](

    [student_test_id] [int] NOT NULL,

    [test_prog_id] [int] NOT NULL,

    [student_id] [decimal](10, 0) NOT NULL,

    [district_id] [decimal](9, 0) NOT NULL,

    [school_id] [decimal](9, 0) NOT NULL,

    [entry_date] [datetime] NOT NULL,

    [school_number] [nvarchar](3) NOT NULL,

    [school_year] [nvarchar](4) NULL,

    [test_language] [nvarchar](1) NULL,

    [test_participation] [nvarchar](2) NULL,

    [test_non_participation] [nvarchar](2) NULL,

    [test_overall_score] [decimal](4, 1) NULL,

    [proficiency] [int] NULL,

    [scaled_score] [int] NULL,

    [test_medium] [nvarchar](2) NULL,

    [warehouse_entry_date] [datetime] NULL,

    [validated_proficiency] [int] NULL,

    [validated_test_overall_score] [decimal](4, 1) NULL,

    [include_in_ayp] [nvarchar](1) NULL,

    [include_in_upass_summary] [nvarchar](1) NULL,

    [include_in_profile_summary] [nvarchar](1) NULL,

    [filename] [nvarchar](50) NULL,

    [line_number] [decimal](10, 0) NULL,

    [class_identifier] [nvarchar](7) NULL,

    [test_year] [nvarchar](4) NULL,

    [upass_proficiency] [nvarchar](2) NULL,

    [value_table_score] [decimal](4, 1) NULL,

    [incl_in_upass_progress] [nvarchar](1) NULL,

    [include_in_upass] [nvarchar](1) NULL,

    [first_time_test_taken] [nvarchar](1) NULL,

    [prev_yr_test_flag] [nvarchar](1) NULL,

    [include_in_ayp_proficiency] [nvarchar](1) NULL,

    [include_in_ayp_participation] [nvarchar](1) NULL,

    [teacher_id] [decimal](9, 0) NULL,

    [teacher_id_match_ind] [bit] NULL,

    [section_number] [nvarchar](2) NULL,

    [teacher_id_section_9_2] [nvarchar](11) NULL

    ) ON [PRIMARY]

    GO

    I have a number of rows where the rows are identical except for the field, [teacher_id_section_9_2] (the last field). One row will have a value and the other will contain a NULL. I need to delete the row with the NULL.

    Hope this is clearer.

  • Opus (11/3/2010)


    -- is there some sort of ID column? --> No

    This seems to be the beginning of the problem. Is there no primary key for this table?

    Try something like this:

    delete from student

    where [teacher_id_section_9_2] is null

    and [PrimaryKey] in (select [PrimaryKey] from student where [teacher_id_section_9_2] is not null)

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • This should work, but I'd recommend adding a Primary Key constraint on your table and correcting whatever INSERT process is creating the dupes.

    DELETEst

    FROMstudent_test st

    INNER JOIN(

    SELECTstudent_test_id

    FROMstudent_test

    GROUP BYstudent_test_id

    HAVINGCOUNT(*) > 1

    ) t2 ON st.student_test_id = t2.student_test_id

    WHEREteacher_id_section_9_2 IS NULL

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Sean, John - It looks like I can more problems than just one field. I'm going back to the import but set the key fields first. Inheriting databases are a nightmare.

    Thanks for you time.

  • I have often found that drinking helps in these situations. 😛 Good luck.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean, your my kind of coder ...

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply