How to approach deleting duplicate data

Question

Post reply

How to approach deleting duplicate data

Opus

Default port

Points: 1419
More actions
November 3, 2010 at 8:36 am

#228716

I have a student_test table where some students have two rows identical with the exception of ONE field (one row the field is populated and the next, it's NULL). How do I approach deleting just the row where the field in question is NULL?
I've successful dealt with situation where the entire row was duplicated, just not a case where a single field was different.
Thanks

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Rob Schripsema SSCertifiable Points: 7469 More actions · Answer 1

I think we need a little more information:

-- what are column names?

-- is there some sort of ID column?

-- what data type is the field that can be null?

-- are there students who have only ONE record but have a null value in that particular field?

-- are there students with two records where NEITHER has a null value in that particular field?

Rob Schripsema
Propack, Inc.

Opus Default port Points: 1419 More actions · Answer 2

-- is there some sort of ID column? --> No

-- what data type is the field that can be null? --> NVARCHAR

-- are there students who have only ONE record but have a null value in that particular field? --> Yes

-- are there students with two records where NEITHER has a null value in that particular field? --> No

Table:

CREATE TABLE [dbo].[student_test](

[student_test_id] [int] NOT NULL,

[test_prog_id] [int] NOT NULL,

[student_id] [decimal](10, 0) NOT NULL,

[district_id] [decimal](9, 0) NOT NULL,

[school_id] [decimal](9, 0) NOT NULL,

[entry_date] [datetime] NOT NULL,

[school_number] [nvarchar](3) NOT NULL,

[school_year] [nvarchar](4) NULL,

[test_language] [nvarchar](1) NULL,

[test_participation] [nvarchar](2) NULL,

[test_non_participation] [nvarchar](2) NULL,

[test_overall_score] [decimal](4, 1) NULL,

[proficiency] [int] NULL,

[scaled_score] [int] NULL,

[test_medium] [nvarchar](2) NULL,

[warehouse_entry_date] [datetime] NULL,

[validated_proficiency] [int] NULL,

[validated_test_overall_score] [decimal](4, 1) NULL,

[include_in_ayp] [nvarchar](1) NULL,

[include_in_upass_summary] [nvarchar](1) NULL,

[include_in_profile_summary] [nvarchar](1) NULL,

[filename] [nvarchar](50) NULL,

[line_number] [decimal](10, 0) NULL,

[class_identifier] [nvarchar](7) NULL,

[test_year] [nvarchar](4) NULL,

[upass_proficiency] [nvarchar](2) NULL,

[value_table_score] [decimal](4, 1) NULL,

[incl_in_upass_progress] [nvarchar](1) NULL,

[include_in_upass] [nvarchar](1) NULL,

[first_time_test_taken] [nvarchar](1) NULL,

[prev_yr_test_flag] [nvarchar](1) NULL,

[include_in_ayp_proficiency] [nvarchar](1) NULL,

[include_in_ayp_participation] [nvarchar](1) NULL,

[teacher_id] [decimal](9, 0) NULL,

[teacher_id_match_ind] [bit] NULL,

[section_number] [nvarchar](2) NULL,

[teacher_id_section_9_2] [nvarchar](11) NULL

) ON [PRIMARY]

GO

I have a number of rows where the rows are identical except for the field, [teacher_id_section_9_2] (the last field). One row will have a value and the other will contain a NULL. I need to delete the row with the NULL.

Hope this is clearer.

Sean Lange SSC Guru Points: 286590 More actions · Answer 3

Opus (11/3/2010)
-- is there some sort of ID column? --> No

This seems to be the beginning of the problem. Is there no primary key for this table?

Try something like this:

delete from student

where [teacher_id_section_9_2] is null

and [PrimaryKey] in (select [PrimaryKey] from student where [teacher_id_section_9_2] is not null)

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

John Rowan SSC Guru Points: 56440 More actions · Answer 4

This should work, but I'd recommend adding a Primary Key constraint on your table and correcting whatever INSERT process is creating the dupes.

DELETEst

FROMstudent_test st

INNER JOIN(

SELECTstudent_test_id

FROMstudent_test

GROUP BYstudent_test_id

HAVINGCOUNT(*) > 1

) t2 ON st.student_test_id = t2.student_test_id

WHEREteacher_id_section_9_2 IS NULL

John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

Opus Default port Points: 1419 More actions · Answer 5

Sean, John - It looks like I can more problems than just one field. I'm going back to the import but set the key fields first. Inheriting databases are a nightmare.

Thanks for you time.

Sean Lange SSC Guru Points: 286590 More actions · Answer 6

I have often found that drinking helps in these situations. 😛 Good luck.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Opus Default port Points: 1419 More actions · Answer 7

Opus

Default port

Points: 1419

November 3, 2010 at 1:58 pm

#1246272

Sean, your my kind of coder ...