Delete duplicate records

Question

Delete duplicate records

adonetok

SSCrazy Eights

Points: 8395
More actions
September 23, 2011 at 11:56 am

#245398

I need to determine and then delete duplicate records in one table (Order) in which there are about 50 columns.
Only OrderBy, OrderDate and OrderCity are the same will be deleted (but only keep one record).
How to code it?
For example, the following records 3), 6) need to be deleted.
OrderBy OrderDate OrderCity
1) A 02/22/2011 NEW YORK
2) A 02/22/2011 LONDON
3) A 02/22/2011 NEW YORK
4) B 02/22/2011 NEW YORK
5) A 05/05/2011 NEW YORK
6) A 02/22/2011 NEW YORK

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Steve Jones - SSC Editor SSC Guru Points: 736210 More actions · Answer 1

How do you determine which one to keep? note that SQL doesn't order records, so the order you see from a SELECT ORDRBY FROM MYTABLE isn't necessarily what you'll always get.

Sean Lange SSC Guru Points: 286590 More actions · Answer 2

And more importantly do you have a primary key on this table? This is more challenging if you don't.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

SQLRNNR SSC Guru Points: 281334 More actions · Answer 3

I dare say that in this case you are probably missing part of the reason for apparent dupes. Do you have an OrderDetail table? Or does the Order table contain all of the pertinent order item detail?

Each of your "duplicate" entries in this table may not really be duplicates because you have not identified that each record may refer to a different piece of that order. Also, does this table account for refunds or postponed fulfillment?

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

adonetok SSCrazy Eights Points: 8395 More actions · Answer 4

Yes, there is a primary key on ID column.

I want to keep min(ID)

Sean Lange SSC Guru Points: 286590 More actions · Answer 5

This will do what you are asking for:

create table #List

(

ListID int identity not null,

OrderBy char(1),

OrderDate datetime,

OrderCity varchar(20)

)

insert #List (OrderBy, OrderDate, OrderCity)

values

('A', '02/22/2011', 'NEW YORK'),

('A', '02/22/2011', 'LONDON'),

('A', '02/22/2011', 'NEW YORK'),

('B', '02/22/2011', 'NEW YORK'),

('A', '05/05/2011', 'NEW YORK'),

('A', '02/22/2011', 'NEW YORK')

delete #List where ListID not in

(

select Min(ListID) as ListIDToKeep

from #List

group by OrderBy, OrderDate, OrderCity

)

select * from #List

drop table #List

Note that I created consumable ddl and data so others can quickly and easily look at this.

I agree that this is probably not really want you to do but I will leave that decision up to you. It is pretty unusual to have true duplicates in an order table. As suggested there is probably an OrderDetail table or something that holds the line items for the order. If so, this will either fail if there is an actual foreign key on the detail table OR it will create orphaned detail records.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

MyDoggieJessie SSC-Forever Points: 44486 More actions · Answer 6

Can you do something simple like this or am I missing the boat on this one?

CREATE TABLE #temp (OrderBy varchar(1), OrderDate datetime, OrderCity varchar(15))

GO

INSERT INTO #temp (OrderBy , OrderDate , OrderCity) VALUES ('A','02/22/2011','NEW YORK')

INSERT INTO #temp (OrderBy , OrderDate , OrderCity) VALUES ('A','02/22/2011','LONDON')

INSERT INTO #temp (OrderBy , OrderDate , OrderCity) VALUES ('A','02/22/2011','NEW YORK')

INSERT INTO #temp (OrderBy , OrderDate , OrderCity) VALUES ('B','02/22/2011','NEW YORK')

INSERT INTO #temp (OrderBy , OrderDate , OrderCity) VALUES ('A','05/05/2011','NEW YORK')

INSERT INTO #temp (OrderBy , OrderDate , OrderCity) VALUES ('A','02/22/2011','NEW YORK')

DELETE T FROM

(SELECT ROW_NUMBER() OVER(PARTITION BY OrderBy, OrderDate, OrderCity

ORDER BY OrderBy, OrderDate, OrderCity) As RowNumber,* FROM #Temp )

T

WHERE RowNumber > 1

SELECT * FROM #temp

DROP TABLE #temp

Output:

A2011-02-22 00:00:00.000LONDON

A2011-02-22 00:00:00.000NEW YORK

B2011-02-22 00:00:00.000NEW YORK

A2011-05-05 00:00:00.000NEW YORK

______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

MyDoggieJessie SSC-Forever Points: 44486 More actions · Answer 7

Sorry Sean, you had posted your reply while I was typing in mine, didn't mean to duplicate the effort! 😉

______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

Sean Lange SSC Guru Points: 286590 More actions · Answer 8

MyDoggieJessie (9/23/2011)
Sorry Sean, you had posted your reply while I was typing in mine, didn't mean to duplicate the effort! 😉

No problem. Your code accomplished the exact same thing but does it a completely different way, once again proving that there are as many ways to do something as there people that want it done. 😉

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

SQLRNNR SSC Guru Points: 281334 More actions · Answer 9

adonetok (9/23/2011)
Yes, there is a primary key on ID column.
I want to keep min(ID)

I think the best thing you could do is provide DDL for the tables involved. Otherwise, solutions provided could be far from what you truly need.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

adonetok SSCrazy Eights Points: 8395 More actions · Answer 10

adonetok

SSCrazy Eights

Points: 8395

September 23, 2011 at 12:59 pm

#1386000

Thank you for help.

I think that both ways will work.

MadAdmin SSChampion Points: 11261 More actions · Answer 11

Firstly, here is a script which should list all tables and records which are duplicated, PK is removed from this comparison.

So the script will dynamically group by all the columns excluding the pk.

Be warned, aggregating a 2 million row table with 20 columns takes loooong.

DECLARE @sql varchar(max), @table varchar(100)

DECLARE table_cur CURSOR

FOR SELECT table_name

FROM INFORMATION_SCHEMA.TABLES t

LEFT OUTER JOIN sys.columns c ON c.object_id = OBJECT_ID(t.table_name)

AND c.is_identity = 1 -- GET ALL IDENTITY COLUMNS

LEFT OUTER JOIN sys.indexes i ON i.object_id = OBJECT_ID(t.table_name)

AND i.is_unique = 1 -- GET ALL UNIQUE INDEXES

LEFT OUTER JOIN sysobjects O ON o.id = OBJECT_ID(t.table_name)

AND o.xtype = 'PK'

WHERE c.object_id IS NULL -- ie an identity column doesn't exist

AND i.object_id IS NULL -- ie a unique index doesn't exist

AND o.id IS null

AND t.table_type = 'BASE TABLE'

AND t.TABLE_name NOT LIKE '%temp%' -- These would be uninteresting.

AND t.TABLE_NAME NOT LIKE '%tmp%'

ORDER BY table_name DESC

OPEN table_cur

FETCH NEXT FROM table_cur INTO @table

WHILE @@FETCH_STATUS = 0

BEGIN

SET @sql = 'COUNT(*) '+@table

SELECT @sql = RTRIM(@sql) + ', [' + c.column_name + ']'

FROM INFORMATION_SCHEMA.COLUMNS c

WHERE table_name = +RTRIM(@table)

SET @sql = 'SELECT ' + RTRIM(@sql) + ' FROM [' + RTRIM(@table) + '] '

+'GROUP BY ' + REPLACE(@sql, 'COUNT(*) '+@table+',', '') +' HAVING COUNT(*) > 1'

PRINT @sql

EXEC ( @sql

)

FETCH NEXT FROM table_cur INTO @table

END

CLOSE table_cur

DEALLOCATE table_cur

You can probably then do something like

select into HoldingTable min(pk),col1,col2,col3,col4 from MyTable group by col1,col2,col3,col4

alter MyTable name to myTableBackup.

drop mytable

select into mytable * from HoldingTable

--------------

You might need to drop the pk before dropping the table. Here is a script which will give you a list of all the fk's originating from that pk, which need to be dropped first before dropping the pk, then recreate the table and then re enable all the constraints.

I suggest you run this and save the entire script somewhere.

I ever so strongly suggest you do this on a QA environment first and prove it to a fellow dev before playing with production data.

------------

DECLARE @PKTableName VARCHAR(100),

@PKName varchar(100),

@FKName varchar(100),

@sql varchar(max),

@PKcolumnName varchar(30),

@table VARCHAR(30),

@FKColumnName VARCHAR(100)

SET @PKTableName = 'MyTable'

set @PKName = ( SELECT name

FROM sys.indexes

WHERE OBJECT_NAME(object_id) = @PKTableName

AND is_primary_key = 1

)

set @PKcolumnName = ( SELECT name

FROM sys.columns

WHERE OBJECT_NAME(object_id) = @PKTableName

AND is_identity = 1

)

IF EXISTS ( SELECT *

FROM sys.indexes

WHERE object_ID(@PKTableName) = object_id

AND index_id = 0 ) -- IF a heap

begin

IF EXISTS ( SELECT *

FROM sys.tables

WHERE object_id = OBJECT_ID(N'[dbo].[FKAgainstTableList]') )

BEGIN

DROP TABLE FKAgainstTableList

END

SELECT OBJECT_NAME(sys.foreign_key_columns.parent_object_id) [Table],

sys.columns.name [FKColumnName],

sys.foreign_keys.name [FKName]

INTO FKAgainstTableList

FROM sys.foreign_keys

INNER JOIN sys.foreign_key_columns ON sys.foreign_keys.object_id = sys.foreign_key_columns.constraint_object_id

INNER JOIN sys.columns ON sys.columns.object_id = sys.foreign_keys.parent_object_id

AND sys.columns.column_id = sys.foreign_key_columns.parent_column_id

WHERE OBJECT_NAME(sys.foreign_keys.referenced_object_id) = @PKTableName

DECLARE table_cur1 CURSOR

FOR SELECT *

FROM FKAgainstTableList

-------------------------------Disable constraint on FK Tables

OPEN table_cur1

FETCH NEXT FROM table_cur1 INTO @table, @FKColumnName, @FKName

WHILE@@FETCH_STATUS = 0

BEGIN

SET @sql = 'ALTER TABLE ' + @table + ' DROP CONSTRAINT '

+ @FKName

PRINT @sql

FETCH NEXT FROM table_cur1 INTO @table, @FKColumnName, @FKName

END

CLOSE table_cur1

DEALLOCATE table_cur1

--------------------------------DROP AND recreate CLUSTERED pk

IF EXISTS ( SELECT 1

FROM sys.indexes

WHERE object_id = OBJECT_ID(@PKTableName)

AND name = @PKName )

BEGIN

SET @sql = 'ALTER TABLE ' + @PKTableName + ' DROP CONSTRAINT '

+ @PKName

PRINT @sql

END

SET @sql = 'ALTER TABLE ' + @PKTableName + ' ADD CONSTRAINT '

+ @PKName + ' PRIMARY KEY CLUSTERED (' + @PKcolumnName

+ ' ASC)

WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]'

PRINT ( @sql )

--------------------------------Enable FK constraints on FK tables.

DECLARE table_cur2 CURSOR

FOR SELECT *

FROM FKAgainstTableList

OPEN table_cur2

FETCH NEXT FROM table_cur2 INTO @table, @FKColumnName, @FKName

WHILE@@FETCH_STATUS = 0

BEGIN

SET @sql = 'ALTER TABLE ' + @table

+ ' WITH NOCHECK ADD CONSTRAINT ' + @FKName

+ ' FOREIGN KEY([' + @FKColumnName + '])

REFERENCES [' + @PKTableName + '] ([' + @PKcolumnName + '])'

PRINT ( @sql )

SET @sql = 'ALTER TABLE ' + @table + ' CHECK CONSTRAINT '

+ @FKName

PRINT ( @sql )

FETCH NEXT FROM table_cur2 INTO @table, @FKColumnName, @FKName

END

CLOSE table_cur2

DEALLOCATE table_cur2

DROP TABLE FKAgainstTableList

end

Catch-all queries done right [/url]
Gail Shaw's Performance Blog[/url]