September 23, 2011 at 11:56 am
I need to determine and then delete duplicate records in one table (Order) in which there are about 50 columns.
Only OrderBy, OrderDate and OrderCity are the same will be deleted (but only keep one record).
How to code it?
For example, the following records 3), 6) need to be deleted.
OrderBy OrderDate OrderCity
1) A 02/22/2011 NEW YORK
2) A 02/22/2011 LONDON
3) A 02/22/2011 NEW YORK
4) B 02/22/2011 NEW YORK
5) A 05/05/2011 NEW YORK
6) A 02/22/2011 NEW YORK
September 23, 2011 at 12:26 pm
How do you determine which one to keep? note that SQL doesn't order records, so the order you see from a SELECT ORDRBY FROM MYTABLE isn't necessarily what you'll always get.
September 23, 2011 at 12:31 pm
And more importantly do you have a primary key on this table? This is more challenging if you don't.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 23, 2011 at 12:34 pm
I dare say that in this case you are probably missing part of the reason for apparent dupes. Do you have an OrderDetail table? Or does the Order table contain all of the pertinent order item detail?
Each of your "duplicate" entries in this table may not really be duplicates because you have not identified that each record may refer to a different piece of that order. Also, does this table account for refunds or postponed fulfillment?
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
September 23, 2011 at 12:36 pm
Yes, there is a primary key on ID column.
I want to keep min(ID)
September 23, 2011 at 12:44 pm
This will do what you are asking for:
create table #List
(
ListID int identity not null,
OrderBy char(1),
OrderDate datetime,
OrderCity varchar(20)
)
insert #List (OrderBy, OrderDate, OrderCity)
values
('A', '02/22/2011', 'NEW YORK'),
('A', '02/22/2011', 'LONDON'),
('A', '02/22/2011', 'NEW YORK'),
('B', '02/22/2011', 'NEW YORK'),
('A', '05/05/2011', 'NEW YORK'),
('A', '02/22/2011', 'NEW YORK')
delete #List where ListID not in
(
select Min(ListID) as ListIDToKeep
from #List
group by OrderBy, OrderDate, OrderCity
)
select * from #List
drop table #List
Note that I created consumable ddl and data so others can quickly and easily look at this.
I agree that this is probably not really want you to do but I will leave that decision up to you. It is pretty unusual to have true duplicates in an order table. As suggested there is probably an OrderDetail table or something that holds the line items for the order. If so, this will either fail if there is an actual foreign key on the detail table OR it will create orphaned detail records.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 23, 2011 at 12:45 pm
Can you do something simple like this or am I missing the boat on this one?
CREATE TABLE #temp (OrderBy varchar(1), OrderDate datetime, OrderCity varchar(15))
GO
INSERT INTO #temp (OrderBy , OrderDate , OrderCity) VALUES ('A','02/22/2011','NEW YORK')
INSERT INTO #temp (OrderBy , OrderDate , OrderCity) VALUES ('A','02/22/2011','LONDON')
INSERT INTO #temp (OrderBy , OrderDate , OrderCity) VALUES ('A','02/22/2011','NEW YORK')
INSERT INTO #temp (OrderBy , OrderDate , OrderCity) VALUES ('B','02/22/2011','NEW YORK')
INSERT INTO #temp (OrderBy , OrderDate , OrderCity) VALUES ('A','05/05/2011','NEW YORK')
INSERT INTO #temp (OrderBy , OrderDate , OrderCity) VALUES ('A','02/22/2011','NEW YORK')
DELETE T FROM
(SELECT ROW_NUMBER() OVER(PARTITION BY OrderBy, OrderDate, OrderCity
ORDER BY OrderBy, OrderDate, OrderCity) As RowNumber,* FROM #Temp )
T
WHERE RowNumber > 1
SELECT * FROM #temp
DROP TABLE #temp
Output:
A2011-02-22 00:00:00.000LONDON
A2011-02-22 00:00:00.000NEW YORK
B2011-02-22 00:00:00.000NEW YORK
A2011-05-05 00:00:00.000NEW YORK
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
September 23, 2011 at 12:47 pm
Sorry Sean, you had posted your reply while I was typing in mine, didn't mean to duplicate the effort! 😉
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
September 23, 2011 at 12:48 pm
MyDoggieJessie (9/23/2011)
Sorry Sean, you had posted your reply while I was typing in mine, didn't mean to duplicate the effort! 😉
No problem. Your code accomplished the exact same thing but does it a completely different way, once again proving that there are as many ways to do something as there people that want it done. 😉
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 23, 2011 at 12:49 pm
adonetok (9/23/2011)
Yes, there is a primary key on ID column.I want to keep min(ID)
I think the best thing you could do is provide DDL for the tables involved. Otherwise, solutions provided could be far from what you truly need.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
September 23, 2011 at 12:59 pm
Thank you for help.
I think that both ways will work.
September 25, 2011 at 10:54 pm
Firstly, here is a script which should list all tables and records which are duplicated, PK is removed from this comparison.
So the script will dynamically group by all the columns excluding the pk.
Be warned, aggregating a 2 million row table with 20 columns takes loooong.
DECLARE @sql varchar(max), @table varchar(100)
DECLARE table_cur CURSOR
FOR SELECT table_name
FROM INFORMATION_SCHEMA.TABLES t
LEFT OUTER JOIN sys.columns c ON c.object_id = OBJECT_ID(t.table_name)
AND c.is_identity = 1 -- GET ALL IDENTITY COLUMNS
LEFT OUTER JOIN sys.indexes i ON i.object_id = OBJECT_ID(t.table_name)
AND i.is_unique = 1 -- GET ALL UNIQUE INDEXES
LEFT OUTER JOIN sysobjects O ON o.id = OBJECT_ID(t.table_name)
AND o.xtype = 'PK'
WHERE c.object_id IS NULL -- ie an identity column doesn't exist
AND i.object_id IS NULL -- ie a unique index doesn't exist
AND o.id IS null
AND t.table_type = 'BASE TABLE'
AND t.TABLE_name NOT LIKE '%temp%' -- These would be uninteresting.
AND t.TABLE_NAME NOT LIKE '%tmp%'
ORDER BY table_name DESC
OPEN table_cur
FETCH NEXT FROM table_cur INTO @table
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql = 'COUNT(*) '+@table
SELECT @sql = RTRIM(@sql) + ', [' + c.column_name + ']'
FROM INFORMATION_SCHEMA.COLUMNS c
WHERE table_name = +RTRIM(@table)
SET @sql = 'SELECT ' + RTRIM(@sql) + ' FROM [' + RTRIM(@table) + '] '
+'GROUP BY ' + REPLACE(@sql, 'COUNT(*) '+@table+',', '') +' HAVING COUNT(*) > 1'
PRINT @sql
EXEC ( @sql
)
FETCH NEXT FROM table_cur INTO @table
END
CLOSE table_cur
DEALLOCATE table_cur
You can probably then do something like
select into HoldingTable min(pk),col1,col2,col3,col4 from MyTable group by col1,col2,col3,col4
alter MyTable name to myTableBackup.
drop mytable
select into mytable * from HoldingTable
--------------
You might need to drop the pk before dropping the table. Here is a script which will give you a list of all the fk's originating from that pk, which need to be dropped first before dropping the pk, then recreate the table and then re enable all the constraints.
I suggest you run this and save the entire script somewhere.
I ever so strongly suggest you do this on a QA environment first and prove it to a fellow dev before playing with production data.
------------
DECLARE @PKTableName VARCHAR(100),
@PKName varchar(100),
@FKName varchar(100),
@sql varchar(max),
@PKcolumnName varchar(30),
@table VARCHAR(30),
@FKColumnName VARCHAR(100)
SET @PKTableName = 'MyTable'
set @PKName = ( SELECT name
FROM sys.indexes
WHERE OBJECT_NAME(object_id) = @PKTableName
AND is_primary_key = 1
)
set @PKcolumnName = ( SELECT name
FROM sys.columns
WHERE OBJECT_NAME(object_id) = @PKTableName
AND is_identity = 1
)
IF EXISTS ( SELECT *
FROM sys.indexes
WHERE object_ID(@PKTableName) = object_id
AND index_id = 0 ) -- IF a heap
begin
IF EXISTS ( SELECT *
FROM sys.tables
WHERE object_id = OBJECT_ID(N'[dbo].[FKAgainstTableList]') )
BEGIN
DROP TABLE FKAgainstTableList
END
SELECT OBJECT_NAME(sys.foreign_key_columns.parent_object_id) [Table],
sys.columns.name [FKColumnName],
sys.foreign_keys.name [FKName]
INTO FKAgainstTableList
FROM sys.foreign_keys
INNER JOIN sys.foreign_key_columns ON sys.foreign_keys.object_id = sys.foreign_key_columns.constraint_object_id
INNER JOIN sys.columns ON sys.columns.object_id = sys.foreign_keys.parent_object_id
AND sys.columns.column_id = sys.foreign_key_columns.parent_column_id
WHERE OBJECT_NAME(sys.foreign_keys.referenced_object_id) = @PKTableName
DECLARE table_cur1 CURSOR
FOR SELECT *
FROM FKAgainstTableList
-------------------------------Disable constraint on FK Tables
OPEN table_cur1
FETCH NEXT FROM table_cur1 INTO @table, @FKColumnName, @FKName
WHILE@@FETCH_STATUS = 0
BEGIN
SET @sql = 'ALTER TABLE ' + @table + ' DROP CONSTRAINT '
+ @FKName
PRINT @sql
FETCH NEXT FROM table_cur1 INTO @table, @FKColumnName, @FKName
END
CLOSE table_cur1
DEALLOCATE table_cur1
--------------------------------DROP AND recreate CLUSTERED pk
IF EXISTS ( SELECT 1
FROM sys.indexes
WHERE object_id = OBJECT_ID(@PKTableName)
AND name = @PKName )
BEGIN
SET @sql = 'ALTER TABLE ' + @PKTableName + ' DROP CONSTRAINT '
+ @PKName
PRINT @sql
END
SET @sql = 'ALTER TABLE ' + @PKTableName + ' ADD CONSTRAINT '
+ @PKName + ' PRIMARY KEY CLUSTERED (' + @PKcolumnName
+ ' ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]'
PRINT ( @sql )
--------------------------------Enable FK constraints on FK tables.
DECLARE table_cur2 CURSOR
FOR SELECT *
FROM FKAgainstTableList
OPEN table_cur2
FETCH NEXT FROM table_cur2 INTO @table, @FKColumnName, @FKName
WHILE@@FETCH_STATUS = 0
BEGIN
SET @sql = 'ALTER TABLE ' + @table
+ ' WITH NOCHECK ADD CONSTRAINT ' + @FKName
+ ' FOREIGN KEY([' + @FKColumnName + '])
REFERENCES [' + @PKTableName + '] ([' + @PKcolumnName + '])'
PRINT ( @sql )
SET @sql = 'ALTER TABLE ' + @table + ' CHECK CONSTRAINT '
+ @FKName
PRINT ( @sql )
FETCH NEXT FROM table_cur2 INTO @table, @FKColumnName, @FKName
END
CLOSE table_cur2
DEALLOCATE table_cur2
DROP TABLE FKAgainstTableList
end
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply