April 7, 2010 at 2:32 pm
Hi all,
I am having hard time forming this query to dedulicate - duplicate rows in a table:
select 1_KEY,2_KEY
from MY_TABLE
group by 1_KEY,2_KEY
having count(*) >1
now this query gives me a bunch of repeated rows(repeated 3 times) .
I want to delete them but I want to keep one single row for those repeated rows....I dont want to delete all the 3 rows, just the 2 repeating rows should be removed.
Please let me know if this can be done...
thanks in advance
Thanks [/font]
April 7, 2010 at 2:50 pm
You can either delete them all and reinsert a single copy or create a join and use delete top (n). The top (n) strategy will only work when n is a constant, unless you implement dynamic SQL, I believe. You need to take into consideration any constraints that may be in place.
Why is your query using generic names for tables and fields? This isn't a homework assignment, is it?
April 7, 2010 at 5:05 pm
You might want to look into ROW_NUMBER together with CTEs. There are several solutions available on this site. Please search for "duplicate rows" in the scripts section.
April 7, 2010 at 11:00 pm
Here is one version of the code to delete duplicate values
-- Create the table
if object_id('Duplicates') is not null
drop table Duplicates
Create table Duplicates
(
c1 int
)
-- insert some sample records
declare @i int set @i = 1
while (@i <= 5 )
begin
insert into Duplicates
select 1
union all
select 2
union all
select 3
set @i = @i + 1
end
select * From Duplicates order by c1
-- delete script
;WITH CTE (Rank_of_Duplicates , c1)
as
(
select row_number() over(partition by c1 order by c1 desc) as Rank_of_Duplicates, c1 from Duplicates
)
DELETE FROM CTE
WHERE
Rank_of_Duplicates not in
(
Select max(Rank_of_Duplicates) from cte
group by c1
)
select * From Duplicates order by c1
April 8, 2010 at 6:48 am
Kajal123 (4/7/2010)
Hi all,I am having hard time forming this query to dedulicate - duplicate rows in a table:
select 1_KEY,2_KEY
from MY_TABLE
group by 1_KEY,2_KEY
having count(*) >1
now this query gives me a bunch of repeated rows(repeated 3 times) .
I want to delete them but I want to keep one single row for those repeated rows....I dont want to delete all the 3 rows, just the 2 repeating rows should be removed.
Please let me know if this can be done...
thanks in advance
Lutz has it right... ROW_NUMBER (and a couple of other methods if there's something temporal in the table) is one of the easiest ways to do this.
If you have a large table with more dupes than not, the suggestion of copying distinct info to another table and doing a rename may be more effective.
If the table with the duplicates is NOT simply a staging table, I'd suggest that a bit of a redesign is in order and should definitely include a Primary Key to prevent duplictes in the future.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 8, 2010 at 6:51 am
Sorry... I didn't scroll down to see COldCoffee's solution which is a coded example of what Lutz was talking about.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 8, 2010 at 7:45 am
Jeff Moden (4/8/2010)
Sorry... I didn't scroll down to see COldCoffee's solution which is a coded example of what Lutz was talking about.
No issues Jeff... i am infact feeling great to have coded what Lutz was saying and honored to get a compliment from you for that 🙂 Thanks 😎
April 8, 2010 at 3:44 pm
COldCoffee (4/8/2010)
Jeff Moden (4/8/2010)
Sorry... I didn't scroll down to see COldCoffee's solution which is a coded example of what Lutz was talking about.No issues Jeff... i am infact feeling great to have coded what Lutz was saying and honored to get a compliment from you for that 🙂 Thanks 😎
Anyone who tries deserves a pat on the back. Glad to have you on board.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 8, 2010 at 7:52 pm
As a bit of a side bar, if you key off the ROW_NUMBER for the DELETE, you can eliminate one full execution of the CTE. Like this...
--=============================================================================
-- Create and populate a test table. This is NOT a part of the solution.
--=============================================================================
--===== Conditionally drop the test table to make reruns easier
IF OBJECT_ID('TempDB..#Duplicates') IS NOT NULL
DROP TABLE #Duplicates
;
--===== Create and populate the test table on the fly
SELECT ca.SomeValue
INTO #Duplicates
FROM Master.dbo.spt_values v --Used this because most folks still don't have a Tally table
CROSS JOIN
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) ca(SomeValue)
WHERE v.Type = 'P' --Isolates the numbers of 0 to 2047 in SQL Server 2005 (Mini Tally table)
AND v.Number BETWEEN 1 AND 5 --This controls the number of copies of "ca" to insert
;
--=============================================================================
-- Here's a DELETE solution that only causes the CTE to execute once.
--=============================================================================
WITH
cteDuplicates AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY SomeValue ORDER BY (SELECT 1)) AS NumberedDupes,
SomeValue
FROM #Duplicates
)
DELETE cteDuplicates
WHERE NumberedDupes > 1
;
If temporal data is available, you can easily eliminate all but the latest duplicated rows by using the ORDER BY of the ROW_NUMBER to control what get's the number "1" so it won't get deleted. This also saves on one full execution of the CTE. Here's an example where an IDENTITY column represents "temporal data"...
--=============================================================================
-- Create and populate a test table. This is NOT a part of the solution.
--=============================================================================
--===== Conditionally drop the test table to make reruns easier
IF OBJECT_ID('TempDB..#Duplicates') IS NOT NULL
DROP TABLE #Duplicates
;
--===== Create and populate the test table on the fly
SELECT IDENTITY(INT,1,1) AS SomeID,
ca.SomeValue
INTO #Duplicates
FROM Master.dbo.spt_values v --Used this because most folks still don't have a Tally table
CROSS JOIN
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) ca(SomeValue)
WHERE v.Type = 'P' --Isolates the numbers of 0 to 2047 in SQL Server 2005 (Mini Tally table)
AND v.Number BETWEEN 1 AND 5 --This controls the number of copies of "ca" to insert
;
--===== Display the content of the table before the deletes.
-- This is NOT a part of the solution.
SELECT * FROM #Duplicates ORDER BY SomeValue, SomeID
;
--=============================================================================
-- Here's a DELETE solution that only causes the CTE to execute once.
-- This deletes all dupes except for the "latest" entry for each set of
-- duplicates (according to the IDENTITY column, anyway).
--=============================================================================
WITH
cteDuplicates AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY SomeValue ORDER BY SomeID DESC) AS NumberedDupes,
SomeValue
FROM #Duplicates
)
DELETE cteDuplicates
WHERE NumberedDupes > 1
;
--===== Display the content of the table after the deletes.
-- This is NOT a part of the solution.
SELECT * FROM #Duplicates ORDER BY SomeValue, SomeID
;
Heh... since I'm mostly allergic to loops, I also built the test data using my friend the "Cross Join" just to show an alternate method. "Cross Apply" would have also done the job nicely.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 8, 2010 at 8:50 pm
Jeff Moden (4/8/2010)
COldCoffee (4/8/2010)
Jeff Moden (4/8/2010)
Sorry... I didn't scroll down to see COldCoffee's solution which is a coded example of what Lutz was talking about.No issues Jeff... i am infact feeling great to have coded what Lutz was saying and honored to get a compliment from you for that 🙂 Thanks 😎
Anyone who tries deserves a pat on the back. Glad to have you on board.
Thanks Sir :-)... i am at home so have not yet checked out the code you have given, once i reach office i will go thro that and feel it's superiority.. Thanks once again, jeff!:-)
April 9, 2010 at 5:31 am
Hi,
use
"set rowcount 2"
it will show top 2 row of matching criteria out of 3,if there is 3 rows for that criteria and delete those row using delete query like
set rowcount 2
delete from MY_TABLE where 1_KEY=1 and 2_KEY=2
April 10, 2010 at 10:24 pm
Trouble Shooter (4/9/2010)
Hi,use
"set rowcount 2"
it will show top 2 row of matching criteria out of 3,if there is 3 rows for that criteria and delete those row using delete query like
set rowcount 2
delete from MY_TABLE where 1_KEY=1 and 2_KEY=2
Um... probably not a good idea. If there are only 2 rows, they'll both be deleted. If there are 4 rows, two will remain.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 12, 2010 at 12:06 am
Hi,
2 is the variable ,u can set as your requirement.
u can set it to 3 or 1 also as your requirement.
April 12, 2010 at 5:12 am
Trouble Shooter (4/12/2010)
Hi,2 is the variable ,u can set as your requirement.
u can set it to 3 or 1 also as your requirement.
This is the same error you made in another thread tonight - where several people have shown you a better way. You can't remove a variable number of rows of duplicates, with set-based code, from different groups with SET ROWCOUNT.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
April 12, 2010 at 7:14 am
Trouble Shooter (4/12/2010)
Hi,2 is the variable ,u can set as your requirement.
u can set it to 3 or 1 also as your requirement.
I know... but for only one set of duplicates at a time which makes a RBAR solution because it won't work as a full table set based solution.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 21 total)
You must be logged in to reply to this topic. Login to reply