delete repeating rows

Question

delete repeating rows

sql_learner29

SSCrazy Eights

Points: 9107
More actions
April 7, 2010 at 2:32 pm

#231478

Hi all,
I am having hard time forming this query to dedulicate - duplicate rows in a table:
select 1_KEY,2_KEY
from MY_TABLE
group by 1_KEY,2_KEY
having count(*) >1
now this query gives me a bunch of repeated rows(repeated 3 times) .
I want to delete them but I want to keep one single row for those repeated rows....I dont want to delete all the 3 rows, just the 2 repeating rows should be removed.
Please let me know if this can be done...
thanks in advance
[font="Comic Sans MS"]
---------------------------------------------------
Thanks [/font]

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply

Tim Weintz SSCommitted Points: 1622 More actions · Answer 1

You can either delete them all and reinsert a single copy or create a join and use delete top (n). The top (n) strategy will only work when n is a constant, unless you implement dynamic SQL, I believe. You need to take into consideration any constraints that may be in place.

Why is your query using generic names for tables and fields? This isn't a homework assignment, is it?

LutzM SSC Guru Points: 107049 More actions · Answer 2

You might want to look into ROW_NUMBER together with CTEs. There are several solutions available on this site. Please search for "duplicate rows" in the scripts section.

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 3

Here is one version of the code to delete duplicate values

-- Create the table

if object_id('Duplicates') is not null

drop table Duplicates

Create table Duplicates

(

c1 int

)

-- insert some sample records

declare @i int set @i = 1

while (@i <= 5 )

begin

insert into Duplicates

select 1

union all

select 2

union all

select 3

set @i = @i + 1

end

select * From Duplicates order by c1

-- delete script

;WITH CTE (Rank_of_Duplicates , c1)

as

(

select row_number() over(partition by c1 order by c1 desc) as Rank_of_Duplicates, c1 from Duplicates

)

DELETE FROM CTE

WHERE

Rank_of_Duplicates not in

(

Select max(Rank_of_Duplicates) from cte

group by c1

)

select * From Duplicates order by c1

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 4

Kajal123 (4/7/2010)
Hi all,
I am having hard time forming this query to dedulicate - duplicate rows in a table:
select 1_KEY,2_KEY
from MY_TABLE
group by 1_KEY,2_KEY
having count(*) >1
now this query gives me a bunch of repeated rows(repeated 3 times) .
I want to delete them but I want to keep one single row for those repeated rows....I dont want to delete all the 3 rows, just the 2 repeating rows should be removed.
Please let me know if this can be done...
thanks in advance

Lutz has it right... ROW_NUMBER (and a couple of other methods if there's something temporal in the table) is one of the easiest ways to do this.

If you have a large table with more dupes than not, the suggestion of copying distinct info to another table and doing a rename may be more effective.

If the table with the duplicates is NOT simply a staging table, I'd suggest that a bit of a redesign is in order and should definitely include a Primary Key to prevent duplictes in the future.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 5

Sorry... I didn't scroll down to see COldCoffee's solution which is a coded example of what Lutz was talking about.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 6

Jeff Moden (4/8/2010)
Sorry... I didn't scroll down to see COldCoffee's solution which is a coded example of what Lutz was talking about.

No issues Jeff... i am infact feeling great to have coded what Lutz was saying and honored to get a compliment from you for that 🙂 Thanks 😎

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 7

COldCoffee (4/8/2010)
Jeff Moden (4/8/2010)
Sorry... I didn't scroll down to see COldCoffee's solution which is a coded example of what Lutz was talking about.
No issues Jeff... i am infact feeling great to have coded what Lutz was saying and honored to get a compliment from you for that 🙂 Thanks 😎

Anyone who tries deserves a pat on the back. Glad to have you on board.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 8

As a bit of a side bar, if you key off the ROW_NUMBER for the DELETE, you can eliminate one full execution of the CTE. Like this...

--=============================================================================

-- Create and populate a test table. This is NOT a part of the solution.

--=============================================================================

--===== Conditionally drop the test table to make reruns easier

IF OBJECT_ID('TempDB..#Duplicates') IS NOT NULL

DROP TABLE #Duplicates

;

--===== Create and populate the test table on the fly

SELECT ca.SomeValue

INTO #Duplicates

FROM Master.dbo.spt_values v --Used this because most folks still don't have a Tally table

CROSS JOIN

(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) ca(SomeValue)

WHERE v.Type = 'P' --Isolates the numbers of 0 to 2047 in SQL Server 2005 (Mini Tally table)

AND v.Number BETWEEN 1 AND 5 --This controls the number of copies of "ca" to insert

;

--=============================================================================

-- Here's a DELETE solution that only causes the CTE to execute once.

--=============================================================================

WITH

cteDuplicates AS

(

SELECT ROW_NUMBER() OVER(PARTITION BY SomeValue ORDER BY (SELECT 1)) AS NumberedDupes,

SomeValue

FROM #Duplicates

)

DELETE cteDuplicates

WHERE NumberedDupes > 1

;

If temporal data is available, you can easily eliminate all but the latest duplicated rows by using the ORDER BY of the ROW_NUMBER to control what get's the number "1" so it won't get deleted. This also saves on one full execution of the CTE. Here's an example where an IDENTITY column represents "temporal data"...

--=============================================================================

-- Create and populate a test table. This is NOT a part of the solution.

--=============================================================================

--===== Conditionally drop the test table to make reruns easier

IF OBJECT_ID('TempDB..#Duplicates') IS NOT NULL

DROP TABLE #Duplicates

;

--===== Create and populate the test table on the fly

SELECT IDENTITY(INT,1,1) AS SomeID,

ca.SomeValue

INTO #Duplicates

FROM Master.dbo.spt_values v --Used this because most folks still don't have a Tally table

CROSS JOIN

(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) ca(SomeValue)

WHERE v.Type = 'P' --Isolates the numbers of 0 to 2047 in SQL Server 2005 (Mini Tally table)

AND v.Number BETWEEN 1 AND 5 --This controls the number of copies of "ca" to insert

;

--===== Display the content of the table before the deletes.

-- This is NOT a part of the solution.

SELECT * FROM #Duplicates ORDER BY SomeValue, SomeID

;

--=============================================================================

-- Here's a DELETE solution that only causes the CTE to execute once.

-- This deletes all dupes except for the "latest" entry for each set of

-- duplicates (according to the IDENTITY column, anyway).

--=============================================================================

WITH

cteDuplicates AS

(

SELECT ROW_NUMBER() OVER(PARTITION BY SomeValue ORDER BY SomeID DESC) AS NumberedDupes,

SomeValue

FROM #Duplicates

)

DELETE cteDuplicates

WHERE NumberedDupes > 1

;

--===== Display the content of the table after the deletes.

-- This is NOT a part of the solution.

SELECT * FROM #Duplicates ORDER BY SomeValue, SomeID

;

Heh... since I'm mostly allergic to loops, I also built the test data using my friend the "Cross Join" just to show an alternate method. "Cross Apply" would have also done the job nicely.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 9

Jeff Moden (4/8/2010)
COldCoffee (4/8/2010)
Jeff Moden (4/8/2010)
Sorry... I didn't scroll down to see COldCoffee's solution which is a coded example of what Lutz was talking about.
No issues Jeff... i am infact feeling great to have coded what Lutz was saying and honored to get a compliment from you for that 🙂 Thanks 😎
Anyone who tries deserves a pat on the back. Glad to have you on board.

Thanks Sir :-)... i am at home so have not yet checked out the code you have given, once i reach office i will go thro that and feel it's superiority.. Thanks once again, jeff!:-)

Trouble Shooter SSC Eights! Points: 902 More actions · Answer 10

Hi,

use

"set rowcount 2"

it will show top 2 row of matching criteria out of 3,if there is 3 rows for that criteria and delete those row using delete query like

set rowcount 2

delete from MY_TABLE where 1_KEY=1 and 2_KEY=2

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 11

Trouble Shooter (4/9/2010)
Hi,
use
"set rowcount 2"
it will show top 2 row of matching criteria out of 3,if there is 3 rows for that criteria and delete those row using delete query like
set rowcount 2
delete from MY_TABLE where 1_KEY=1 and 2_KEY=2

Um... probably not a good idea. If there are only 2 rows, they'll both be deleted. If there are 4 rows, two will remain.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Trouble Shooter SSC Eights! Points: 902 More actions · Answer 12

Hi,

2 is the variable ,u can set as your requirement.

u can set it to 3 or 1 also as your requirement.

Paul White SSC Guru Points: 150467 More actions · Answer 13

Trouble Shooter (4/12/2010)
Hi,
2 is the variable ,u can set as your requirement.
u can set it to 3 or 1 also as your requirement.

This is the same error you made in another thread tonight - where several people have shown you a better way. You can't remove a variable number of rows of duplicates, with set-based code, from different groups with SET ROWCOUNT.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 14

Trouble Shooter (4/12/2010)
Hi,
2 is the variable ,u can set as your requirement.
u can set it to 3 or 1 also as your requirement.

I know... but for only one set of duplicates at a time which makes a RBAR solution because it won't work as a full table set based solution.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)