Deleting duplicate records without adding identity col

Question

Deleting duplicate records without adding identity col

sql server developer

SSCarpal Tunnel

Points: 4746
More actions
August 12, 2009 at 11:01 am

#103800

Hi,
I have a table from which I need to delete duplicate records. I dont have identity col and i cannot add identity col. For that i've used Row_number() as below

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Garadin One Orange Chip Points: 29613 More actions · Answer 1

I think you're going to need some sort of uniquely identifying field in your base table so that you can isolate rows for deletion. It doesn't need to stay there afterwards, but without a way to uniquely identify rows, how are you going to differentiate between which row to keep and which row to delete?

You could do this in a lot of different ways depending on your constraints. Maybe you could explain why you can't add an identity column? Can you add an any column?

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Steve Jones - SSC Editor SSC Guru Points: 736478 More actions · Answer 2

I'd agree with Seth. You don't need an identity, but you need a unique way to identify rows, otherwise you're likely have have similar issues in the future.

Jeffrey Williams SSC Guru Points: 90304 More actions · Answer 3

sql server developer (8/12/2009)
Hi,
I have a table from which I need to delete duplicate records. I dont have identity col and i cannot add identity col. For that i've used Row_number() as below
WITH [CTE_EMDegree] AS
(
SELECT ROW_NUMBER() OVER (ORDER BY t1.Employee,t1.Degree,t1.specialty,t1.institution) AS 'RowNumber',t1.*
FROM
EMDegree_temp1 t1 JOIN
(SELECT employee,Degree,specialty,institution,count(*) as con
FROM EMDegree_temp1
group by employee,Degree,specialty,institution having count(*)>1) t2
ON t1.employee=t2.employee and
ISNULL(t1.Degree,'')=IsNULL(t2.Degree,'') and
IsNULL(t1.specialty,'') = IsNull(t2.specialty,'') and
IsNULL(t1.institution,'') = IsNull(t2.institution,'')
)
--DELETE
FROM [CTE_EMDegree]
WHERE RowNumber NOT IN
(SELECT min(RowNumber) FROM CTE_EMDegree group by Employee,Degree,Specialty,Institution)
When i execute the above, i'm receiving the following error:
Msg 4405, Level 16, State 1, Line 1
View or function 'CTE_EMDegree' is not updatable because the modification affects multiple base tables.
Can someone where i need to change?

Why are you joining back to the same table with a group by? That is the problem with your query and why you are getting an error.

Please read the article I link to in my signature - and then post the create table statements, insert statements for sample data and expected results.

My guess is you can probably get rid of the derived table completed for this, but without the above I really can't help you more than this.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 4

Hi... I may well have misunderstood what you are attempting..it wouldn't be the first time ....;-)

Not knowing how many more colums you have in your table or how many rows you need to sort, might make this unsuitable for you

...here is my effort:

USE [tempdb]

GO

---create table

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[#EmDegree]') AND type in (N'U'))

DROP TABLE #EmDegree

CREATE TABLE [#EmDegree](

[Employee] [int] NULL,

[Degree] [char](50) NULL,

[Speciality] [char](50) NULL,

[Institution] [char](50) NULL

)

---insert some duplicated data

INSERT [dbo].[#EmDegree]([Employee],[Degree],[Speciality],[Institution]) VALUES (1,'MA ','Medicine ','Oxford ')

INSERT [dbo].[#EmDegree]([Employee],[Degree],[Speciality],[Institution]) VALUES (2,'Bsc ','Maths ','Cambridge ')

INSERT [dbo].[#EmDegree]([Employee],[Degree],[Speciality],[Institution]) VALUES (3,'Phd ','Physics ','Oxford ')

--- display original records

SELECT Employee, Degree, Speciality, Institution

FROM #EmDegree

---create a table to receive distinct records

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[#EmDegree_Distinct]') AND type in (N'U'))

DROP TABLE #EmDegree_Distinct

CREATE TABLE [#EmDegree_Distinct](

[Employee] [int] NOT NULL,

[Degree] [varchar](10) NOT NULL,

[Speciality] [varchar](10) NOT NULL,

[Institution] [varchar](10) NOT NULL,

)

GO

--- select just unique records from #EmDegree and put in #EmDegree_Distinct

--- this may not be practical if you have more columns than you have described and lots of rows.

INSERT INTO #EmDegree_Distinct

(Employee, Degree, Speciality, Institution)

SELECT DISTINCT Employee, Degree, Speciality, Institution

FROM #EmDegree

---delete exg records

TRUNCATE TABLE #EmDegree

--- replace with new distinct records

INSERT INTO #EmDegree

(Employee, Degree, Speciality, Institution)

SELECT Employee, Degree, Speciality, Institution

FROM #EmDegree_Distinct

--display non dupe records

SELECT Employee, Degree, Speciality, Institution

FROM #EmDegree

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 5

It would also help if you could provide the DDL for the tables involved, sample data that shows the problem you are trying to solve, expected results to be achieved.

Please read and the follow the instructions in the first article referenced in my signaturature block and you will find that you will get much better help and even get tested code in return.