Deleting duplicate records without adding identity col

  • Hi,

    I have a table from which I need to delete duplicate records. I dont have identity col and i cannot add identity col. For that i've used Row_number() as below

  • I think you're going to need some sort of uniquely identifying field in your base table so that you can isolate rows for deletion. It doesn't need to stay there afterwards, but without a way to uniquely identify rows, how are you going to differentiate between which row to keep and which row to delete?

    You could do this in a lot of different ways depending on your constraints. Maybe you could explain why you can't add an identity column? Can you add an any column?

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • I'd agree with Seth. You don't need an identity, but you need a unique way to identify rows, otherwise you're likely have have similar issues in the future.

  • sql server developer (8/12/2009)


    Hi,

    I have a table from which I need to delete duplicate records. I dont have identity col and i cannot add identity col. For that i've used Row_number() as below

    WITH [CTE_EMDegree] AS

    (

    SELECT ROW_NUMBER() OVER (ORDER BY t1.Employee,t1.Degree,t1.specialty,t1.institution) AS 'RowNumber',t1.*

    FROM

    EMDegree_temp1 t1 JOIN

    (SELECT employee,Degree,specialty,institution,count(*) as con

    FROM EMDegree_temp1

    group by employee,Degree,specialty,institution having count(*)>1) t2

    ON t1.employee=t2.employee and

    ISNULL(t1.Degree,'')=IsNULL(t2.Degree,'') and

    IsNULL(t1.specialty,'') = IsNull(t2.specialty,'') and

    IsNULL(t1.institution,'') = IsNull(t2.institution,'')

    )

    --DELETE

    FROM [CTE_EMDegree]

    WHERE RowNumber NOT IN

    (SELECT min(RowNumber) FROM CTE_EMDegree group by Employee,Degree,Specialty,Institution)

    When i execute the above, i'm receiving the following error:

    Msg 4405, Level 16, State 1, Line 1

    View or function 'CTE_EMDegree' is not updatable because the modification affects multiple base tables.

    Can someone where i need to change?

    Why are you joining back to the same table with a group by? That is the problem with your query and why you are getting an error.

    Please read the article I link to in my signature - and then post the create table statements, insert statements for sample data and expected results.

    My guess is you can probably get rid of the derived table completed for this, but without the above I really can't help you more than this.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Hi... I may well have misunderstood what you are attempting..it wouldn't be the first time ....;-)

    Not knowing how many more colums you have in your table or how many rows you need to sort, might make this unsuitable for you

    ...here is my effort:

    USE [tempdb]

    GO

    ---create table

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[#EmDegree]') AND type in (N'U'))

    DROP TABLE #EmDegree

    CREATE TABLE [#EmDegree](

    [Employee] [int] NULL,

    [Degree] [char](50) NULL,

    [Speciality] [char](50) NULL,

    [Institution] [char](50) NULL

    )

    ---insert some duplicated data

    INSERT [dbo].[#EmDegree]([Employee],[Degree],[Speciality],[Institution]) VALUES (1,'MA ','Medicine ','Oxford ')

    INSERT [dbo].[#EmDegree]([Employee],[Degree],[Speciality],[Institution]) VALUES (1,'MA ','Medicine ','Oxford ')

    INSERT [dbo].[#EmDegree]([Employee],[Degree],[Speciality],[Institution]) VALUES (2,'Bsc ','Maths ','Cambridge ')

    INSERT [dbo].[#EmDegree]([Employee],[Degree],[Speciality],[Institution]) VALUES (2,'Bsc ','Maths ','Cambridge ')

    INSERT [dbo].[#EmDegree]([Employee],[Degree],[Speciality],[Institution]) VALUES (3,'Phd ','Physics ','Oxford ')

    --- display original records

    SELECT Employee, Degree, Speciality, Institution

    FROM #EmDegree

    ---create a table to receive distinct records

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[#EmDegree_Distinct]') AND type in (N'U'))

    DROP TABLE #EmDegree_Distinct

    CREATE TABLE [#EmDegree_Distinct](

    [Employee] [int] NOT NULL,

    [Degree] [varchar](10) NOT NULL,

    [Speciality] [varchar](10) NOT NULL,

    [Institution] [varchar](10) NOT NULL,

    )

    GO

    --- select just unique records from #EmDegree and put in #EmDegree_Distinct

    --- this may not be practical if you have more columns than you have described and lots of rows.

    INSERT INTO #EmDegree_Distinct

    (Employee, Degree, Speciality, Institution)

    SELECT DISTINCT Employee, Degree, Speciality, Institution

    FROM #EmDegree

    ---delete exg records

    TRUNCATE TABLE #EmDegree

    --- replace with new distinct records

    INSERT INTO #EmDegree

    (Employee, Degree, Speciality, Institution)

    SELECT Employee, Degree, Speciality, Institution

    FROM #EmDegree_Distinct

    --display non dupe records

    SELECT Employee, Degree, Speciality, Institution

    FROM #EmDegree

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • It would also help if you could provide the DDL for the tables involved, sample data that shows the problem you are trying to solve, expected results to be achieved.

    Please read and the follow the instructions in the first article referenced in my signaturature block and you will find that you will get much better help and even get tested code in return.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply