Duplicate identification

  • Is there any thing like rowid in Oracle?

    How to delete duplicate records from SQL sever without using template table or create an extra column.

  • indraprakash (6/18/2008)


    Is there any thing like rowid in Oracle?

    How to delete duplicate records from SQL sever without using template table or create an extra column.

    There is no rowid that you could access. You could identify the duplicate rows using count and group by, and delete them one by one using

    delete top (1) from tablename where somecolumn = duplicatevalue

    This could be automated as well. However, if there are many duplicates, you could just select the unique entries by using the distinct keyword into a new table, and replace the old table with the new one.

    Regards,

    Andras


    Andras Belokosztolszki, MCPD, PhD
    GoldenGate Software

  • thanks,

    But this is not the solution of my query. I want to know how to delete dulicate records without using any new table or any add any extra column

  • indraprakash (6/18/2008)


    thanks,

    But this is not the solution of my query. I want to know how to delete dulicate records without using any new table or any add any extra column

    In a single statement you cannot do this. You need to differentiate the rows that you are deleting, because you want to delete only the duplicated ones. Becuase of this you need to use either "delete TOP" or "set rownumber", but this would mean that you need to execute this statement for each individual duplicated value. You can iterate through all the values, e.g. use cursors, but moving the unique data out then back to the table is probably easier.

    Regards,

    Andras


    Andras Belokosztolszki, MCPD, PhD
    GoldenGate Software

  • indraprakash (6/18/2008)


    Is there any thing like rowid in Oracle?

    How to delete duplicate records from SQL sever without using template table or create an extra column.

    Can you post DDL for your table ?


    * Noel

  • Also knowing Oracle, I do wish that there was a ROWID equivalent in SQL Server. I'm also aware that it is used in "finding duplicate" queries.

    In SQL Server 2005 you can use a CTE per the attached code snipit:

    WITH numbered AS

    (

    SELECT

    ROW_NUMBER() OVER (PARTITION BY {column name} ORDER BY {column name}) AS rowno,

    {column name},

    {column name2}

    FROM

    {table name}

    )

    DELETE FROM numbered

    WHERE rowno > 1;

    Working example (using a temp table as the base table) with a 2-column duplicate condition:

    CREATE TABLE #SampleTestData (ID int NULL, Keyvalue VARCHAR(2));

    INSERT INTO #SampleTestData (id, keyvalue) VALUES (1, 'aa');

    INSERT INTO #SampleTestData (id, keyvalue) VALUES (2, 'bb');

    INSERT INTO #SampleTestData (id, keyvalue) VALUES (1, 'aa');

    INSERT INTO #SampleTestData (id, keyvalue) VALUES (1, 'aa');

    SELECT * FROM #SampleTestData;

    WITH numbered AS

    (

    SELECT

    ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS rowno,

    ID,

    Keyvalue

    FROM

    #SampleTestData

    )

    DELETE FROM numbered

    WHERE rowno > 1;

    SELECT * FROM #SampleTestData;

    Note: The "winks" in the above examples are closing (right) parens.


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • JohnG (6/19/2008)


    Also knowing Oracle, I do wish that there was a ROWID equivalent in SQL Server. I'm also aware that it is used in "finding duplicate" queries.

    In SQL Server 2005 you can use a CTE per the attached code snipit:

    .....

    There is a simpler way 🙂 There is no need to use row_number if you want to get rid of the duplicates, you could just use

    SELECT DISTINCT keyvalue FROM #SampleTestData

    If you want to create an id for the unique value, you could do that easier by:

    SELECT DISTINCT identity(int, 1,1) as id, keyvalue into #foo FROM #SampleTestData

    I reckon neither of these will help the OP, as both of these create a separate table 🙁

    Regards,

    Andras


    Andras Belokosztolszki, MCPD, PhD
    GoldenGate Software

  • Andras Belokosztolszki (6/18/2008)


    indraprakash (6/18/2008)


    thanks,

    But this is not the solution of my query. I want to know how to delete dulicate records without using any new table or any add any extra column

    In a single statement you cannot do this. You need to differentiate the rows that you are deleting, because you want to delete only the duplicated ones. Becuase of this you need to use either "delete TOP" or "set rownumber", but this would mean that you need to execute this statement for each individual duplicated value. You can iterate through all the values, e.g. use cursors, but moving the unique data out then back to the table is probably easier.

    Regards,

    Andras

    You are incorrect Andras. Another poster has already put up code that works in SQL 2005. There is a mechanism for 2000 as well although it is certainly not as elegant as using OVER and ROW_NUMBER.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • ...

    You are incorrect Andras. Another poster has already put up code that works in SQL 2005. There is a mechanism for 2000 as well although it is certainly not as elegant as using OVER and ROW_NUMBER.

    You are right, what confused my was that there was an id column in that example. Thanks for pointing this out. The CTE solution indeed works, and without the id column, where rows are indeed duplicates, the script would be like:

    CREATE TABLE #SampleTestData (keyvalue VARCHAR(2));

    INSERT INTO #SampleTestData (keyvalue) VALUES ('aa');

    INSERT INTO #SampleTestData (keyvalue) VALUES ('bb');

    INSERT INTO #SampleTestData (keyvalue) VALUES ('aa');

    INSERT INTO #SampleTestData (keyvalue) VALUES ('aa');

    INSERT INTO #SampleTestData (keyvalue) VALUES ('bb');

    INSERT INTO #SampleTestData (keyvalue) VALUES ('aa');

    INSERT INTO #SampleTestData (keyvalue) VALUES ('aa');

    SELECT * FROM #SampleTestData;

    WITH numbered AS

    (

    SELECT

    ROW_NUMBER() OVER (PARTITION BY keyvalue ORDER BY keyvalue) AS rowno,

    keyvalue

    FROM

    #SampleTestData

    )

    DELETE FROM numbered

    WHERE rowno > 1;

    SELECT * FROM #SampleTestData;

    Regards,

    Andras


    Andras Belokosztolszki, MCPD, PhD
    GoldenGate Software

  • thanks Andras

    indraprakash

  • indraprakash (6/20/2008)


    thanks Andras

    indraprakash

    The solution actually came from JohnG, and I've learned something too 🙂

    Glad your issue is solved

    - Andras


    Andras Belokosztolszki, MCPD, PhD
    GoldenGate Software

  • The solution actually came from JohnG, and I've learned something too

    Glad your issue is solved

    - Andras

    I can't take credit for the actual solution. I'm just re-providing something that was provided earlier (from some thread) on the SQL ServerCentral site. So the real credit goes to this site.

    Just like someone provided the "duplicate" discovery/deletion solution for Oracle using ROWID, which I have passed on numerous times, I'm also passing on something that was provided to me. Think of it as borrowing tools from a common "toolbox".


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply