delete duplicates

  • How do I delete duplicates from the AilmentDetail table with the following?

    SELECT DISTINCT AilmentID,DataPointID, Count(AilmentID) AS CountOfAilmentID

      FROM AilmentDetail

      GROUP BY AilmentID ,DataPointID  HAVING (Count(AilmentID))>1

    ORDER BY AilmentID

    Thanks!

  • Can we see the rest of the table's definition?

  • smknox

    The reason we need to see the rest of the table's definition is because we need to identify a column with some chronology in it... either a date column or an autonumbering column... AND, we need to know which dupes you want to keep... the earliest, the latest, or ???

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • table def:

    CREATE TABLE [dbo].[AilmentDetail] (

     [AilmentDetailID] [int] IDENTITY (0, 1) NOT NULL ,

     [AilmentID] [int] NOT NULL ,

     [DataPointID] [int] NOT NULL ,

     [DataValue] [varchar] (200) NOT NULL ,

     [DateStart] [smalldatetime] NULL ,

     [DateEnd] [smalldatetime] NULL ,

     [SortOrder] [int] NOT NULL ,

     [Note] [varchar] (200) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

     [DateCreated] [smalldatetime] NOT NULL ,

     [EmpCreated] [varchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

     [DateUpdated] [smalldatetime] NOT NULL ,

     [EmpUpdated] [varchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

     [TmStamp] [timestamp] NULL

    ) ON [PRIMARY]

    I want to retain the earliest value, which would be the lowest value of AilmentDetailID (or earliest DateCreated)

  • Perfect...

    Obviously, you need to test this on a copy of the table before you try it on the real table...

     DELETE ad1

       FROM dbo.AilmentDetail_Test ad1,

            dbo.AilmentDetail_Test ad2

      WHERE ad1.AilmentID       = ad2.AilmentID

        AND ad1.DataPointID     = ad2.DataPointID

        AND ad1.AilmentDetailID > ad2.AilmentDetailID

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • DELETE

         ad

    FROM       AilmentDetail AS ad

    LEFT JOIN  (

                   SELECT   MIN(AilmentDetailID) AS KeepID

                   FROM     AilmentDetail

                   GROUP BY AilmentID,

                            DataPointID

               ) AS k ON k.KeepID = ad.AilmentDetailID

    WHERE      k.KeepID IS NULL

     


    N 56°04'39.16"
    E 12°55'05.25"

  • I'm always curious for these things... which one is the fastest??

  • Thanks! I will test both

  • I think Jeff soon will post his million record test again

    Checking the query plans I might think Jeff's suggestion is faster.

     

     


    N 56°04'39.16"
    E 12°55'05.25"

  • USE TEMPDB

    CREATE TABLE DBO.AILMENTDETAIL (

      AILMENTDETAILID INT  IDENTITY( 1  , 1  ),

      AILMENTID       INT,

      DATAPOINTID     INT)

    INSERT DBO.AILMENTDETAIL

          (AILMENTID,

           DATAPOINTID)

    SELECT TOP 1000000 ABS(CHECKSUM(NEWID()))%10000,

                       ABS(CHECKSUM(NEWID()))%10000

    FROM   MASTER..SYSCOLUMNS AS SC1

           CROSS JOIN MASTER..SYSCOLUMNS AS SC2

    CREATE CLUSTERED INDEX IX_AILMENTDETAIL ON DBO.AILMENTDETAIL (

          AILMENTDETAILID)

    DECLARE  @StartTime DATETIME

    BEGIN TRAN

    PRINT '===== Jeff ======'

    DBCC DROPCLEANBUFFERS

    DBCC FREEPROCCACHE

    SET @StartTime = GETDATE()

    DELETE AD1

    FROM   DBO.AILMENTDETAIL AS AD1

           CROSS JOIN DBO.AILMENTDETAIL AS AD2

    WHERE  AD1.AILMENTID = AD2.AILMENTID

           AND AD1.DATAPOINTID = AD2.DATAPOINTID

           AND AD1.AILMENTDETAILID > AD2.AILMENTDETAILID

    PRINT STR(DATEDIFF(MS,@StartTime,CURRENT_TIMESTAMP),

        10) + ' Milliseconds'

    PRINT ' ' 

    ROLLBACK TRAN

    BEGIN TRAN

    PRINT '===== Peso ======'

    DBCC DROPCLEANBUFFERS

    DBCC FREEPROCCACHE

    SET @StartTime = GETDATE()

    DELETE AD

    FROM   AILMENTDETAIL AS AD

           LEFT JOIN (SELECT   MIN(AILMENTDETAILID) AS KEEPID

                      FROM     AILMENTDETAIL

                      GROUP BY AILMENTID,DATAPOINTID) AS K

             ON K.KEEPID = AD.AILMENTDETAILID

    WHERE  K.KEEPID IS NULL

    PRINT STR(DATEDIFF(MS,@StartTime,CURRENT_TIMESTAMP),

        10) + ' Milliseconds'

    PRINT ' '

    ROLLBACK TRAN

    BEGIN TRAN

    PRINT '===== Peso 2 ===='

    DBCC DROPCLEANBUFFERS

    DBCC FREEPROCCACHE

    SET @StartTime = GETDATE()

    DELETE X

    FROM   (SELECT ROW_NUMBER() OVER(PARTITION BY AILMENTID,DATAPOINTID ORDER BY AILMENTDETAILID) AS RECID

            FROM   DBO.AILMENTDETAIL) AS X

    WHERE  RECID > 1

    PRINT STR(DATEDIFF(MS,@StartTime,CURRENT_TIMESTAMP),

        10) + ' Milliseconds'

    PRINT ' '

    ROLLBACK TRAN

    DROP TABLE DBO.AILMENTDETAIL

     


    N 56°04'39.16"
    E 12°55'05.25"

  • ===== Jeff ======

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    (5031 row(s) affected)

    11936 Milliseconds

    ===== Peso ======

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    (5031 row(s) affected)

    12733 Milliseconds

    ===== Peso 2 ====

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    (5031 row(s) affected)

    4623 Milliseconds

     


    N 56°04'39.16"
    E 12°55'05.25"

  • Dang... I gotta get 2k5...

    Heh, I noticed the Peter converted the code to the good'n'proper ANSI style joins so that it's apparent that these are CROSS JOINS... for 1 reason and 1 reason only do I not do that... it scares the hell out of my DBA's when they review the code.  For some reason, they're more accepting of "hidden" cross joins even though they pick up on the fact that they are actually cross joins.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Strange...

    I always thought that

    FROM dbo.AilmentDetail_Test ad1,

    dbo.AilmentDetail_Test ad2

    WHERE ad1.AilmentID = ad2.AilmentID

    AND ad1.DataPointID = ad2.DataPointID

    AND ad1.AilmentDetailID > ad2.AilmentDetailID

    means INNER JOIN, not CROSS JOIN.

    _____________
    Code for TallyGenerator

  • And results I've got are quite different:

    ===== Jeff ======

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    6280 Milliseconds

    ===== Peso ======

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    48296 Milliseconds

    Sorry, cannit do SQL2005 test.

    _____________
    Code for TallyGenerator

  • That's what I get for answering email before the necessary amount of coffee...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply