July 21, 2007 at 8:34 am
How do I delete duplicates from the AilmentDetail table with the following?
SELECT DISTINCT AilmentID,DataPointID, Count(AilmentID) AS CountOfAilmentID
FROM AilmentDetail
GROUP BY AilmentID ,DataPointID HAVING (Count(AilmentID))>1
ORDER BY AilmentID
Thanks!
July 21, 2007 at 9:06 am
Can we see the rest of the table's definition?
July 21, 2007 at 9:37 am
The reason we need to see the rest of the table's definition is because we need to identify a column with some chronology in it... either a date column or an autonumbering column... AND, we need to know which dupes you want to keep... the earliest, the latest, or ???
--Jeff Moden
Change is inevitable... Change for the better is not.
July 21, 2007 at 9:57 am
table def:
CREATE TABLE [dbo].[AilmentDetail] (
[AilmentDetailID] [int] IDENTITY (0, 1) NOT NULL ,
[AilmentID] [int] NOT NULL ,
[DataPointID] [int] NOT NULL ,
[DataValue] [varchar] (200) NOT NULL ,
[DateStart] [smalldatetime] NULL ,
[DateEnd] [smalldatetime] NULL ,
[SortOrder] [int] NOT NULL ,
[Note] [varchar] (200) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[DateCreated] [smalldatetime] NOT NULL ,
[EmpCreated] [varchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[DateUpdated] [smalldatetime] NOT NULL ,
[EmpUpdated] [varchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[TmStamp] [timestamp] NULL
) ON [PRIMARY]
I want to retain the earliest value, which would be the lowest value of AilmentDetailID (or earliest DateCreated)
July 21, 2007 at 11:01 am
Perfect...
Obviously, you need to test this on a copy of the table before you try it on the real table...
DELETE ad1
FROM dbo.AilmentDetail_Test ad1,
dbo.AilmentDetail_Test ad2
WHERE ad1.AilmentID = ad2.AilmentID
AND ad1.DataPointID = ad2.DataPointID
AND ad1.AilmentDetailID > ad2.AilmentDetailID
--Jeff Moden
Change is inevitable... Change for the better is not.
July 22, 2007 at 1:23 am
DELETE
ad
FROM AilmentDetail AS ad
LEFT JOIN (
SELECT MIN(AilmentDetailID) AS KeepID
FROM AilmentDetail
GROUP BY AilmentID,
DataPointID
) AS k ON k.KeepID = ad.AilmentDetailID
WHERE k.KeepID IS NULL
N 56°04'39.16"
E 12°55'05.25"
July 22, 2007 at 4:48 am
I'm always curious for these things... which one is the fastest??
July 22, 2007 at 6:40 am
Thanks! I will test both
July 22, 2007 at 7:06 am
I think Jeff soon will post his million record test again
Checking the query plans I might think Jeff's suggestion is faster.
N 56°04'39.16"
E 12°55'05.25"
July 22, 2007 at 7:26 am
USE TEMPDB
CREATE TABLE DBO.AILMENTDETAIL (
AILMENTDETAILID INT IDENTITY( 1 , 1 ),
AILMENTID INT,
DATAPOINTID INT)
INSERT DBO.AILMENTDETAIL
(AILMENTID,
DATAPOINTID)
SELECT TOP 1000000 ABS(CHECKSUM(NEWID()))%10000,
ABS(CHECKSUM(NEWID()))%10000
FROM MASTER..SYSCOLUMNS AS SC1
CROSS JOIN MASTER..SYSCOLUMNS AS SC2
CREATE CLUSTERED INDEX IX_AILMENTDETAIL ON DBO.AILMENTDETAIL (
AILMENTDETAILID)
DECLARE @StartTime DATETIME
BEGIN TRAN
PRINT '===== Jeff ======'
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
SET @StartTime = GETDATE()
DELETE AD1
FROM DBO.AILMENTDETAIL AS AD1
CROSS JOIN DBO.AILMENTDETAIL AS AD2
WHERE AD1.AILMENTID = AD2.AILMENTID
AND AD1.DATAPOINTID = AD2.DATAPOINTID
AND AD1.AILMENTDETAILID > AD2.AILMENTDETAILID
PRINT STR(DATEDIFF(MS,@StartTime,CURRENT_TIMESTAMP),
10) + ' Milliseconds'
PRINT ' '
ROLLBACK TRAN
BEGIN TRAN
PRINT '===== Peso ======'
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
SET @StartTime = GETDATE()
DELETE AD
FROM AILMENTDETAIL AS AD
LEFT JOIN (SELECT MIN(AILMENTDETAILID) AS KEEPID
FROM AILMENTDETAIL
GROUP BY AILMENTID,DATAPOINTID) AS K
ON K.KEEPID = AD.AILMENTDETAILID
WHERE K.KEEPID IS NULL
PRINT STR(DATEDIFF(MS,@StartTime,CURRENT_TIMESTAMP),
10) + ' Milliseconds'
PRINT ' '
ROLLBACK TRAN
BEGIN TRAN
PRINT '===== Peso 2 ===='
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
SET @StartTime = GETDATE()
DELETE X
FROM (SELECT ROW_NUMBER() OVER(PARTITION BY AILMENTID,DATAPOINTID ORDER BY AILMENTDETAILID) AS RECID
FROM DBO.AILMENTDETAIL) AS X
WHERE RECID > 1
PRINT STR(DATEDIFF(MS,@StartTime,CURRENT_TIMESTAMP),
10) + ' Milliseconds'
PRINT ' '
ROLLBACK TRAN
DROP TABLE DBO.AILMENTDETAIL
N 56°04'39.16"
E 12°55'05.25"
July 22, 2007 at 7:34 am
===== Jeff ======
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
(5031 row(s) affected)
11936 Milliseconds
===== Peso ======
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
(5031 row(s) affected)
12733 Milliseconds
===== Peso 2 ====
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
(5031 row(s) affected)
4623 Milliseconds
N 56°04'39.16"
E 12°55'05.25"
July 22, 2007 at 9:58 am
Dang... I gotta get 2k5...
Heh, I noticed the Peter converted the code to the good'n'proper ANSI style joins so that it's apparent that these are CROSS JOINS... for 1 reason and 1 reason only do I not do that... it scares the hell out of my DBA's when they review the code. For some reason, they're more accepting of "hidden" cross joins even though they pick up on the fact that they are actually cross joins.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 22, 2007 at 1:40 pm
Strange...
I always thought that
FROM dbo.AilmentDetail_Test ad1,
dbo.AilmentDetail_Test ad2
WHERE ad1.AilmentID = ad2.AilmentID
AND ad1.DataPointID = ad2.DataPointID
AND ad1.AilmentDetailID > ad2.AilmentDetailID
means INNER JOIN, not CROSS JOIN.
_____________
Code for TallyGenerator
July 22, 2007 at 1:55 pm
And results I've got are quite different:
===== Jeff ======
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
6280 Milliseconds
===== Peso ======
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
48296 Milliseconds
Sorry, cannit do SQL2005 test.
_____________
Code for TallyGenerator
July 22, 2007 at 2:05 pm
That's what I get for answering email before the necessary amount of coffee...
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 21 total)
You must be logged in to reply to this topic. Login to reply