April 27, 2007 at 10:03 am
Hello again.
I have a table with columns: ID, Service_Tag, Name, and Date, among others.
The Service_Tag column has duplicate entries. ID is the primary key, a unique identity column.
I need to delete the duplicate service_tag rows keeping the one with the highest ID.
Here is how I'm doing it now:
declare @counter int
set @counter = 0

while @counter < 500
begin
    set @counter = @counter + 1
    --print 'The counter is ' + cast(@counter as char)
    delete from servicetagreport
    where ID in (select min(Id) as Id
                 from servicetagreport
                 group by service_Tag
                 having count(service_Tag) > 1)
end
What I would like is a loop that keeps running until no group has count(service_tag) > 1.
In other words, I don't know how many passes it will take to clear all the duplicates. Right now I'm using 500, which
is more than enough for now, but I don't want to set it too low and leave duplicates in the table, since I will be using this in a job.
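(Edit: for anyone finding this later, one way to make the loop stop itself, sketched against the same table and untested, is to check @@ROWCOUNT after each pass and quit once a delete removes nothing. It deletes min(ID) per duplicate group, so the row with the highest ID survives.)

declare @rows int
set @rows = 1

while @rows > 0
begin
    delete from servicetagreport
    where ID in (select min(ID)
                 from servicetagreport
                 group by service_Tag
                 having count(service_Tag) > 1)
    -- @@ROWCOUNT is the number of rows the delete just removed;
    -- once it reaches 0, every Service_Tag is down to a single row.
    set @rows = @@rowcount
end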
Thanks for your help.
April 27, 2007 at 10:26 am
April 27, 2007 at 10:36 am
Here is the data you requested. The table has about 20000 rows.
CREATE TABLE [dbo].[ServiceTagReport] (
[ID] [int] NOT NULL ,
[Service_Tag] [nvarchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Name] [nvarchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Workstation_User] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Monitor_Type] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[IP_Address] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Date] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
ID Service_Tag Name Workstation_User Monitor_Type IP_Address Date
1 6C7DY51 IT-11 HMM102927 OptiPlex GX280 NULL Sun 04/01/2007 0:11:39.23
2 BY4691S ELD01 hmm100139 NULL NULL Sun 04/01/2007 0:19:36.30
3 51PDY51 MAINTENANCE-76 hmm102808 OptiPlex GX280 NULL Sun 04/01/2007 1:20:47.98
4 53F2121 MAINTENANCE-58 HMM102244 OptiPlex GX260 10.37.137.146 Sun 04/01/2007 1:22:59.04
5 GY28K11 MAINTENANCE-57 HMM102252 OptiPlex GX260 10.37.137.136 Sun 04/01/2007 1:49:58.65
6 73F2121 MAINTENANCE-36 HMM100684 OptiPlex GX260 10.37.182.53 Sun 04/01/2007 3:33:41.24
8 64FHY51 MAINTENANCE-68 HMM100517 OptiPlex GX280 NULL Sun 04/01/2007 4:46:18.20
9 70PDY51 ENGINE-40 HMM101068 OptiPlex GX280 NULL Sun 04/01/2007 5:02:27.39
11 9TXMV81 MAINTENANCE-34L HMM102732 Latitude D610 192.168.0.3 Sun 04/01/2007 5:10:25.95
13 7MWRK21 GA-29 SDuncan OptiPlex GX260 10.37.139.55 Sun 04/01/2007 5:43:26.02
14 81PDY51 MAINTENANCE-72 hmm100689 OptiPlex GX280 NULL Sun 04/01/2007 5:45:15.50
15 8F01X41 PAINT-14 HMM101119 OptiPlex GX270 NULL Sun 04/01/2007 5:45:58.59
16 JK13431 GA-30 PChildress OptiPlex GX270 NULL Sun 04/01/2007 5:46:30.47
17 9LN8X11 MAINTENANCE-59 HMM100307 OptiPlex GX260 10.37.137.144 Sun 04/01/2007 5:57:00.84
19 302TD41 MAINTENANCE-80 HMM100676 OptiPlex GX270 NULL Sun 04/01/2007 6:03:06.82
Thank you for your help
April 27, 2007 at 12:19 pm
Try this.
DELETE stg
FROM (SELECT Service_Tag, MAX(id) mid FROM ServiceTagReport GROUP BY Service_Tag) a, ServiceTagReport stg
WHERE a.Service_Tag=stg.Service_Tag
AND a.mid<>stg.id
Brian
April 27, 2007 at 12:42 pm
Charlie...
Dunno if you can do anything about it, but it's "Death by SQL" to store dates and times as NVARCHAR (ESPECIALLY WITH THE DAY OF THE BLOODY WEEK!!) instead of storing them as just a DateTime data type.
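If you ever get the chance to fix it, a one-off conversion might look something like this (untested sketch, assuming every row matches the 'Sun 04/01/2007 0:11:39.23' format in your sample, with a 3-letter day name plus a space; DateFixed is just a name I made up):

ALTER TABLE dbo.ServiceTagReport ADD DateFixed datetime NULL

-- SUBSTRING skips the 4-character day-of-week prefix ('Sun ');
-- style 101 reads the rest as mm/dd/yyyy with a time portion.
UPDATE dbo.ServiceTagReport
SET DateFixed = CONVERT(datetime, SUBSTRING([Date], 5, 50), 101)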
Anyway, here's a solution to your dupe problem...
DELETE str1
FROM dbo.ServiceTagReport str1,
dbo.ServiceTagReport str2
WHERE str1.Service_Tag = str2.Service_Tag
AND str1.ID < str2.ID
You won't be able to tell on only 20,000 rows, but the above method is remarkably fast... it will find and delete 18,000 dupes in 4 million rows in about 30 seconds. There is a method in SQL Server 2005 that uses ROW_NUMBER and PARTITION BY that's a bit faster, but I don't think you'll find one faster than this for SQL Server 2000 or SQL Server 7.
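From memory, the 2005 version looks something like this (untested here):

-- Number each Service_Tag's rows from the highest ID down,
-- then delete everything after the first.
;WITH numbered AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY Service_Tag
                              ORDER BY ID DESC) AS rn
    FROM dbo.ServiceTagReport
)
DELETE FROM numbered
WHERE rn > 1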
--Jeff Moden
Change is inevitable... Change for the better is not.
April 27, 2007 at 3:20 pm
Brian - Thanks, that works beautifully!! Thank you for the help.
Jeff - Yes, I know about the "Death by SQL." This table is collected on the network every time someone logs in.
The date field in this case is there for reference only; it is not used for any calculations.
Your solution also works great. Sometime soon I'll be able to figure these out by myself. Until then, again, thanks very much.
April 27, 2007 at 3:24 pm
Heh... "for reference only"... just wait until they ask "how many logins this week"... and they will...
Anyway, thanks for the feedback, Charlie.
--Jeff Moden
Change is inevitable... Change for the better is not.