April 29, 2008 at 7:34 am
I would like to remove duplicates from a table where neither a primary key nor a date column is present. You can use the following code to create some sample data:
create table SC (CityName varchar(10), Zip Varchar(10), Description varchar(20))
insert into SC values ('Bank','22222','N/A')
insert into SC values ('Bank','22222','Here in town')
insert into SC values ('k','22222','N/A')
insert into SC values ('k','22222','N/A')
insert into SC values ('k','22222','Here in town')
insert into SC values ('Karachi','22222','Here in town')
insert into SC values ('Lahore','22222','Here in town')
thanks
April 29, 2008 at 8:01 am
So, which columns constitute a dupe? Just the first two or all of them?
--Jeff Moden
Change is inevitable... Change for the better is not.
April 29, 2008 at 10:34 am
The SOP way is something like:
Select distinct * into #tmp From SC
Truncate Table SC
Insert into SC Select * from #tmp
Drop table #tmp
You also might want to wrap this in a transaction.
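For instance, a minimal sketch of those same steps wrapped in a transaction (in SQL Server, TRUNCATE TABLE can be rolled back inside an explicit transaction, so a failure between the TRUNCATE and the re-INSERT won't leave SC empty):

```sql
BEGIN TRANSACTION

    -- Keep one copy of each distinct row in a temp table
    SELECT DISTINCT * INTO #tmp FROM SC

    -- Empty the original table, then reload the de-duped rows
    TRUNCATE TABLE SC
    INSERT INTO SC SELECT * FROM #tmp

    DROP TABLE #tmp

COMMIT TRANSACTION
```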
-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc. "Performance is our middle name."
April 29, 2008 at 12:20 pm
The only actual duplicates I see are:
insert into SC values ('k','22222','N/A')
insert into SC values ('k','22222','N/A')
RBarry's method will handle those two. The rest aren't strict duplicates. If you want to treat:
insert into SC values ('Bank','22222','N/A')
insert into SC values ('Bank','22222','Here in town')
as duplicates (for example), you'll have to define what constitutes a "duplicate", and how to determine what to do with the slightly different data.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 29, 2008 at 5:27 pm
You can use a GROUP BY query to find your dupes and, based on that, write a DELETE statement to nuke them.
You can also use the OUTPUT clause to write the deleted rows to a temp table so that you can review them later. Makes it easier to recover data if your code is fubarred.
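As a sketch of the OUTPUT idea (assuming, purely for illustration, that the 'N/A' rows are the ones to discard whenever a better description exists for the same CityName/Zip; #deleted_rows is a hypothetical holding table):

```sql
-- Hypothetical holding table for the removed rows
CREATE TABLE #deleted_rows (CityName varchar(10), Zip varchar(10), Description varchar(20))

DELETE SC
OUTPUT deleted.CityName, deleted.Zip, deleted.Description
INTO #deleted_rows
WHERE Description = 'N/A'
  AND EXISTS (SELECT 1 FROM SC s2
              WHERE s2.CityName = SC.CityName
                AND s2.Zip = SC.Zip
                AND s2.Description <> 'N/A')

-- The deleted data can later be reviewed (or re-inserted) from #deleted_rows
```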
-----------------------------
www.cbtr.net
.: SQL Backup Admin Tool :.
April 29, 2008 at 8:44 pm
Not if the dupes encompass the key for the group by like this one does...
--Jeff Moden
Change is inevitable... Change for the better is not.
April 30, 2008 at 11:33 am
Another option...
delete a from
(select *, row_number() over (partition by CityName, Zip, Description order by CityName) as x from SC) a
where x > 1
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
April 30, 2008 at 1:21 pm
The problem with the phrase "Remove Duplicates" is that what the speaker *usually* means is "one of them is 'real', so the others that are like it are the duplicates, so I want you to remove the (additional) duplicates and leave the original". This is confusing because in both SQL and Logic, they are ALL duplicates.
Which means that the obvious {DELETE .. Where Group By Count > 1} won't work because that also deletes the "originals" along with the "duplicates".
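To make that concrete, here's a sketch of the naive pattern applied to the sample SC table; it deletes every row of any group whose count is above one, "originals" included:

```sql
-- Deletes ALL copies in any duplicated group -- usually not what was wanted
DELETE SC
WHERE EXISTS (SELECT 1
              FROM SC dup
              WHERE dup.CityName    = SC.CityName
                AND dup.Zip         = SC.Zip
                AND dup.Description = SC.Description
              GROUP BY dup.CityName, dup.Zip, dup.Description
              HAVING COUNT(*) > 1)
```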
-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc. "Performance is our middle name."
April 30, 2008 at 10:57 pm
rbarryyoung (4/30/2008)
The problem with the phrase "Remove Duplicates" is that what the speaker *usually* means is "one of them is 'real', so the others that are like it are the duplicates, so I want you to remove the (additional) duplicates and leave the original". This is confusing because in both SQL and Logic, they are ALL duplicates. Which means that the obvious {DELETE .. Where Group By Count > 1} won't work because that also deletes the "originals" along with the "duplicates".
Is this aimed at my suggestion?
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
April 30, 2008 at 11:43 pm
RyanRandall (4/30/2008)
rbarryyoung (4/30/2008)
...Is this aimed at my suggestion?
Oberion's, I think.
-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc. "Performance is our middle name."
April 30, 2008 at 11:46 pm
rbarryyoung (4/30/2008)
RyanRandall (4/30/2008)
rbarryyoung (4/30/2008)
...Is this aimed at my suggestion?
Oberion's, I think.
OK - just checking 🙂
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
May 1, 2008 at 8:57 am
Add a column that gives each row a unique id. Select the minimum value of that new column into a table, grouped by the columns that make a row unique. Then delete the rest of the rows with an inner join to the minimum-value table.
Alter table table1
add newcol int identity(1,1)
Select col1, col2, col3, ..., min(newcol) as minvalues
into #temp
from table1
group by col1, col2, col3, ...
Delete from table1
from #temp
where #temp.minvalues <> table1.newcol
and #temp.col1 = table1.col1
and #temp.col2 = table1.col2
and #temp.col3 = table1.col3 ...
Q
Please take a number. Now serving emergency 1,203,894
May 1, 2008 at 9:02 am
Q (5/1/2008)
Add a column that creates a unique id per row. Select the minimum value into a table for this new column based on the columns that will make the row unique. Delete the rest of the rows doing an inner join to the minimum value table.
Alter table table1
add newcol int identity(1,1)
Select col1, col2, col3, ..., min(newcol) as minvalues
into #temp
from table1
group by col1, col2, col3, ...
Delete from table1
from #temp
where #temp.minvalues <> table1.newcol
and #temp.col1 = table1.col1
and #temp.col2 = table1.col2
and #temp.col3 = table1.col3 ...
Why do all that when you can just do this? Is there some advantage that you know of?
delete a from
(select *, row_number() over (partition by CityName, Zip, Description order by CityName) as x from SC) a
where x > 1
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
May 1, 2008 at 10:12 am
Ryan,
I like your solution. I'm not sure the different solution has an advantage; it was just my preference. I have been giving similar code to others for years to perform this task. It is simple, sets up a column so that every row is unique, and someone with very little T-SQL can read it.
Yours is simple and does not require a physical change to the table.
Q
Please take a number. Now serving emergency 1,203,894
May 1, 2008 at 10:16 am
Okay - thanks. I was just checking I wasn't missing something 🙂
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.