Just For Fun: An Impossible Delete

Question

Just For Fun: An Impossible Delete

Viewing 15 posts - 136 through 150 (of 156 total)

You must be logged in to reply to this topic. Login to reply

Thomas-282729 Right there with Babe Points: 715 More actions · Answer 1

Aaron Cabrele (10/31/2009)
It says no additional identity column so that doesn't outrule the hidden one that already exists... As for adding a guid it depends on how strictly you say what is an identity column - is that simply a column with the identity property, therefor any other column like uniqueidentifier can be added?...!
I love the way, once most or all possibilities are exhausted, we try and re-interpret the question, bend
the rules...

LOL. Well, the objective is to find the simplest solution within the rules given. As with any game or puzzle, a large part of the solution is knowing what the boundaries really are.

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 2

Barry, someone sent me this link again, although I'd read it the last time around, and your description of the problem got me to reminiscing. It takes me back to the days of fixed file extents on disk. Probably the primary reason why you had to do an update in place was that the rest of the disk was spoken for. If things were really tight though, you might even have written a loop to de-dupe one set of identical rows at a time. There were so many hardware limitations to work around, and I'd all but forgotten them. Thanks for taking me back to the "bad old days".

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

JLSSCH Ten Centuries Points: 1013 More actions · Answer 3

Maybe I'm missing something, but the query the original poster was looking for is simple with the appropriate GROUP BY and HAVING clauses.

SELECT Name, Age, Sex

FROM Source

GROUP BY Name, Age, Sex

HAVING COUNT(*) > 1

This ensures that the output from the query includes only a row for duplicate rows. Note no row will be output for a unique row.

parody Say Hey Kid Points: 682 More actions · Answer 4

This would select (or delete) everything that is duplicated. The requirement was to delete only the first occurrence of a duplicate, or the only occurrence of any unique rows.

The "first" is obviously irrelevant given there is no order and they are duplicates so I read that as "one of"

JLSSCH (11/1/2009)
Maybe I'm missing something, but the query the original poster was looking for is simple with the appropriate GROUP BY and HAVING clauses.
SELECT Name, Age, Sex
FROM Source
GROUP BY Name, Age, Sex
HAVING COUNT(*) > 1
This ensures that the output from the query includes only a row for duplicate rows. Note no row will be output for a unique row.

Stephen.Richardson Right there with Babe Points: 758 More actions · Answer 5

I have not seen this solution in the above list. Not fancy, but performs the initial request to delete duplicate rows leaving the initial entry, with minimal resource impact (especialy disk space) and no cursors, additional table/column, while not modifying the data?????:

DECLARE @NM VARCHAR(50), @Age TINYINT, @Sex CHAR(1);

SELECT @NM=Name, @Age = Age, @Sex = Sex

FROM dbo.xSource

GROUP BY Name, Age, Sex

HAVING COUNT(*) > 1

WHILE @NM IS NOT NULL

BEGIN

DELETE TOP (1)

FROM dbo.xSource

WHERE Name = @NM AND Age = @Age AND Sex = @Sex

SELECT @NM = NULL

SELECT @NM=Name, @Age = Age, @Sex = Sex

FROM dbo.xSource

GROUP BY Name, Age, Sex

HAVING COUNT(*) > 1

END

JacekO SSCertifiable Points: 6610 More actions · Answer 6

Great article. I really liked the 'go back to the 70' part.

Sometimes when I see how stuff is done just because the computer can handle the load it drives me nuts.

---------------------------------------------
[font="Verdana"]Nothing is impossible.
It is just a matter of time and money.[/font]

Anipaul SSC-Insane Points: 24681 More actions · Answer 7

Anipaul

SSC-Insane

Points: 24681

November 2, 2009 at 9:28 am

#1073869

Great one....

JLSSCH Ten Centuries Points: 1013 More actions · Answer 8

Thanks for your comments. The following modified solution works when you use the row_number() function to number the rows in an output query.

For the following query, the output for "row_number() over (order by name) as ident" will produce a column with the name of "ident" and a monotonically increasing integer value starting with 1. This is functionally the same as an identity column with a seed of 1!

select name, age, sex, row_number() over (order by name) as ident

from #temp

Here is the complete script:

create table #temp

(

name varchar(4),

age int,

sex varchar(1)

)

insert #temp

select 'ABC',24,'M'

union all

select 'ABC',24,'M'

union all

select 'LMN', 27, 'M'

union all

select 'LMN', 27, 'M'

union all

select 'LMN', 27, 'M'

union all

select 'PQRS', 25, 'F'

union all

select 'XYZ', 24, 'M'

union all

select 'XYZ', 25, 'M'

select a.name, a.age, a.sex, a.ident

from

(

select name, age, sex, row_number() over (order by name) as ident

from #temp

) a

inner join

(

select b.name, b.age, b.sex, min(b.ident) as ident

from

(

select name, age, sex, row_number() over (order by name) as ident

from #temp

) b

group by b.name, b.age, b.sex

having count(*) > 1

) c on a.name = c.name and a.age = c.age and a.sex = c.sex and a.ident > c.ident

drop table #temp

--***************************

The desired output that is obtained from executing this script is:

ABC24M2

LMN27M4

LMN27M5

I hope that this helps.

JLSSCH Ten Centuries Points: 1013 More actions · Answer 9

One final update to my solution. Using CTE's, the output query is much easier to understand.

Here is the complete script:

create table #temp

(

name varchar(4),

age int,

sex varchar(1)

)

insert #temp

select 'ABC',24,'M'

union all

select 'ABC',24,'M'

union all

select 'LMN', 27, 'M'

union all

select 'LMN', 27, 'M'

union all

select 'LMN', 27, 'M'

union all

select 'PQRS', 25, 'F'

union all

select 'XYZ', 24, 'M'

union all

select 'XYZ', 25, 'M';

with tmp

as

(

select name, age, sex, row_number() over (order by name) as ident

from #temp

)

select a.name, a.age, a.sex, a.ident

from tmp a inner join

(

select b.name, b.age, b.sex, min(b.ident) as ident

from tmp b

group by b.name, b.age, b.sex

having count(*) > 1

) c on a.name = c.name and a.age = c.age and a.sex = c.sex and a.ident > c.ident

drop table #temp

--***************************

The desired output that is obtained from executing this script is:

ABC 24 M 2

LMN 27 M 4

LMN 27 M 5

RBarryYoung SSC Guru Points: 143329 More actions · Answer 10

hmi (10/30/2009)
Good brain training but are the restrictions as mentioned realistic? Which company would definitely not want to use a temp table or table variable?

Quite right, it's really just brain training.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 11

honza.mf (10/30/2009)
Phil Factor (8/5/2008)
Surely this is the simplest solution? (be warned of a nasty catch when creating these 'quirky updates'. the order of execution of the update is -variable assignments first then column assignments-both left to right!)
Very nice solution.
I Just prefer little change, it's more readable
[font="Courier New"]--create the sample table
DECLARE @Sample TABLE (TheName VARCHAR(4),Identifier INT, sex VARCHAR(1))
INSERT INTO @Sample (TheName,Identifier,Sex)
SELECT 'ABC', 24, 'M' UNION ALL
SELECT 'ABC', 24,'M' UNION ALL
SELECT 'LMN', 27, 'M' UNION ALL
SELECT 'LMN', 27, 'M' UNION ALL
SELECT 'LMN', 27, 'M' UNION ALL
SELECT 'PQRS', 25, 'F' UNION ALL
SELECT 'XYZ', 24, 'M' UNION ALL
SELECT 'XYZ', 25, 'M'
DECLARE @hash VARCHAR(80), @sex VARCHAR(1)
UPDATE @sample
SET
@sex=CASE WHEN COALESCE(@hash,'')
<>TheName+CONVERT(VARCHAR(5),Identifier)+sex
THEN 'd' ELSE SEX END,
@hash= TheName+CONVERT(VARCHAR(5),Identifier)+sex,
Sex = @sex
DELETE FROM @sample WHERE sex='d'
SELECT * FROM @sample[/font]

The quirky update trick isn't really reliable without a Clustered index. For instance, when I execute you code, thi sis what I get:

TheName Identifier sex

------- ----------- ----

ABC 24 M

LMN 27 M

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 12

Stephen.Richardson (10/30/2009)
...
-- Step 1 Get list of only DUPLICATE rows
INSERT INTO xSource (Name, Age, Sex)
SELECT '~'+Name, Age, Sex FROM xSource GROUP BY Name, Age, Sex Having COUNT(*) > 1

The problem is that this will fail if someone's name is already using all 50 characters (not unheard of). Otherwise I would have used this approach myself.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 13

skjoldtc (10/30/2009)
Thanks for the memories. A time when storage and memory was rare and expensive and you had to really know how the guts of the operating system, disk system, and memory allocation worked. Some of the old techniques are still useful today.
Great article and a great solution.

Thanks!

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 14

jghali (10/30/2009)
Funny how this article just came out today... I was actually on a similar thread last week and someone came up with a great solution...
My personal opinion the article is interesting but ... hmmm...
How about these solutions from this thread...
http://www.sqlservercentral.com/Forums/Topic793765-145-1.aspx
There's an undocumented identity key for every row of any table that can be used... in one simple delete statement you can delete the duplicate rows...
I was amazed...
Check it out.
Thanks

I am familiar with that thread (in fact I think that I am posted there), however, 1) it came out long after this article was originally published (15 months ago) and 2) those internal IDs are a lot hard to use than you might think, esp. given the restrictions. For instance, how would you propose using them to solve this problem?

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 15

arms.dan (10/30/2009)
Your off-hand comment about naming the field "Gender" rather than "Sex" piqued my curiosity. Why? As I understand the words, sex means biological differences: chromosomes and sexual organs. Gender refers to the characteristics a society or culture delineates as masculine or feminine. There's some blurring of the definitions, but I think they generally hold. So why do you prefer Gender for the column label?

In English "Sex" can also be a verb, thus leading to the joke about filling forms where is says "Sex[_]" with "Y". The PC way to stop this is to change the field name to "Gender[_]".

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Just For Fun: An Impossible Delete

Cookies on SQLServerCentral