eliminating duplicates

Question

eliminating duplicates

Viewing 15 posts - 16 through 30 (of 137 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004304 More actions · Answer 1

There is a way to do this using SET ROWCOUNT 1 and a cursor (which is technically an intermediate table but fools most people). Performance will totally suck. It would be much easier if you got the people giving you these ridiculous requirements to cut you some slack... they should not be allowed anywhere near a database. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004304 More actions · Answer 2

...and, if you just happend to have posted in the wrong forum and you actually have SQL Server 2005, you need to let us know that because there's a very simple solution there... :hehe:

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

RBarryYoung SSC Guru Points: 143329 More actions · Answer 3

Jeff Moden (6/21/2008)
There is a way to do this using SET ROWCOUNT 1 and a cursor (which is technically an intermediate table but fools most people). Performance will totally suck. It would be much easier if you got the people giving you these ridiculous requirements to cut you some slack... they should not be allowed anywhere near a database. 😉

Tak, tsk Jeff. This can be done in SQL2000 without any loops or cursors. All you need is a Tally table and some convoluted compression/decompression tricks.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Jeff Moden SSC Guru Points: 1004304 More actions · Answer 4

Tally table method in 2000 for this would require a triangular join... I believe. If you know a way to do this in 2k without using a loop but using a Tally table instead, I'd sure like to see it. And, remember, OP said could not use an intermediate table or add any columns. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

RBarryYoung SSC Guru Points: 143329 More actions · Answer 5

Jeff Moden (6/21/2008)
Tally table method in 2000 for this would require a triangular join... I believe. If you know a way to do this in 2k without using a loop but using a Tally table instead, I'd sure like to see it. And, remember, OP said could not use an intermediate table or add any columns. 🙂

Yep. Technically Triangular, but only over the count of duplicates-1. Given the other (extreme) constraints, I think that's a fair compromise.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 6

Vish:

vish (6/19/2008)
I use SQL server 2000.
I have a table 'TAB' with 3 coulumns 'Name','Age','Sex'
and there is no primary key.
I have multiple duplicate data in my table. for eg..
ABC24M
ABC24M
LMN27M
LMN27M
LMN27M
PQRS25F
XYZ24M
XYZ25M
Now i would wish to 'DELETE' the 1st original row ( Row no 1,3,6,7)
and keep the other duplcates from above data. 😎
Now, If there is a row without any dupicates present.. it will be deleted ( for eg Row no 6)
The condition is i dont want to go for intermediate tables or have any
additional identity column. :w00t:

Yes, according the the Relational rules of tuple identity, this can absolutely be done in SQL2000. It's a little obtuse, but not nearly as obtuse as your conditions.

Try this:

--Compress duplicates and

--encode them into archetype records:

Insert into TAB

Select Name

, Age

, CHAR( (2*Count(*)) + (CASE Sex When 'M' Then 0 Else 1 End) )

From TAB

Group By Name, Age, Sex

--remove the old leftover records:

Delete from TAB

Where NOT Sex IN('F','M')

--Now Uncompress and restore

--the original records skipping the 1st:

Insert into TAB

Select Name

, Age

, CASE (ASCII(Sex) && 1) When 0 Then 'M' Else 'F' End

From TAB

Join Tally ON Tally.Number <= CASE Ascii(Sex)/2

Where Tally.Number Between 2 and 127

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 7

Oops, sorry. Forgot one of the steps:

--pre-compress the records:

Update TAB

Set Sex = CHAR(CASE Sex When 'M' Then 0 Else 1 End)

--Compress duplicates together and

--encode them into archetype records:

Insert into TAB

Select Name

, Age

, CHAR( (2*Count(*)) + Ascii(Sex) )

From TAB

Group By Name, Age, Sex

--remove the old leftover records:

-- and the archetypes with a count=1

Delete from TAB

Where Ascii(Sex) <= 3

--Now Uncompress and restore

--the original records skipping the 1st:

Insert into TAB

Select Name

, Age

, CASE (ASCII(Sex) & 1) When 0 Then 'M' Else 'F' End

From TAB

Join Tally ON Tally.Number <= CASE Ascii(Sex)/2

Where Tally.Number Between 2 and 127

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Jeff Moden SSC Guru Points: 1004304 More actions · Answer 8

Oh, very clever... well done, Barry!

Only corrections are to remove the CASE in the final insert and to delete the compressed rows...

Other than that, very, very clever... 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004304 More actions · Answer 9

Heh... still, the best thing to do would be to give the person requesting that this be done without adding an IDENTITY column and without using an intermediate table a really bad case of porkchop-itus. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

RBarryYoung SSC Guru Points: 143329 More actions · Answer 10

Jeff Moden (6/21/2008)
Oh, very clever... well done, Barry!
Only corrections are to remove the CASE in the final insert and to delete the compressed rows...
Other than that, very, very clever... 🙂

Dang! You're right.

Oh well, as we used to say in theoretical mathematics: "I leave that as an exercise for the reader." 🙂

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 11

Jeff Moden (6/21/2008)
Heh... still, the best thing to do would be to give the person requesting that this be done without adding an IDENTITY column and without using an intermediate table a really bad case of porkchop-itus. 🙂

No argument there.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

miquelfrances Valued Member Points: 63 More actions · Answer 12

Hi.

One of solutions I apply in these cases is.

1. alter table and create a new identity column..

2. Make a select group by items, and get min(identity),

3. delete all records in table not in subselect (2)

4. Alter table and drop column.

Here you have and example:

Table Test

(

id int,

name char

)

You have the values:

1 AAA

2 BBB

3 CCC

- First create a new identity column pos order

ALTER TABLE test ADD [POS][int] IDENTITY(1,1)

Now whe have on table this

1 AAA 1

1 AAA 2

2 BBB 3

3 CCC 4

4 CCC 4

- Now drop (duplicated 2 o more times) values, group by columns values (in this case id,name)

DELETE FROM test WHERE POS NOT IN(SELECT min(POS) FROM test group by id,name)

- Drop new column

ALTER table test DROP COLUMN POS

That's all folks!!!!

janine.rawnsley Ten Centuries Points: 1056 More actions · Answer 13

vish (6/19/2008)
Hi All,
Let me give my problem description in detail.
I use SQL server 2000.
I have a table 'TAB' with 3 coulumns 'Name','Age','Sex'
and there is no primary key.
I have multiple duplicate data in my table. for eg..
ABC24M
ABC24M
LMN27M
LMN27M
LMN27M
PQRS25F
XYZ24M
XYZ25M
Now i would wish to 'DELETE' the 1st original row ( Row no 1,3,6,7)
and keep the other duplcates from above data. 😎
Now, If there is a row without any dupicates present.. it will be deleted ( for eg Row no 6)
The condition is i dont want to go for intermediate tables or have any
additional identity column. :w00t:
Please help..!!
Thanks folks

Pretty sure I can write something to do this actually.

Weird request though! Can I just check, you would actually delete rows 7 and 8 up there, because the age is diff thus the rows are unique and according to yuor request, unique rows are to be removed?

janine.rawnsley Ten Centuries Points: 1056 More actions · Answer 14

OK Assuming your rows are unique only when all three rows match (this includes gender), this should work..

DECLARE @testing TABLE

(

FirstColvarChar(5),

SecondColINT,

ThirdColChar(1)

)

INSERT INTO @testing

SELECT 'ABC',24,'M'

INSERT INTO @testing

SELECT 'ABC',24,'M'

INSERT INTO @testing

SELECT 'DEF',24,'M'

INSERT INTO @testing

SELECT 'DEF',24,'F'

INSERT INTO @testing

SELECT 'GHI',26,'F'

INSERT INTO @testing

SELECT 'GHI',26,'F'

INSERT INTO @testing

SELECT 'GHI',26,'F'

INSERT INTO @testing

SELECT 'GHI',26,'F'

INSERT INTO @testing

SELECT 'GHI',26,'F'

INSERT INTO @testing

SELECT 'LMN',27,'M'

INSERT INTO @testing

SELECT 'LMN',27,'M'

INSERT INTO @testing

SELECT 'LMN',27,'M'

INSERT INTO @testing

SELECT 'PQRS',25,'F'

INSERT INTO @testing

SELECT 'XYZ',24,'M'

INSERT INTO @testing

SELECT 'XYZ',25,'M'

SELECT * FROM @testing

-- Delete unique rows

DELETE T1 FROM @testing T1

INNER JOIN (SELECT FirstCol,SecondCol,ThirdCol FROM @testing GROUP BY FirstCol,SecondCol,ThirdCol HAVING COUNT(*) = 1) T2

ON T1.FirstCol = T2.FirstCol

AND T1.SecondCol = T2.SecondCol

AND T1.ThirdCol = T2.ThirdCol

SELECT * FROM @testing

-- Group up and count columns

DECLARE@TestCount TABLE

(

FirstColvarChar(5),

SecondColINT,

ThirdColChar(1),

CountColINT

)

INSERT INTO@TestCount

SELECTFirstCol,SecondCol,ThirdCol,COUNT(*) FROM @testing

GROUP BY FirstCol,SecondCol,ThirdCol

-- Get max count

DECLARE@MaxCountINT

SELECT@MaxCount = MAX(CountCol) FROM @TestCount

-- Final table to hold last result set

DECLARE @TestingFinal TABLE

(

FirstColvarChar(5),

SecondColINT,

ThirdColChar(1)

)

-- If there are two entries insert 1, 3 insert 2, 4 insert 3 etc.

-- Thus removign "first original row"

DECLARE@CurCountINT

SELECT@CurCount = MIN(CountCol) FROM @TestCount

WHILE@CurCount <= @MaxCount

BEGIN

-- Insert

INSERT INTO@TestingFinal

SELECTFirstCol,SecondCol,ThirdCol

FROM@TestCount

WHERECountCol >= @CurCount

-- Increment

SET@CurCount = @CurCount + 1

END

-- Done

SELECT * FROM @TestingFinal

ORDER BY FirstCol,SecondCol,ThirdCol

*hides from the RBAR police*

Jeff Moden SSC Guru Points: 1004304 More actions · Answer 15

*hides from the RBAR police*

Heh.. nah... "we" see all RBAR 😛

The problem with even that solution is that the op said "no intermediate tables"... @TestCount qualifies as an intermediate table...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)