removing duplicates without sorting....

Question

Post reply

removing duplicates without sorting....

sql_learner29

SSCrazy Eights

Points: 9107
More actions
July 6, 2010 at 7:57 am

#145848

Hi
I have a source that gives me data in sequence like this
Account DEPT Employee TASK
A1 D1 E1 T1
A1 D1 E1 T2
A1 D2 E1 T3
A1 D2 E1 T3
A1 D2 E1 T3
A2 D3 E3 T1
A2 D4 E3 T2
A1 D2 E1 T3
A1 D2 E2 T3
now I have to follow this sequence to remove duplicates in a set of account data like i have to only remove duplicates in bold area...
but if I sort them then they may remove my underline row also...
so is there any way to remove duplicates taking set by account numbers...and not sorting whole data...
hope you got the question..
any help...
[font="Comic Sans MS"]
---------------------------------------------------
Thanks [/font]

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 247224 More actions · Answer 1

How do you define the word 'duplicate' for your data?

Raunak Jhawar SSCoach Points: 15701 More actions · Answer 2

Priya,

You may use windowing functions(ROW_NUMBER,RANK,DENSE_RANK)

But this is a bad database design...I do not see a primary key constraint employed

Raunak J

Cadavre SSC-Forever Points: 41690 More actions · Answer 3

Does duplicate mean they have to be next to eachother in the source?

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Ed Zann SSCrazy Eights Points: 8694 More actions · Answer 4

You could use the Script Component for this. It would examine each row and compare it to the values of the previous row to determine if it's a dup. You would define an additional output column to store a duplicate flag. Then you could use the Conditional Split to redirect the duplicate rows based upon the duplicate flag.

sql_learner29 SSCrazy Eights Points: 9107 More actions · Answer 5

Here is my sample data, hope this will make it clear...

23350236800003Jun 29 2010 112885AGRINPT

is duplicate here

but the same data may come again after a break

CREATE TABLE [dbo].[test3](

[Acntno] [nvarchar](14) NULL,

[date1] [varchar](13) NULL,

[Emplno] [nvarchar](5) NULL,

[Type] [nvarchar](3) NULL,

[ACtion] [nvarchar](4) NULL

)

After sorting I should get this result:

[font="Comic Sans MS"]
---------------------------------------------------

Thanks [/font]

Phil Parkin SSC Guru Points: 247224 More actions · Answer 6

If the data might 'come again' after a 'break', why would you want to retain it if it is identical to what has gone before?

sql_learner29 SSCrazy Eights Points: 9107 More actions · Answer 7

Thats the requirement that is giving me a problem to sort it... and remove duplicates form it easily...

I need to trace every row from SRC as it comes one by one......but only continuous rows I have to remove duplicates..

[font="Comic Sans MS"]
---------------------------------------------------

Thanks [/font]

Ed Zann SSCrazy Eights Points: 8694 More actions · Answer 8

Here is a SQL approach using a self joined temp table:

CREATE TABLE [dbo].[#test3](

[Acntno] [nvarchar](14) NULL,

[date1] [varchar](13) NULL,

[Emplno] [nvarchar](5) NULL,

[Type] [nvarchar](3) NULL,

[ACtion] [nvarchar](4) NULL,

[RecID] int not null identity

)

INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'TEL', 'VERF')

INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')

INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'TON', 'DSKP')

INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'TON', 'INPT')

INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'NTE', 'INPT')

INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'TEL', 'VERF')

INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')

INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PGR', 'INPT')

select b.*

from #test3 a

inner join #test3 b

on a.recid = b.recid - 1

where b.recid = 1 -- always use first record

OR NOT

( a.Acntno = b.Acntno

AND a.date1 = b.date1

AND a.emplno = b.emplno

AND a.type = b.type

AND a.action = b.action

)

ORDER BY b.RecID

drop table #test3

sql_learner29 SSCrazy Eights Points: 9107 More actions · Answer 9

Wow.......

Hey Thanks for quick reply, gr8 code..

Your code works like a charm....

but it is missing the first row(in bold), any idea why is it behaving like this:

Thank you again...

[font="Comic Sans MS"]
---------------------------------------------------

Thanks [/font]

Ed Zann SSCrazy Eights Points: 8694 More actions · Answer 10

It's this line in my sample code:

"where b.recid = 1 -- always use first record"

Ed Zann SSCrazy Eights Points: 8694 More actions · Answer 11

Oops, sorry, that first row is excluded by the INNER JOIN. Change that to a RIGHT JOIN and it will appear.

sql_learner29 SSCrazy Eights Points: 9107 More actions · Answer 12

Thanks a lot Ed....great help

It works I changed inner join to Right join

select b.*

from #test3 a

right join #test3 b

on a.recid = b.recid - 1

where b.recid = 1 -- always use first record

OR NOT

( a.Acntno = b.Acntno

AND a.date1 = b.date1

AND a.emplno = b.emplno

AND a.type = b.type

AND a.action = b.action

)

ORDER BY b.RecID

[font="Comic Sans MS"]
---------------------------------------------------

Thanks [/font]

sql_learner29 SSCrazy Eights Points: 9107 More actions · Answer 13

Hi,

Now there is a little more into this problem, now can I remove duplicates for that account num range before account number changes.....even if it is not in order...

before the break comes by account num can I exclude duplicates for an account

like here in the insert i added one more duplicate row after 1st record like this

INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')

and the result changed....

actually the requirement now is to remove duplicates in that account no range rows,even if it is not a consequtive record in the dataset,( one row after the other),

....

can I make the result look like removing the bold lines in result as they are duplicates for an account sequence, i.e. removed duplicates for an account even if not together...before other account no comes..

any help on this?

I hope it is clear..

[font="Comic Sans MS"]
---------------------------------------------------

Thanks [/font]

PaVeRa22 SSCommitted Points: 1965 More actions · Answer 14

you can do....

1. take distinct records into temptable.

2. truncate original table

3. store temptable data into original table.

RB