July 6, 2010 at 7:57 am
Hi
I have a source that gives me data in sequence like this
Account DEPT Employee TASK
A1 D1 E1 T1
A1 D1 E1 T2
A1 D2 E1 T3
A1 D2 E1 T3
A1 D2 E1 T3
A2 D3 E3 T1
A2 D4 E3 T2
A1 D2 E1 T3
A1 D2 E2 T3
now I have to follow this sequence to remove duplicates in a set of account data like i have to only remove duplicates in bold area...
but if I sort them then they may remove my underline row also...
so is there any way to remove duplicates taking set by account numbers...and not sorting whole data...
hope you got the question..
any help...
Thanks [/font]
July 6, 2010 at 8:02 am
How do you define the word 'duplicate' for your data?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 6, 2010 at 8:05 am
Priya,
You may use windowing functions(ROW_NUMBER,RANK,DENSE_RANK)
But this is a bad database design...I do not see a primary key constraint employed
Raunak J
July 6, 2010 at 8:05 am
Does duplicate mean they have to be next to eachother in the source?
July 6, 2010 at 8:18 am
You could use the Script Component for this. It would examine each row and compare it to the values of the previous row to determine if it's a dup. You would define an additional output column to store a duplicate flag. Then you could use the Conditional Split to redirect the duplicate rows based upon the duplicate flag.
July 6, 2010 at 8:20 am
Here is my sample data, hope this will make it clear...
23350236800003Jun 29 2010 112885AGRINPT
is duplicate here
but the same data may come again after a break
CREATE TABLE [dbo].[test3](
[Acntno] [nvarchar](14) NULL,
[date1] [varchar](13) NULL,
[Emplno] [nvarchar](5) NULL,
[Type] [nvarchar](3) NULL,
[ACtion] [nvarchar](4) NULL
)
After sorting I should get this result:
Thanks [/font]
July 6, 2010 at 8:26 am
If the data might 'come again' after a 'break', why would you want to retain it if it is identical to what has gone before?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 6, 2010 at 8:29 am
Thats the requirement that is giving me a problem to sort it... and remove duplicates form it easily...
I need to trace every row from SRC as it comes one by one......but only continuous rows I have to remove duplicates..
Thanks [/font]
July 6, 2010 at 8:53 am
Here is a SQL approach using a self joined temp table:
CREATE TABLE [dbo].[#test3](
[Acntno] [nvarchar](14) NULL,
[date1] [varchar](13) NULL,
[Emplno] [nvarchar](5) NULL,
[Type] [nvarchar](3) NULL,
[ACtion] [nvarchar](4) NULL,
[RecID] int not null identity
)
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'TEL', 'VERF')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'TON', 'DSKP')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'TON', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'NTE', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'TEL', 'VERF')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PGR', 'INPT')
select b.*
from #test3 a
inner join #test3 b
on a.recid = b.recid - 1
where b.recid = 1 -- always use first record
OR NOT
( a.Acntno = b.Acntno
AND a.date1 = b.date1
AND a.emplno = b.emplno
AND a.type = b.type
AND a.action = b.action
)
ORDER BY b.RecID
drop table #test3
July 6, 2010 at 8:58 am
Wow.......
Hey Thanks for quick reply, gr8 code..
Your code works like a charm....
but it is missing the first row(in bold), any idea why is it behaving like this:
Thank you again...
Thanks [/font]
July 6, 2010 at 9:03 am
It's this line in my sample code:
"where b.recid = 1 -- always use first record"
July 6, 2010 at 9:11 am
Oops, sorry, that first row is excluded by the INNER JOIN. Change that to a RIGHT JOIN and it will appear.
July 6, 2010 at 9:11 am
Thanks a lot Ed....great help
It works I changed inner join to Right join
select b.*
from #test3 a
right join #test3 b
on a.recid = b.recid - 1
where b.recid = 1 -- always use first record
OR NOT
( a.Acntno = b.Acntno
AND a.date1 = b.date1
AND a.emplno = b.emplno
AND a.type = b.type
AND a.action = b.action
)
ORDER BY b.RecID
Thanks [/font]
July 6, 2010 at 9:47 am
Hi,
Now there is a little more into this problem, now can I remove duplicates for that account num range before account number changes.....even if it is not in order...
before the break comes by account num can I exclude duplicates for an account
like here in the insert i added one more duplicate row after 1st record like this
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')
and the result changed....
actually the requirement now is to remove duplicates in that account no range rows,even if it is not a consequtive record in the dataset,( one row after the other),
....
can I make the result look like removing the bold lines in result as they are duplicates for an account sequence, i.e. removed duplicates for an account even if not together...before other account no comes..
any help on this?
I hope it is clear..
Thanks [/font]
July 6, 2010 at 11:39 am
you can do....
1. take distinct records into temptable.
2. truncate original table
3. store temptable data into original table.
RB
Viewing 15 posts - 1 through 15 (of 25 total)
You must be logged in to reply to this topic. Login to reply