removing duplicates without sorting....

  • Hi

    I have a source that gives me data in sequence like this

    Account DEPT Employee TASK

    A1 D1 E1 T1

    A1 D1 E1 T2

    A1 D2 E1 T3

    A1 D2 E1 T3

    A1 D2 E1 T3

    A2 D3 E3 T1

    A2 D4 E3 T2

    A1 D2 E1 T3

    A1 D2 E2 T3

    now I have to follow this sequence to remove duplicates in a set of account data like i have to only remove duplicates in bold area...

    but if I sort them then they may remove my underline row also...

    so is there any way to remove duplicates taking set by account numbers...and not sorting whole data...

    hope you got the question..

    any help...

    [font="Comic Sans MS"]
    ---------------------------------------------------

    Thanks [/font]

  • How do you define the word 'duplicate' for your data?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Priya,

    You may use windowing functions(ROW_NUMBER,RANK,DENSE_RANK)

    But this is a bad database design...I do not see a primary key constraint employed

    Raunak J

  • Does duplicate mean they have to be next to eachother in the source?


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • You could use the Script Component for this. It would examine each row and compare it to the values of the previous row to determine if it's a dup. You would define an additional output column to store a duplicate flag. Then you could use the Conditional Split to redirect the duplicate rows based upon the duplicate flag.

  • Here is my sample data, hope this will make it clear...

    23350236800003Jun 29 2010 112885AGRINPT

    is duplicate here

    but the same data may come again after a break

    CREATE TABLE [dbo].[test3](

    [Acntno] [nvarchar](14) NULL,

    [date1] [varchar](13) NULL,

    [Emplno] [nvarchar](5) NULL,

    [Type] [nvarchar](3) NULL,

    [ACtion] [nvarchar](4) NULL

    )

    After sorting I should get this result:

    [font="Comic Sans MS"]
    ---------------------------------------------------

    Thanks [/font]

  • If the data might 'come again' after a 'break', why would you want to retain it if it is identical to what has gone before?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thats the requirement that is giving me a problem to sort it... and remove duplicates form it easily...

    I need to trace every row from SRC as it comes one by one......but only continuous rows I have to remove duplicates..

    [font="Comic Sans MS"]
    ---------------------------------------------------

    Thanks [/font]

  • Here is a SQL approach using a self joined temp table:

    CREATE TABLE [dbo].[#test3](

    [Acntno] [nvarchar](14) NULL,

    [date1] [varchar](13) NULL,

    [Emplno] [nvarchar](5) NULL,

    [Type] [nvarchar](3) NULL,

    [ACtion] [nvarchar](4) NULL,

    [RecID] int not null identity

    )

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'TEL', 'VERF')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'TON', 'DSKP')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'TON', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'NTE', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'TEL', 'VERF')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PGR', 'INPT')

    select b.*

    from #test3 a

    inner join #test3 b

    on a.recid = b.recid - 1

    where b.recid = 1 -- always use first record

    OR NOT

    ( a.Acntno = b.Acntno

    AND a.date1 = b.date1

    AND a.emplno = b.emplno

    AND a.type = b.type

    AND a.action = b.action

    )

    ORDER BY b.RecID

    drop table #test3

  • Wow.......

    Hey Thanks for quick reply, gr8 code..

    Your code works like a charm....

    but it is missing the first row(in bold), any idea why is it behaving like this:

    Thank you again...

    [font="Comic Sans MS"]
    ---------------------------------------------------

    Thanks [/font]

  • It's this line in my sample code:

    "where b.recid = 1 -- always use first record"

  • Oops, sorry, that first row is excluded by the INNER JOIN. Change that to a RIGHT JOIN and it will appear.

  • Thanks a lot Ed....great help

    It works I changed inner join to Right join

    select b.*

    from #test3 a

    right join #test3 b

    on a.recid = b.recid - 1

    where b.recid = 1 -- always use first record

    OR NOT

    ( a.Acntno = b.Acntno

    AND a.date1 = b.date1

    AND a.emplno = b.emplno

    AND a.type = b.type

    AND a.action = b.action

    )

    ORDER BY b.RecID

    [font="Comic Sans MS"]
    ---------------------------------------------------

    Thanks [/font]

  • Hi,

    Now there is a little more into this problem, now can I remove duplicates for that account num range before account number changes.....even if it is not in order...

    before the break comes by account num can I exclude duplicates for an account

    like here in the insert i added one more duplicate row after 1st record like this

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')

    and the result changed....

    actually the requirement now is to remove duplicates in that account no range rows,even if it is not a consequtive record in the dataset,( one row after the other),

    ....

    can I make the result look like removing the bold lines in result as they are duplicates for an account sequence, i.e. removed duplicates for an account even if not together...before other account no comes..

    any help on this?

    I hope it is clear..

    [font="Comic Sans MS"]
    ---------------------------------------------------

    Thanks [/font]

  • you can do....

    1. take distinct records into temptable.

    2. truncate original table

    3. store temptable data into original table.

    RB

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply