removing duplicates without sorting....

  • PaVeRa22 (7/6/2010)


    you can do....

    1. take distinct records into temptable.

    2. truncate original table

    3. store temptable data into original table.

    RB

    If you take the time to read through the entire thread, you'll see that your solution will not work for this bizarre requirement.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • yeah it won't work...

    the solution may be is to do the same kind of self join as in the solution earlier,by creating one more temp table , but here I can not join using recid = recid -1 as rec id is not consecutive everytime...

    [font="Comic Sans MS"]
    ---------------------------------------------------

    Thanks [/font]

  • OK - try this. It's not the most efficient SQL, it uses a WHILE loop, so you may have to tweak it if performance is an issue. But the concept would be the same - assign a batch number at each change in account number, do a GROUP By to get the distinct items within the batch, and order by the RecID to maintain the original ordering.

    CREATE TABLE [dbo].[#test3](

    [Acntno] [nvarchar](14) NULL,

    [date1] [varchar](13) NULL,

    [Emplno] [nvarchar](5) NULL,

    [Type] [nvarchar](3) NULL,

    [ACtion] [nvarchar](4) NULL,

    [batch] int null,

    [RecID] int not null identity

    )

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'TEL', 'VERF')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'TON', 'DSKP')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'TON', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'NTE', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'TEL', 'VERF')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PGR', 'INPT')

    update #test3

    set batch = 1 where recid = 1

    DECLARE @counter int

    SET @counter = 2

    WHILE @counter <= (SELECT MAX(RecID) FROM #test3)

    BEGIN

    update b

    SET batch = CASE WHEN b.Acntno = a.Acntno THEN a.batch

    ELSE a.batch + 1 END

    from #test3 a

    right join #test3 b

    on a.recid = b.recid - 1

    where b.recid = @counter

    set @counter = @counter + 1

    END

    select Acntno,date1,Emplno,Type,ACtion,min(recid) as recid

    from #test3

    group by Acntno,date1,Emplno,Type,ACtion,batch

    order by recID

    drop table #test3

  • Hey Thanks Ed... Its really a very complex code,gr8 code, result looks good...I will have to understand it first to see how can I implement it in my actual data.

    but I have one query about you your previous solution, I was trying to do this but no succes yet, hope you can do it much fast...

    On running your previous code :

    select b.*

    from #test3 a

    right join #test3 b

    on a.recid = b.recid - 1

    where b.recid = 1 -- always use first record

    OR NOT

    ( a.Acntno = b.Acntno

    AND a.date1 = b.date1

    AND a.emplno = b.emplno

    AND a.type = b.type

    AND a.action = b.action

    )

    ORDER BY b.RecID

    I got this result :

    Now here the last column recid will give me latest recid i mean if some account is repeating 18 times it gives me 18 ...

    Is there any way that I can make the recid count shown as like this:

    this will help me as i am pivoting the data after wards based in this recid 's value...

    any help on this...let me know if more information is needed

    thanks

    [font="Comic Sans MS"]
    ---------------------------------------------------

    Thanks [/font]

  • Replace the final SELECT query with this query. It uses a CTE (Common Table Expression) and the ROW_NUMBER function.

    WITH NoDuplicates AS

    (

    select Acntno,date1,Emplno,Type,ACtion, min(recid) as recid, batch

    from #test3

    group by Acntno,date1,Emplno,Type,ACtion,batch

    )

    SELECT Acntno,date1,Emplno,Type,ACtion,

    ROW_NUMBER() OVER(PARTITION BY batch ORDER BY recid)

    FROM NoDuplicates

  • Hey, thanks for the reply but The result is not coming if I run this query....

    the min(recid ) will always be 1

    ;

    [font="Comic Sans MS"]
    ---------------------------------------------------

    Thanks [/font]

  • You have to include the WHILE loop to update the temp table with a new "batch" number at each break in the account number.

    CREATE TABLE [dbo].[#test3](

    [Acntno] [nvarchar](14) NULL,

    [date1] [varchar](13) NULL,

    [Emplno] [nvarchar](5) NULL,

    [Type] [nvarchar](3) NULL,

    [ACtion] [nvarchar](4) NULL,

    [batch] int null,

    [RecID] int not null identity

    )

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'TEL', 'VERF')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'TON', 'DSKP')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'TON', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'NTE', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'TEL', 'VERF')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PGR', 'INPT')

    update #test3

    set batch = 1 where recid = 1

    DECLARE @counter int

    SET @counter = 2

    WHILE @counter <= (SELECT MAX(RecID) FROM #test3)

    BEGIN

    update b

    SET batch = CASE WHEN b.Acntno = a.Acntno THEN a.batch

    ELSE a.batch + 1 END

    from #test3 a

    right join #test3 b

    on a.recid = b.recid - 1

    where b.recid = @counter

    set @counter = @counter + 1

    END

    WITH NoDuplicates AS

    (

    select Acntno,date1,Emplno,Type,ACtion, min(recid) as recid, batch

    from #test3

    group by Acntno,date1,Emplno,Type,ACtion,batch

    )

    SELECT Acntno,date1,Emplno,Type,ACtion,

    ROW_NUMBER() OVER(PARTITION BY batch ORDER BY recid)

    FROM NoDuplicates

    drop table #test3

  • Phil Parkin (7/6/2010)


    PaVeRa22 (7/6/2010)


    you can do....

    1. take distinct records into temptable.

    2. truncate original table

    3. store temptable data into original table.

    RB

    If you take the time to read through the entire thread, you'll see that your solution will not work for this bizarre requirement.

    I'm sorry, I’m still not clear,

    as per knowledge, they wanted to delete the records 2nd occurrence onwards... for every distinct record...

    so, how does we know which record is deleted if we use distinct, use distinct and then number it...

  • priya__ (7/6/2010)


    Hi,

    Now there is a little more into this problem, now can I remove duplicates for that account num range before account number changes.....even if it is not in order...

    before the break comes by account num can I exclude duplicates for an account

    like here in the insert i added one more duplicate row after 1st record like this

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')

    and the result changed....

    actually the requirement now is to remove duplicates in that account no range rows,even if it is not a consequtive record in the dataset,( one row after the other),

    ....

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'TEL', 'VERF')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'TON', 'DSKP')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'TON', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'NTE', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'TEL', 'VERF')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')

    INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PGR', 'INPT')

    the result is like this:

    23350236800003Jun 29 2010 12885TELVERF1

    23350236800003Jun 29 2010 12885PYMCOMP2

    23350236800003Jun 29 2010 12885AGRINPT3

    23350236800003Jun 29 2010 12885PYMCOMP4

    23350236800003Jun 29 2010 12885AGRINPT5

    21130403100001Jun 29 2010 22501TONDSKP7

    can I make the result look like removing the bold lines in result as they are duplicates for an account sequence, i.e. removed duplicates for an account even if not together...before other account no comes..

    any help on this?

    I hope it is clear..

    from the above result, which records you wanted to delete from 3 and 5 ?? thx!

  • Thanks a lot Ed for your help....:-)

    Your code is gr8 and is working fine....The results are correct..

    but since it has an update it is taking lot of time( i have about 50,000 rows so its not feasible for me),

    So I am using Cursors to do this...

    I added a rownumber (not identity)to all account occurences, as account changes I add 1 to the rownum too and then I selected distinct from that it is showing correct results and is fast too...

    Thanks a lot for your help...

    [font="Comic Sans MS"]
    ---------------------------------------------------

    Thanks [/font]

  • priya__

    You're welcome. It was a slow day at the office and it was an interesting little exercise. I'm glad you were able to modify the SQL to better fit your requirements.

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply