July 6, 2010 at 11:54 am
PaVeRa22 (7/6/2010)
you can do....1. take distinct records into temptable.
2. truncate original table
3. store temptable data into original table.
RB
If you take the time to read through the entire thread, you'll see that your solution will not work for this bizarre requirement.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
July 6, 2010 at 12:07 pm
yeah it won't work...
the solution may be is to do the same kind of self join as in the solution earlier,by creating one more temp table , but here I can not join using recid = recid -1 as rec id is not consecutive everytime...
Thanks [/font]
July 6, 2010 at 12:47 pm
OK - try this. It's not the most efficient SQL, it uses a WHILE loop, so you may have to tweak it if performance is an issue. But the concept would be the same - assign a batch number at each change in account number, do a GROUP By to get the distinct items within the batch, and order by the RecID to maintain the original ordering.
CREATE TABLE [dbo].[#test3](
[Acntno] [nvarchar](14) NULL,
[date1] [varchar](13) NULL,
[Emplno] [nvarchar](5) NULL,
[Type] [nvarchar](3) NULL,
[ACtion] [nvarchar](4) NULL,
[batch] int null,
[RecID] int not null identity
)
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'TEL', 'VERF')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'TON', 'DSKP')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'TON', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'NTE', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'TEL', 'VERF')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PGR', 'INPT')
update #test3
set batch = 1 where recid = 1
DECLARE @counter int
SET @counter = 2
WHILE @counter <= (SELECT MAX(RecID) FROM #test3)
BEGIN
update b
SET batch = CASE WHEN b.Acntno = a.Acntno THEN a.batch
ELSE a.batch + 1 END
from #test3 a
right join #test3 b
on a.recid = b.recid - 1
where b.recid = @counter
set @counter = @counter + 1
END
select Acntno,date1,Emplno,Type,ACtion,min(recid) as recid
from #test3
group by Acntno,date1,Emplno,Type,ACtion,batch
order by recID
drop table #test3
July 6, 2010 at 1:35 pm
Hey Thanks Ed... Its really a very complex code,gr8 code, result looks good...I will have to understand it first to see how can I implement it in my actual data.
but I have one query about you your previous solution, I was trying to do this but no succes yet, hope you can do it much fast...
On running your previous code :
select b.*
from #test3 a
right join #test3 b
on a.recid = b.recid - 1
where b.recid = 1 -- always use first record
OR NOT
( a.Acntno = b.Acntno
AND a.date1 = b.date1
AND a.emplno = b.emplno
AND a.type = b.type
AND a.action = b.action
)
ORDER BY b.RecID
I got this result :
Now here the last column recid will give me latest recid i mean if some account is repeating 18 times it gives me 18 ...
Is there any way that I can make the recid count shown as like this:
this will help me as i am pivoting the data after wards based in this recid 's value...
any help on this...let me know if more information is needed
thanks
Thanks [/font]
July 6, 2010 at 2:01 pm
Replace the final SELECT query with this query. It uses a CTE (Common Table Expression) and the ROW_NUMBER function.
WITH NoDuplicates AS
(
select Acntno,date1,Emplno,Type,ACtion, min(recid) as recid, batch
from #test3
group by Acntno,date1,Emplno,Type,ACtion,batch
)
SELECT Acntno,date1,Emplno,Type,ACtion,
ROW_NUMBER() OVER(PARTITION BY batch ORDER BY recid)
FROM NoDuplicates
July 6, 2010 at 2:22 pm
Hey, thanks for the reply but The result is not coming if I run this query....
the min(recid ) will always be 1
;
Thanks [/font]
July 6, 2010 at 2:31 pm
You have to include the WHILE loop to update the temp table with a new "batch" number at each break in the account number.
CREATE TABLE [dbo].[#test3](
[Acntno] [nvarchar](14) NULL,
[date1] [varchar](13) NULL,
[Emplno] [nvarchar](5) NULL,
[Type] [nvarchar](3) NULL,
[ACtion] [nvarchar](4) NULL,
[batch] int null,
[RecID] int not null identity
)
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'TEL', 'VERF')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'TON', 'DSKP')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'TON', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'NTE', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'TEL', 'VERF')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PGR', 'INPT')
update #test3
set batch = 1 where recid = 1
DECLARE @counter int
SET @counter = 2
WHILE @counter <= (SELECT MAX(RecID) FROM #test3)
BEGIN
update b
SET batch = CASE WHEN b.Acntno = a.Acntno THEN a.batch
ELSE a.batch + 1 END
from #test3 a
right join #test3 b
on a.recid = b.recid - 1
where b.recid = @counter
set @counter = @counter + 1
END
WITH NoDuplicates AS
(
select Acntno,date1,Emplno,Type,ACtion, min(recid) as recid, batch
from #test3
group by Acntno,date1,Emplno,Type,ACtion,batch
)
SELECT Acntno,date1,Emplno,Type,ACtion,
ROW_NUMBER() OVER(PARTITION BY batch ORDER BY recid)
FROM NoDuplicates
drop table #test3
July 6, 2010 at 10:57 pm
Phil Parkin (7/6/2010)
PaVeRa22 (7/6/2010)
you can do....1. take distinct records into temptable.
2. truncate original table
3. store temptable data into original table.
RB
If you take the time to read through the entire thread, you'll see that your solution will not work for this bizarre requirement.
I'm sorry, I’m still not clear,
as per knowledge, they wanted to delete the records 2nd occurrence onwards... for every distinct record...
so, how does we know which record is deleted if we use distinct, use distinct and then number it...
July 6, 2010 at 11:05 pm
priya__ (7/6/2010)
Hi,Now there is a little more into this problem, now can I remove duplicates for that account num range before account number changes.....even if it is not in order...
before the break comes by account num can I exclude duplicates for an account
like here in the insert i added one more duplicate row after 1st record like this
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')
and the result changed....
actually the requirement now is to remove duplicates in that account no range rows,even if it is not a consequtive record in the dataset,( one row after the other),
....
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'TEL', 'VERF')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'TON', 'DSKP')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'TON', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '21130403100001', 'Jun 29 2010 ', '22501', 'NTE', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'TEL', 'VERF')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PYM', 'COMP')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'AGR', 'INPT')
INSERT INTO #test3 ( Acntno,date1,Emplno,Type,ACtion) VALUES ( '23350236800003', 'Jun 29 2010 ', '12885', 'PGR', 'INPT')
the result is like this:
23350236800003Jun 29 2010 12885TELVERF1
23350236800003Jun 29 2010 12885PYMCOMP2
23350236800003Jun 29 2010 12885AGRINPT3
23350236800003Jun 29 2010 12885PYMCOMP4
23350236800003Jun 29 2010 12885AGRINPT5
21130403100001Jun 29 2010 22501TONDSKP7
can I make the result look like removing the bold lines in result as they are duplicates for an account sequence, i.e. removed duplicates for an account even if not together...before other account no comes..
any help on this?
I hope it is clear..
from the above result, which records you wanted to delete from 3 and 5 ?? thx!
July 7, 2010 at 6:57 am
Thanks a lot Ed for your help....:-)
Your code is gr8 and is working fine....The results are correct..
but since it has an update it is taking lot of time( i have about 50,000 rows so its not feasible for me),
So I am using Cursors to do this...
I added a rownumber (not identity)to all account occurences, as account changes I add 1 to the rownum too and then I selected distinct from that it is showing correct results and is fast too...
Thanks a lot for your help...
Thanks [/font]
July 7, 2010 at 7:07 am
priya__
You're welcome. It was a slow day at the office and it was an interesting little exercise. I'm glad you were able to modify the SQL to better fit your requirements.
Viewing 11 posts - 16 through 25 (of 25 total)
You must be logged in to reply to this topic. Login to reply