January 9, 2017 at 10:47 am
Hi,
Below is the sample data to play with and i need to use the partition by to split the record based on hour difference.
Declare @Sample table(TestDate datetime, Id int);
insert into @Sample
SELECT '2017-01-07 07:28:01.917' , 1
UNION ALL
SELECT '2017-01-07 07:28:05.210' , 1
UNION ALL
SELECT '2017-01-07 14:06:21.840' , 1
UNION ALL
SELECT '2017-01-07 14:06:24.227' , 1
UNION ALL
SELECT '2017-01-07 07:28:03.023' , 1
UNION ALL
SELECT '2017-01-07 14:06:22.440' , 1
UNION ALL
SELECT '2017-01-07 07:28:07.477' , 1
UNION ALL
SELECT '2017-01-07 14:06:28.180' , 1'
TestDate RN
2017-01-07 07:28:01.917 1
2017-01-07 07:28:05.210 1
2017-01-07 14:06:21.840 2
2017-01-07 14:06:24.227 2
2017-01-07 07:28:03.023 1
2017-01-07 14:06:22.440 2
2017-01-07 07:28:07.477 1
2017-01-07 14:06:28.180 2
Any sample query to achieve this please
January 9, 2017 at 12:04 pm
Maybe something like this:
SELECT *,
DATEADD(HH, DATEDIFF(HH, '2017', TestDate), '2017') PartitionUsed,
DENSE_RANK() OVER(ORDER BY DATEADD(HH, DATEDIFF(HH, '2017', TestDate), '2017')) RN
FROM @Sample;
January 9, 2017 at 12:31 pm
Hi Luis,
thanks for the reply and i would like to put my requirement in detail. please note i have updated my sample on the first post.
select * from @Sample order by TestDate
please consider the first four records inserted with in same hour. if the testdate falls with in same hours then that's not duplicate. else duplicate.
in my case below are the duplicates
2017-01-07 14:06:21.840
2017-01-07 14:06:22.440
2017-01-07 14:06:24.227
2017-01-07 14:06:28.180
how do i get this duplicate records based on same hour data. any sugestion please
Also there is a possiblility 2016-12-31 23.55 , 2016-12-31 23.56, 2017-01-01 00.55. these are not duplicate. We need to tacke this as well
January 9, 2017 at 1:40 pm
What´s the problem with the code I posted?
January 9, 2017 at 1:45 pm
as like my sample i have 100*1000 records in my table and dense rank puts incremental number. basically i need the query to delete the duplicate records. any help please
January 9, 2017 at 1:57 pm
I'm lost on your requirements.
January 9, 2017 at 2:10 pm
Sure, i will explain again.
I have a table and right now i have duplicates on the table. Only way i can clean the duplicates based on id, RecordInsertedDateTime.
The most of the records gets created based with in 30 minutes. so i need to find the duplicate records based on created time intraval.
Sample code:
DECLARE @Sample TABLE (
TestDate DATETIME
,Id INT
);
INSERT INTO @Sample
SELECT '2017-01-07 07:28:01.917'
,1
UNION ALL
SELECT '2017-01-07 07:28:05.210'
,1
UNION ALL
SELECT '2017-01-07 14:06:21.840'
,1
UNION ALL
SELECT '2017-01-07 14:06:24.227'
,1
UNION ALL
SELECT '2017-01-07 07:28:03.023'
,1
UNION ALL
SELECT '2017-01-07 14:06:22.440'
,1
UNION ALL
SELECT '2017-01-07 07:28:07.477'
,1
UNION ALL
SELECT '2017-01-07 14:06:28.180'
,1
UNION ALL
SELECT '2017-01-07 16:06:22.440'
,2
UNION ALL
SELECT '2017-01-07 16:28:07.477'
,2
UNION ALL
SELECT '2017-01-07 01:06:28.180'
,2
union all
SELECT '2017-01-07 01:06:30.180', 2
on my above sample the below records created with in few seconds. but i don't want to go with seconds, so i kept 1 hours as criteria. So the first records inserted on 07:28:01.917. So (07:28:01.917 + 1 hour ) is my criteria
For Id = 1
2017-01-07 07:28:01.917
2017-01-07 07:28:05.210
2017-01-07 07:28:07.477
2017-01-07 07:28:03.023
Duplicate Records:
2017-01-07 14:06:22.440
2017-01-07 14:06:28.180
2017-01-07 14:06:21.840
2017-01-07 14:06:24.227
But if you see the above duplicated records got inserted from 14:06:22.440 for the same Id 1
so need to delete these records.
SELECT *,
DATEADD(HH, DATEDIFF(HH, '2017', TestDate), '2017') PartitionUsed,
DENSE_RANK() OVER(ORDER BY id, DATEADD(HH, DATEDIFF(HH, '2017', TestDate), '2017')) RN
FROM @Sample;
if i use Luis solution the RN has to be 1 for non duplicate record and 2 for duplicate records for the same Id but i am getting incremented RN and am struggling on removing the duplicates.
Is my requirement clear? Any help please
January 9, 2017 at 3:02 pm
How and why are they duplicates? All I see is a DateTime value and an ID. There is no real logic to state why they are duplicated just because they are in a different hour of the same day. Would they be duplicates if the date were different?
This looks like a case of missing information unless I am missing something.
January 9, 2017 at 3:28 pm
Is this more what you're looking for? This will keep the first set of records within the earliest hour for each Id on one day.
WITH TEMP_CTE AS(
SELECT *,
DATEADD(HH, DATEDIFF(HH, '2017', TestDate), '2017') PartitionUsed,
DENSE_RANK() OVER(PARTITION BY id, DATEADD(day, DATEDIFF(day, '2017', TestDate), '2017') ORDER BY id, DATEADD(HH, DATEDIFF(HH, '2017', TestDate), '2017') ASC) RN
FROM @Sample
)
DELETE FROM TEMP_CTE WHERE RN > 1
January 9, 2017 at 4:49 pm
The problem is that the definition of "duplicate" here is not well defined. Specifically, you want to count times as duplicates if they fall within a specific threshold of time. For convenience sake, let's say an hour is the threshold.
Say we having the following events and times
event_id time
1 12:00
2 12:45
3 13:30
Clearly 1 and 2 fall within the threshold, so they should be duplicates, and 2 and 3 fall within the threshold, so they should be duplicates, but 1 and 3 fall outside of the threshold, so they should not be duplicates, but they are both duplicates of 2, so they should be duplicates.
So the question is whether the duplication is defined based on the first instance or the last instance in the group. The answer to that question determines how we determine which times are even within each group. In one instance, we want to use a packing intervals approach, and in the other, we want to use a gaps and islands approach. We can't tell you which approach to use based on the data that you have given us.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
January 9, 2017 at 8:39 pm
thanks everyone for your time on this post and i could manage in different way. I just took the first created record time and added needed intraval(newdate) and kept in another temp table. then wrote a query to compared the inserteddate <= newdate (from temp table). Am all set. thank you and appreciated.
January 9, 2017 at 10:04 pm
Quick thought, you can probably simplify the calculation, something like this maybe?
😎
INSERT INTO @Sample
SELECT '2017-01-07 07:28:01.917',1 UNION ALL
SELECT '2017-01-07 07:28:05.210',1 UNION ALL
SELECT '2017-01-07 14:06:21.840',1 UNION ALL
SELECT '2017-01-07 14:06:24.227',1 UNION ALL
SELECT '2017-01-07 07:28:03.023',1 UNION ALL
SELECT '2017-01-07 14:06:22.440',1 UNION ALL
SELECT '2017-01-07 07:28:07.477',1 UNION ALL
SELECT '2017-01-07 14:06:28.180',1 UNION ALL
SELECT '2017-01-07 16:06:22.440',2 UNION ALL
SELECT '2017-01-07 16:28:07.477',2 UNION ALL
SELECT '2017-01-07 01:06:28.180',2 UNION ALL
SELECT '2017-01-07 01:06:30.180',2;
SELECT
S.Id
,S.TestDate
,DENSE_RANK() OVER (PARTITION BY S.Id ORDER BY FLOOR(DATEDIFF(MINUTE,0,S.TestDate) / 30.0) ASC) AS DRNK
FROM @Sample S;
Output
Id TestDate DRNK
--- ----------------------- -----
1 2017-01-07 07:28:01.917 1
1 2017-01-07 07:28:05.210 1
1 2017-01-07 07:28:07.477 1
1 2017-01-07 07:28:03.023 1
1 2017-01-07 14:06:22.440 2
1 2017-01-07 14:06:28.180 2
1 2017-01-07 14:06:21.840 2
1 2017-01-07 14:06:24.227 2
2 2017-01-07 01:06:28.180 1
2 2017-01-07 01:06:30.180 1
2 2017-01-07 16:06:22.440 2
2 2017-01-07 16:28:07.477 2
January 10, 2017 at 6:29 pm
KGJ-Dev (1/9/2017)
Hi Luis,thanks for the reply and i would like to put my requirement in detail. please note i have updated my sample on the first post.
select * from @Sample order by TestDate
please consider the first four records inserted with in same hour. if the testdate falls with in same hours then that's not duplicate. else duplicate.
in my case below are the duplicates
2017-01-07 14:06:21.840
2017-01-07 14:06:22.440
2017-01-07 14:06:24.227
2017-01-07 14:06:28.180
how do i get this duplicate records based on same hour data. any sugestion please
Also there is a possiblility 2016-12-31 23.55 , 2016-12-31 23.56, 2017-01-01 00.55. these are not duplicate. We need to tacke this as well
SELECT 'unique records to leave', *, ROW_NUMBER() OVER (ORDER BY s.TestDate)
FROM @Sample s
WHERE NOT EXISTS (SELECT *
FROM @Sample s2
WHERE s2.TestDate > DATEADD(ss, -30, s.TestDate)
AND s2.TestDate < s.TestDate
)
SELECT 'duplicates to remove', *
FROM @Sample s
WHERE EXISTS (SELECT *
FROM @Sample s2
WHERE s2.TestDate > DATEADD(ss, -30, s.TestDate)
AND s2.TestDate < s.TestDate
)
_____________
Code for TallyGenerator
January 18, 2017 at 6:59 pm
Sergiy - Tuesday, January 10, 2017 6:29 PMKGJ-Dev (1/9/2017)
Hi Luis, thanks for the reply and i would like to put my requirement in detail. please note i have updated my sample on the first post.select * from @Sample order by TestDateplease consider the first four records inserted with in same hour. if the testdate falls with in same hours then that's not duplicate. else duplicate.in my case below are the duplicates2017-01-07 14:06:21.8402017-01-07 14:06:22.4402017-01-07 14:06:24.2272017-01-07 14:06:28.180how do i get this duplicate records based on same hour data. any sugestion please Also there is a possiblility 2016-12-31 23.55 , 2016-12-31 23.56, 2017-01-01 00.55. these are not duplicate. We need to tacke this as well
SELECT 'unique records to leave', *, ROW_NUMBER() OVER (ORDER BY s.TestDate)FROM @Sample sWHERE NOT EXISTS (SELECT * FROM @Sample s2 WHERE s2.TestDate > DATEADD(ss, -30, s.TestDate)AND s2.TestDate < s.TestDate )SELECT 'duplicates to remove', * FROM @Sample sWHERE EXISTS (SELECT * FROM @Sample s2 WHERE s2.TestDate > DATEADD(ss, -30, s.TestDate)AND s2.TestDate < s.TestDate )
Thanks a lot guys for the different solutions. much appreciated.
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply