Date logic

Question

Post reply

Date logic

KGJ-Dev

SSCertifiable

Points: 5780
More actions
January 9, 2017 at 10:47 am

#316811

Hi,
Below is the sample data to play with and i need to use the partition by to split the record based on hour difference.
Declare @Sample table(TestDate datetime, Id int);
insert into @Sample
SELECT '2017-01-07 07:28:01.917' , 1
UNION ALL
SELECT '2017-01-07 07:28:05.210' , 1
UNION ALL
SELECT '2017-01-07 14:06:21.840' , 1
UNION ALL
SELECT '2017-01-07 14:06:24.227' , 1
UNION ALL
SELECT '2017-01-07 07:28:03.023' , 1
UNION ALL
SELECT '2017-01-07 14:06:22.440' , 1
UNION ALL
SELECT '2017-01-07 07:28:07.477' , 1
UNION ALL
SELECT '2017-01-07 14:06:28.180' , 1'
TestDate RN
2017-01-07 07:28:01.917 1
2017-01-07 07:28:05.210 1
2017-01-07 14:06:21.840 2
2017-01-07 14:06:24.227 2
2017-01-07 07:28:03.023 1
2017-01-07 14:06:22.440 2
2017-01-07 07:28:07.477 1
2017-01-07 14:06:28.180 2
Any sample query to achieve this please

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Luis Cazares SSC Guru Points: 183706 More actions · Answer 1

Maybe something like this:

SELECT *,

DATEADD(HH, DATEDIFF(HH, '2017', TestDate), '2017') PartitionUsed,

DENSE_RANK() OVER(ORDER BY DATEADD(HH, DATEDIFF(HH, '2017', TestDate), '2017')) RN

FROM @Sample;

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

KGJ-Dev SSCertifiable Points: 5780 More actions · Answer 2

Hi Luis,

thanks for the reply and i would like to put my requirement in detail. please note i have updated my sample on the first post.

select * from @Sample order by TestDate

please consider the first four records inserted with in same hour. if the testdate falls with in same hours then that's not duplicate. else duplicate.

in my case below are the duplicates

2017-01-07 14:06:21.840

2017-01-07 14:06:22.440

2017-01-07 14:06:24.227

2017-01-07 14:06:28.180

how do i get this duplicate records based on same hour data. any sugestion please

Also there is a possiblility 2016-12-31 23.55 , 2016-12-31 23.56, 2017-01-01 00.55. these are not duplicate. We need to tacke this as well

Luis Cazares SSC Guru Points: 183706 More actions · Answer 3

What´s the problem with the code I posted?

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

KGJ-Dev SSCertifiable Points: 5780 More actions · Answer 4

as like my sample i have 100*1000 records in my table and dense rank puts incremental number. basically i need the query to delete the duplicate records. any help please

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 5

Lynn Pettis

SSC Guru

Points: 442467

January 9, 2017 at 1:57 pm

#1921494

I'm lost on your requirements.

KGJ-Dev SSCertifiable Points: 5780 More actions · Answer 6

Sure, i will explain again.

I have a table and right now i have duplicates on the table. Only way i can clean the duplicates based on id, RecordInsertedDateTime.

The most of the records gets created based with in 30 minutes. so i need to find the duplicate records based on created time intraval.

Sample code:

DECLARE @Sample TABLE (

TestDate DATETIME

,Id INT

);

INSERT INTO @Sample

SELECT '2017-01-07 07:28:01.917'

,1

UNION ALL

SELECT '2017-01-07 07:28:05.210'

,1

UNION ALL

SELECT '2017-01-07 14:06:21.840'

,1

UNION ALL

SELECT '2017-01-07 14:06:24.227'

,1

UNION ALL

SELECT '2017-01-07 07:28:03.023'

,1

UNION ALL

SELECT '2017-01-07 14:06:22.440'

,1

UNION ALL

SELECT '2017-01-07 07:28:07.477'

,1

UNION ALL

SELECT '2017-01-07 14:06:28.180'

,1

UNION ALL

SELECT '2017-01-07 16:06:22.440'

,2

UNION ALL

SELECT '2017-01-07 16:28:07.477'

,2

UNION ALL

SELECT '2017-01-07 01:06:28.180'

,2

union all

SELECT '2017-01-07 01:06:30.180', 2

on my above sample the below records created with in few seconds. but i don't want to go with seconds, so i kept 1 hours as criteria. So the first records inserted on 07:28:01.917. So (07:28:01.917 + 1 hour ) is my criteria

For Id = 1

2017-01-07 07:28:01.917

2017-01-07 07:28:05.210

2017-01-07 07:28:07.477

2017-01-07 07:28:03.023

Duplicate Records:

2017-01-07 14:06:22.440

2017-01-07 14:06:28.180

2017-01-07 14:06:21.840

2017-01-07 14:06:24.227

But if you see the above duplicated records got inserted from 14:06:22.440 for the same Id 1

so need to delete these records.

SELECT *,

DATEADD(HH, DATEDIFF(HH, '2017', TestDate), '2017') PartitionUsed,

DENSE_RANK() OVER(ORDER BY id, DATEADD(HH, DATEDIFF(HH, '2017', TestDate), '2017')) RN

FROM @Sample;

if i use Luis solution the RN has to be 1 for non duplicate record and 2 for duplicate records for the same Id but i am getting incremented RN and am struggling on removing the duplicates.

Is my requirement clear? Any help please

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 7

How and why are they duplicates? All I see is a DateTime value and an ID. There is no real logic to state why they are duplicated just because they are in a different hour of the same day. Would they be duplicates if the date were different?

This looks like a case of missing information unless I am missing something.

ZZartin SSC-Dedicated Points: 30920 More actions · Answer 8

Is this more what you're looking for? This will keep the first set of records within the earliest hour for each Id on one day.

WITH TEMP_CTE AS(

SELECT *,

DATEADD(HH, DATEDIFF(HH, '2017', TestDate), '2017') PartitionUsed,

DENSE_RANK() OVER(PARTITION BY id, DATEADD(day, DATEDIFF(day, '2017', TestDate), '2017') ORDER BY id, DATEADD(HH, DATEDIFF(HH, '2017', TestDate), '2017') ASC) RN

FROM @Sample

)

DELETE FROM TEMP_CTE WHERE RN > 1

drew.allen SSC Guru Points: 76985 More actions · Answer 9

The problem is that the definition of "duplicate" here is not well defined. Specifically, you want to count times as duplicates if they fall within a specific threshold of time. For convenience sake, let's say an hour is the threshold.

Say we having the following events and times

event_id time

1 12:00

2 12:45

3 13:30

Clearly 1 and 2 fall within the threshold, so they should be duplicates, and 2 and 3 fall within the threshold, so they should be duplicates, but 1 and 3 fall outside of the threshold, so they should not be duplicates, but they are both duplicates of 2, so they should be duplicates.

So the question is whether the duplication is defined based on the first instance or the last instance in the group. The answer to that question determines how we determine which times are even within each group. In one instance, we want to use a packing intervals approach, and in the other, we want to use a gaps and islands approach. We can't tell you which approach to use based on the data that you have given us.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

KGJ-Dev SSCertifiable Points: 5780 More actions · Answer 10

thanks everyone for your time on this post and i could manage in different way. I just took the first created record time and added needed intraval(newdate) and kept in another temp table. then wrote a query to compared the inserteddate <= newdate (from temp table). Am all set. thank you and appreciated.

Eirikur Eiriksson SSC Guru Points: 182911 More actions · Answer 11

Quick thought, you can probably simplify the calculation, something like this maybe?

😎

INSERT INTO @Sample

SELECT '2017-01-07 07:28:01.917',1 UNION ALL

SELECT '2017-01-07 07:28:05.210',1 UNION ALL

SELECT '2017-01-07 14:06:21.840',1 UNION ALL

SELECT '2017-01-07 14:06:24.227',1 UNION ALL

SELECT '2017-01-07 07:28:03.023',1 UNION ALL

SELECT '2017-01-07 14:06:22.440',1 UNION ALL

SELECT '2017-01-07 07:28:07.477',1 UNION ALL

SELECT '2017-01-07 14:06:28.180',1 UNION ALL

SELECT '2017-01-07 16:06:22.440',2 UNION ALL

SELECT '2017-01-07 16:28:07.477',2 UNION ALL

SELECT '2017-01-07 01:06:28.180',2 UNION ALL

SELECT '2017-01-07 01:06:30.180',2;

SELECT

S.Id

,S.TestDate

,DENSE_RANK() OVER (PARTITION BY S.Id ORDER BY FLOOR(DATEDIFF(MINUTE,0,S.TestDate) / 30.0) ASC) AS DRNK

FROM @Sample S;

Output

Id TestDate DRNK

--- ----------------------- -----

1 2017-01-07 07:28:01.917 1

1 2017-01-07 07:28:05.210 1

1 2017-01-07 07:28:07.477 1

1 2017-01-07 07:28:03.023 1

1 2017-01-07 14:06:22.440 2

1 2017-01-07 14:06:28.180 2

1 2017-01-07 14:06:21.840 2

1 2017-01-07 14:06:24.227 2

2 2017-01-07 01:06:28.180 1

2 2017-01-07 01:06:30.180 1

2 2017-01-07 16:06:22.440 2

2 2017-01-07 16:28:07.477 2

Sergiy SSC Guru Points: 110208 More actions · Answer 12

KGJ-Dev (1/9/2017)
Hi Luis,
thanks for the reply and i would like to put my requirement in detail. please note i have updated my sample on the first post.
select * from @Sample order by TestDate
please consider the first four records inserted with in same hour. if the testdate falls with in same hours then that's not duplicate. else duplicate.
in my case below are the duplicates
2017-01-07 14:06:21.840
2017-01-07 14:06:22.440
2017-01-07 14:06:24.227
2017-01-07 14:06:28.180
how do i get this duplicate records based on same hour data. any sugestion please
Also there is a possiblility 2016-12-31 23.55 , 2016-12-31 23.56, 2017-01-01 00.55. these are not duplicate. We need to tacke this as well

SELECT 'unique records to leave', *, ROW_NUMBER() OVER (ORDER BY s.TestDate)

FROM @Sample s

WHERE NOT EXISTS (SELECT *

FROM @Sample s2

WHERE s2.TestDate > DATEADD(ss, -30, s.TestDate)

AND s2.TestDate < s.TestDate

)

SELECT 'duplicates to remove', *

FROM @Sample s

WHERE EXISTS (SELECT *

FROM @Sample s2

WHERE s2.TestDate > DATEADD(ss, -30, s.TestDate)

AND s2.TestDate < s.TestDate

)

_____________
Code for TallyGenerator

KGJ-Dev SSCertifiable Points: 5780 More actions · Answer 13

Sergiy - Tuesday, January 10, 2017 6:29 PM
KGJ-Dev (1/9/2017)
Hi Luis, thanks for the reply and i would like to put my requirement in detail. please note i have updated my sample on the first post.select * from @Sample order by TestDateplease consider the first four records inserted with in same hour. if the testdate falls with in same hours then that's not duplicate. else duplicate.in my case below are the duplicates2017-01-07 14:06:21.8402017-01-07 14:06:22.4402017-01-07 14:06:24.2272017-01-07 14:06:28.180how do i get this duplicate records based on same hour data. any sugestion please Also there is a possiblility 2016-12-31 23.55 , 2016-12-31 23.56, 2017-01-01 00.55. these are not duplicate. We need to tacke this as well
SELECT 'unique records to leave', *, ROW_NUMBER() OVER (ORDER BY s.TestDate)FROM @Sample sWHERE NOT EXISTS (SELECT * FROM @Sample s2 WHERE s2.TestDate > DATEADD(ss, -30, s.TestDate)AND s2.TestDate < s.TestDate )SELECT 'duplicates to remove', * FROM @Sample sWHERE EXISTS (SELECT * FROM @Sample s2 WHERE s2.TestDate > DATEADD(ss, -30, s.TestDate)AND s2.TestDate < s.TestDate )

Thanks a lot guys for the different solutions. much appreciated.