Conditional Aggregation

Question

Conditional Aggregation

sridhar_kola

SSC-Addicted

Points: 432
More actions
May 29, 2012 at 1:01 am

#262123

Lets say I have a table as
MIDTime (Hh:Mm)
M110:22
M112:15
M113:22
M116:00
M117:50
I need to aggregate the counts for MID(M1) based on the condition "to aggregate values that are beyond 2 hrs window"
So in above case , the valid ones are row1, row 3, row 4 .
So total aggregated value for M1 =3
I know I can write a cursor/loop to do this , but considering that I have a huge volume of data (~20M), is there a better way to solve this ?
Thanks

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

SGT_squeequal SSCertifiable Points: 7167 More actions · Answer 1

hi im not 100 % sure on what your asking and what you expect in results, looking at your example table there is nothing to filter down on you can group by the mid or time but from your explanation im not sure how you come to rows 1 3 etc are relevant any way if you wanted to calculate the time difference in hours by each mid then you can use something as simple as

select MID, datediff(hh,min(ttm),max (ttm))

from table_name

group by mid

***The first step is always the hardest *******

vinu512 SSCoach Points: 15729 More actions · Answer 2

sridhar_kola (5/29/2012)
Lets say I have a table as
MIDTime (Hh:Mm)
M110:22
M112:15
M113:22
M116:00
M117:50
I need to aggregate the counts for MID(M1) based on the condition "to aggregate values that are beyond 2 hrs window"
So in above case , the valid ones are row1, row 3, row 4 .
So total aggregated value for M1 =3
I know I can write a cursor/loop to do this , but considering that I have a huge volume of data (~20M), is there a better way to solve this ?
Thanks

What do you mean by "to aggregate values that are beyond 2 hrs window"??

I am assuming that you want to select successive rows with time difference >= 2(in hours).

Going by this assumption....only Row 4 satisfies the requirement.

How do Row1 and Row3 satisfy this requirement??....I'm might be wrong in assuming what your requirement is.

Please explain the business logic a little more.

Vinu Vijayan

For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] 😉

SGT_squeequal SSCertifiable Points: 7167 More actions · Answer 3

@sridhar_kola i must say im in agreement about how to work out where the 2 hours comes in:w00t:

***The first step is always the hardest *******

sridhar_kola SSC-Addicted Points: 432 More actions · Answer 4

The logic seems to be to always select the first row (10:22 here), skip the second row because 12:15 is within 2 hours of 10:22, select 13:22 because it is over 2 hours from 10:22, select 16:00 because it is over two hours from 13:22, and skip 17:50 because it is within two hours of 16:00.

SGT_squeequal SSCertifiable Points: 7167 More actions · Answer 5

Can you paste in your logic and we may be in a better position to help you However,

have a look at this CTE from your explanation of the table this may help

;with CTE

as

(

select mid,row_number() over(partition by MID order by ttm) as id,ttm

from t1)

select * from cte a inner join cte b on a.id=b.id+1

where datediff(hh,a.ttm,b.ttm)>=2

***The first step is always the hardest *******

Paul White SSC Guru Points: 150468 More actions · Answer 6

This is the first solution that springs to mind. Better ones are surely possible, I'll have another think about it shortly.

DECLARE @Example AS TABLE

(

MID character(2) NOT NULL,

Time time NOT NULL,

PRIMARY KEY (MID, Time)

);

INSERT @Example

(MID, Time)

VALUES

('M1', '10:22'),

('M1', '12:15'),

('M1', '13:22'),

('M1', '16:00'),

('M1', '17:50'),

('M2', '09:00'),

('M2', '10:59'),

('M2', '13:00'),

('M2', '15:30'),

('M2', '17:30');

WITH rCTE AS

(

-- First record per group

SELECT

e.MID,

MIN(e.Time) AS Time

FROM @Example AS e

GROUP BY

e.MID

UNION ALL

-- Recursive search

SELECT

x.MID,

x.Time

FROM

(

-- Find rows at least 2 hours away from the current one

-- And number (cannot use TOP in recursive CTE)

SELECT

e.MID,

e.Time,

rn = ROW_NUMBER() OVER (

ORDER BY e.Time)

FROM @Example AS e

JOIN rCTE ON

rCTE.MID = e.MID

AND rCTE.Time <= e.Time

AND rCTE.Time <= DATEADD(HOUR, -2, e.Time)

) AS x

WHERE

-- First row (translates to a TOP)

x.rn = 1

)

SELECT

rCTE.MID,

row_count = COUNT_BIG(*)

FROM rCTE

GROUP BY

rCTE.MID;

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

sridhar_kola SSC-Addicted Points: 432 More actions · Answer 7

I think you have not understood the requirement. Your CTE will give the difference between previous and current time. But the requirement is to get the time difference from the last successful record.

Please see my example above

SGT_squeequal SSCertifiable Points: 7167 More actions · Answer 8

i think you need to define what is a successful record as i understand what you are asking for is what we have all completed and that is to check the time between the last record and the next.

***The first step is always the hardest *******

Paul White SSC Guru Points: 150468 More actions · Answer 9

sridhar_kola (5/29/2012)
I think you have not understood the requirement. Your CTE will give the difference between previous and current time. But the requirement is to get the time difference from the last successful record.
Please see my example above

That's what it does - it's a recursive CTE. Please run the code - it returns 3 for M1 and 4 for M2. I hope I did understand the requirement, because it's my explanation (from the original thread) that you posted here 🙂

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

vinu512 SSCoach Points: 15729 More actions · Answer 10

sridhar_kola (5/29/2012)
I think you have not understood the requirement. Your CTE will give the difference between previous and current time. But the requirement is to get the time difference from the last successful record.
Please see my example above

No, the OP has made it clear enough.

The requirement here is to:

1. Select the first record.(Last Successful Record)

2. Check the next record with the Last Successful Value.

3. Select the Second record only if it is 2 or more hours from the Last Successful Record Else Move to

the Next Record and repeat step 2.

Edit: Chris's method works fine. I just tested it.

Vinu Vijayan

For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] 😉

sridhar_kola SSC-Addicted Points: 432 More actions · Answer 11

sridhar_kola

SSC-Addicted

Points: 432

May 29, 2012 at 4:58 am

#1494093

Cheers Paul ! This is what I expected !

sridhar_kola SSC-Addicted Points: 432 More actions · Answer 12

sridhar_kola

SSC-Addicted

Points: 432

May 29, 2012 at 4:59 am

#1494094

Cheers Paul !! This is what I expected !