June 18, 2010 at 12:57 pm
Hello,
I'm trying to come up with a way to count how many records exist on a hourly basis within a period of time.
I have a record with start time and stop time. I want to determine an hourly count of records that exist during the time span.
The end goal will look like this.
[hour of the day], [count of records]
0, 4
1, 10
2, 6
etc....
If record 6876 had a start time of "2010-06-01 06:13:54.000" and a stop time of "2010-06-01 16:00:26.000"...I want to account for each hour in this record.
--===== If the test table already exists, drop it
IF OBJECT_ID('TempDB..#whosonwhen','U') IS NOT NULL
DROP TABLE #whosonwhen
--===== Create the test table
CREATE TABLE #whosonwhen
(
[ID] INT,
[Start] DATETIME,
[Stop] DATETIME
)
--===== Insert the test data into the test table
INSERT INTO #whosonwhen ([ID], [Start], [Stop])
SELECT '6876','2010-06-01 06:13:54.000','2010-06-01 16:00:26.000' UNION ALL
SELECT '6919','2010-06-01 07:56:59.000','2010-06-01 17:57:00.000' UNION ALL
SELECT '6863','2010-06-01 08:37:00.000','2010-06-01 15:30:52.000' UNION ALL
SELECT '6851','2010-06-01 08:18:00.000','2010-06-01 15:00:26.000' UNION ALL
SELECT '6852','2010-06-01 09:02:17.000','2010-06-01 15:00:29.000' UNION ALL
SELECT '683','2010-06-01 20:02:17.000','2010-06-02 02:00:29.000'
Thanks!
June 18, 2010 at 1:50 pm
K big question time.
In you example ID 683 starts on the 1st but does not stop till the 2nd.
When will this query be run and how would this record be handled. to have an accurrate hourly report that is to say a report that places the count into a bucket by hour then you have to have a cut off point. Idealy this cutoff would be midnight otherwise you have to get creative with your math. basically everything after midnight would be an extra hour and then you end up with more than 24 hours in a day. that can get confusing so I think the query depends on this record. I have an idea for a query but it the cutoff is not midnight then my idea is not valid.
Dan
If only I could snap my figures and have all the correct indexes apear and the buffer clean and.... Start day dream here.
June 18, 2010 at 2:01 pm
Nice problem.
Here is my first attempt:
declare @periodStart datetime
declare @periodEnd datetime
-- they must be inside of the same day!!!!
set @periodStart = '20100601'
set @periodEnd = '2010-06-10 23:59:59.000'
declare @givendate datetime
set @givendate = convert(varchar(12),@periodStart,112)
-- you may create this table by other ways...
select top 24
DATEADD(hour,ROW_NUMBER() OVER (ORDER BY Object_id)-1,@givendate) as HourOfDay
,ROW_NUMBER() OVER (ORDER BY Object_id)-1 as DayHour
into #dh
from sys.columns
;with correctRange
as
(
select [ID]
,case when w.[Start] < @periodStart and w.[Stop] > @periodStart then @periodStart else w.[Start] end as [Start]
,case when w.[Stop] >= @periodEnd and w.[Start] < @periodEnd then @periodEnd else w.[Stop] end as [Stop]
from #whosonwhen w
)
select d.DayHour, count(*) as RecCount
from correctRange w
join #dh d on d.HourOfDay between w.[Start] and w.[Stop]
where w.[Start] >= @periodStart
group by d.DayHour
order by d.DayHour
drop table #dh
June 18, 2010 at 4:52 pm
Eugene Elutin (6/18/2010)
Nice problem.Here is my first attempt:
declare @periodStart datetime
declare @periodEnd datetime
-- they must be inside of the same day!!!!
set @periodStart = '20100601'
set @periodEnd = '2010-06-10 23:59:59.000'
declare @givendate datetime
set @givendate = convert(varchar(12),@periodStart,112)
-- you may create this table by other ways...
select top 24
DATEADD(hour,ROW_NUMBER() OVER (ORDER BY Object_id)-1,@givendate) as HourOfDay
,ROW_NUMBER() OVER (ORDER BY Object_id)-1 as DayHour
into #dh
from sys.columns
;with correctRange
as
(
select [ID]
,case when w.[Start] < @periodStart and w.[Stop] > @periodStart then @periodStart else w.[Start] end as [Start]
,case when w.[Stop] >= @periodEnd and w.[Start] < @periodEnd then @periodEnd else w.[Stop] end as [Stop]
from #whosonwhen w
)
select d.DayHour, count(*) as RecCount
from correctRange w
join #dh d on d.HourOfDay between w.[Start] and w.[Stop]
where w.[Start] >= @periodStart
group by d.DayHour
order by d.DayHour
drop table #dh
Nicely done but I believe you're dropping the first hour. Also, be careful of the 23:59:59.000 thing. A lot of data can happen in the last second of the day not to mention the roundup to the next day at .998 should it occur in the data.
Here's one possible solution using a zero based Tally table which also returns the count of 0 for hours not included in the original data.
DECLARE @StartDay DATETIME,
@NextStartDay DATETIME
;
SELECT @StartDay = '20100601',
@NextStartDay = '20100602'
;
WITH
cteDateRanges AS
(
SELECT DATEADD(hh, t.N, @StartDay) AS PeriodStart,
DATEADD(hh, t.N + 1, @StartDay) AS NextPeriodStart
FROM dbo.Tally t
WHERE t.N < DATEDIFF(hh, @StartDay, @NextStartDay)
)
SELECT range.PeriodStart, COUNT(source.Start)
FROM #whosonwhen source
FULL JOIN cteDateRanges range
ON range.PeriodStart <= source.Stop
AND range.NextPeriodStart > source.Start
GROUP BY range.PeriodStart
ORDER BY range.PeriodStart
;
--Jeff Moden
Change is inevitable... Change for the better is not.
June 18, 2010 at 5:23 pm
Also be sure to truncate the @StartTime (of the reporting period) passed in down to the even hour, or you would get a totally incorrect result.
Scott Pletcher, SQL Server MVP 2008-2010
June 18, 2010 at 6:44 pm
Scott is correct. The code I posted has no checks for whole days nor even whole hours.
If your code is passing the start date as a legal datetime value and not a string, here's a fast way for the code to force the start date to the beginning of the day and to ensure that you only get one day back...
DECLARE @StartDay DATETIME,
@NextStartDay DATETIME
;
SELECT @StartDay = '20100601 13:17:19.123',
@StartDay = CAST(DATEDIFF(dd, 0, @StartDay) AS DATETIME),
@NextStartDay = DATEADD(dd,1,@StartDay)
;
SELECT @startday
;
Be careful, though.. if someone passes a string that looks like '23:59:59.999', it'll return the next day instead of the current day (at least in 2k5)...
DECLARE @StartDay DATETIME,
@NextStartDay DATETIME
;
SELECT @StartDay = '20100601 23:59:59.999',
@StartDay = CAST(DATEDIFF(dd, 0, @StartDay) AS DATETIME),
@NextStartDay = DATEADD(dd,1,@StartDay)
;
SELECT @startday
;
... and it does so even without the round down attempt.
DECLARE @StartDay DATETIME,
@NextStartDay DATETIME
;
SELECT @StartDay = '20100601 23:59:59.999'
;
SELECT @startday
;
--Jeff Moden
Change is inevitable... Change for the better is not.
June 19, 2010 at 3:46 am
Jeff Moden (6/18/2010)
Eugene Elutin (6/18/2010)
...
Nicely done but I believe you're dropping the first hour. Also, be careful of the 23:59:59.000 thing. A lot of data can happen in the last second of the day not to mention the roundup to the next day at .998 should it occur in the data.
...
Jeff, you are right about missing first hour. Of cause, for a proper check the time range is required. The records for the last second in day should be fine as validated against the next date.
Your query does proper check and returns the count of records for each hour between given dates in StartDate and NextStartDate, however if you add time part (eg. from 10:00:00.000 to 18:00:00.000) into input, then it does not behave properly - it returns raw for PeriodStart = NULL.
So, another version is required (I will use "Tally" cte created on fly):
declare @periodStart datetime
declare @periodEnd datetime
-- they must be inside of the same day, otherwise counting hours from 0 to 23 is not appropriate!!!!
set @periodStart = '2010-06-01 03:15:00.000'
set @periodEnd = '2010-06-01 21:20:00.000'
-- Given Period is taken as inclusive of given hours in the input (eg. 15:25:30 will be taken as 15:00:00)
-- frist make sure that the minutes, seconds and milliseconds are removed from input range for clarity
set @periodStart = dateadd(hh, datepart(hh,@periodStart), convert(varchar(12),@periodStart,112))
set @periodEnd = dateadd(hh, datepart(hh,@periodEnd), convert(varchar(12),@periodEnd,112))
-- you may create this CTE by other ways (eg. from permanent Tally table)...
;with dh
as
(
select top 24
DATEADD(hour,ROW_NUMBER() OVER (ORDER BY [Object_id])-1,convert(varchar(12),@periodStart,112)) as HoDstart
,DATEADD(hour,ROW_NUMBER() OVER (ORDER BY [Object_id]),convert(varchar(12),@periodStart,112)) as HoDend
,ROW_NUMBER() OVER (ORDER BY Object_id)-1 as DayHour
from sys.columns -- or any other (not very big) table which have more than 24 raws, just remamber to change
-- [Object_id] in OVER (ORDER BY [Object_id]... to some existing column
)
select d.DayHour, count(w.ID) as RecCount
from dh d
left join #whosonwhen w
on w.[Start] < d.HoDend
and w.[Stop] >= d.HoDstart
where d.HoDstart between @periodStart and @periodEnd
group by d.DayHour
order by d.DayHour
June 19, 2010 at 8:44 am
Eugene Elutin (6/19/2010)
Your query does proper check and returns the count of records for each hour between given dates in StartDate and NextStartDate, however if you add time part (eg. from 10:00:00.000 to 18:00:00.000) into input, then it does not behave properly - it returns raw for PeriodStart = NULL.
Thanks, Eugene... I'll take a look at both.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 19, 2010 at 10:02 am
Ya know... after I got done tweaking my code, except for the way we gen the dates, the code came out nearly identical as your latest tweek. I'll also add that your original idea of creating a small temp table instead of using the CTE makes the code a whole lot faster in the face of any scalability. The optimizer thinks it needs to regen the 24 rows for each row in the source table on the CTE code instead of genning the 24 rows just once.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 19, 2010 at 10:40 am
Jeff, in a final code I would use temp table (like in the first sample). But in this case I did it with CTE to save few lines of code :-D. I guess, this would also give opportunity to the requestor to learn a bit more about optimisation in case of inadequate performance*.
*Of cause, the last bit, is just a nice excuse which came up in my mind, after reading your valid comment 😛
I don't know why I'm still here - it's a Saturday at the end... 😀
June 21, 2010 at 7:50 am
It's interesting. I thought the requestor was interested in time-of-day across any number of days, rather than just on one 24-hr period.
For example, for the last, say, 2 weeks, how much activity was there at a certain hour -- midnight? 1AM? 2AM? 3AM? -- without regard for which day it was.
I was working on code for that but dropped it once the other approach seemed to be generally accepted by everyone else.
Scott Pletcher, SQL Server MVP 2008-2010
June 21, 2010 at 10:49 am
Thanks for the replies! You've given me some great ideas. I'll do some testing on larger data sets and see how it goes.
Thanks
Dave
June 21, 2010 at 10:52 am
Eugene Elutin (6/19/2010)
Jeff, in a final code I would use temp table (like in the first sample). But in this case I did it with CTE to save few lines of code :-D. I guess, this would also give opportunity to the requestor to learn a bit more about optimisation in case of inadequate performance*.*Of cause, the last bit, is just a nice excuse which came up in my mind, after reading your valid comment 😛
I don't know why I'm still here - it's a Saturday at the end... 😀
From the original description, I don't actually know what the real end result the OP wants is. Post your code... it can't hurt.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 22, 2010 at 3:19 pm
OK... I tried out Eugenes code on a much bigger data set. It appears to be aggregating hourly.
08993
18971
28954
38943
48932
58916
68879
78786
88372
96975
104606
112251
12860
13447
14289
15214
16159
17131
18108
1985
2064
2146
2227
2311
What I'm trying to do is get a total hourly count, not a running total. It would look something like this.
012
187
2100
3256
4980
5589
6278
7109
878
956
1050
11267
12345
13106
1434
1523
1616
176
1857
1965
2014
219
221
23123
Thanks !
June 22, 2010 at 3:55 pm
daveb-840383 (6/22/2010)
OK... I tried out Eugenes code on a much bigger data set. It appears to be aggregating hourly.What I'm trying to do is get a total hourly count, not a running total. It would look something like this.
Thanks !
Are you saying that you want an hourly count by day? And what do you want to show if you have a time span of more than 1 day?
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 27 total)
You must be logged in to reply to this topic. Login to reply