April 25, 2013 at 11:47 am
Racking my brain on this one.
I have the following table
AccountID DataDate DataValue
1 1/1/2012 100
1 2/5/2012 190
1 3/18/2012 118
1 6/5/2012 490
1 6/29/2012 89
2 1/8/2012 120
2 3/5/2012 218
etc
The datadata is the end of a period -- so the datavalue for a record refers to a value that has been accumulated -- not a forward projection of value owed, if that makes sense.
I need to be able to figure out and generate a by-month average of the datavalue.
So, for instance, for the second row: 190 units, spread across a date range from 1/1/2012 to 2/5/2012 would be 36 days -- 5 of them in February. This I can do -- when the gap between reads is less than 2 months, it's easy. # of days in first month divided by total number of days in date range * value, then # of days in second month divided by total number of days in date range * value.
What I need is to be able to do, though, is handle the instances where there's a multi-month gap.
How can I take rows 4 and 5, and return a prorated average across march, may, and june?
Looking at my dataset (55 million rows from 1/1/2011 to current), I've got 2000+ instances where the data gap is 13 months. So it would have to be able to handle multi-year scenarios as well.
Anyone know a good math formula to calculate this? It's like statistical analysis or something, but I'm just a DBA, not a mathemagician.
How can I solve this easily, without a really ugly, ugly cursor and other scary methods?
April 25, 2013 at 1:41 pm
Can you firstly provide us with the set up of your queries, some test data like the below and anything you're already done?
SELECT 1 [DataId], 1 [AccountId], '2012-01-01' [DataDate], 100 [DataValue] UNION ALL
SELECT 2,1,'2012-05-02',190 UNION ALL
SELECT 3,1,'2012-03-18',118 UNION ALL
SELECT 4,1,'2012-06-05',490 UNION ALL
SELECT 5,1,'2012-06-29',89 UNION ALL
SELECT 6,2,'2012-01-08',120 UNION ALL
SELECT 7,2,'2012-03-05'
Secondly, you're talking about rows 4 and 5 like they are in different months? You've told us that each day is the end of a period, but what are the periods? Because some of the days are Sundays and some are Fridays.
Jim..
April 25, 2013 at 2:04 pm
you would need another table for a join that contained the first date of every month.
I don't think i have your answer, (its a bit tough to understand what you need) but here is a sample of code that might get your creative juices flowing;
declare @table table (AccountID int, DataDate datetime, DataValue int)
insert into @table VALUES (1, '01/01/2012',100),
(1, '02/05/2012', 190),
(1, '03/18/2012', 118),
(1, '06/5/2012', 490),
(1, '06/29/2012', 89),
(2, '01/8/2012', 120),
(2, '03/5/2012', 218)
declare @CalendarMonths table ([date] DATETIME)
DECLARE
@basedate DATETIME,
@offset INT
SELECT
@basedate = '01/01/2000',
@offset = 1
WHILE (@offset < 200)
BEGIN
INSERT INTO @CalendarMonths
SELECT DATEADD(MONTH, @offset, @basedate)
set @offset = @offset + 1
END
select * from @table
select T1.AccountID, T1.DataDate, datediff(day, T2.date, T1.DataDate) as [Diff], T1.DataValue
from @table T1
join @CalendarMonths T2 on
datepart(month,T1.DataDate) = datepart(month,T2.date)
and datepart(year,T1.DataDate) = datepart(year,T2.date)
order by T1.AccountID, T1.DataDate
April 25, 2013 at 3:06 pm
Thanks Jimbo.
Here's the basic table: it's for calculating a meter's usage over time.
meterid (varchar(12)) read date (datetime) usage (smallint)
472641501 2011-02-15 00:00:00.00077
472641501 2011-03-18 00:00:00.00079
472641501 2012-02-15 00:00:00.000190
472641501 2012-04-18 00:00:00.000120
472641501 2012-05-17 00:00:00.00020
472641501 2012-06-19 00:00:00.0000
472641501 2013-02-14 00:00:00.000164
472641501 2013-03-15 00:00:00.000160
I'm tasked with delivering a result set that is something like this:
meterid usagemonth usageyear avgusagepermonth
472641501 2 2011 78
472641501 3 2011 ??
472641501 4 2011 ??
474641501 5....
etc.
end of a period indicates end of read cycle. usually every 4 weeks, so there can be 2 inside the same month, but other times, for reasons I don't understand, it won't be read for months at a time.
April 25, 2013 at 3:07 pm
Thanks, Geoff. That's what I'm exploring now -- setting up a months table to join against. I had wanted to avoid that, but I'm not certain that I had a good reason why.
April 25, 2013 at 3:13 pm
Using a self-join, you can establish what a daily average rate for each entry is. Once you've gotten that, each entry will have a start and an end date.
From there, you can fold in an auxialliary table of values (a.k.a. a Nums table or tally table - just a list of numbers - Mine's called dbo.Nums, which has only one integer column, [n]) so there will be a row for each day that an original entry represents.
Once you have that, just roll it up by AcctID, Year, and Month. In the code below, a excluded the start date from the number of days used to calculate the average (only count any day as being part of one set of data). The below query doesn't need as many nested queries as shown, but I wrote it that way to show the steps.
-- prep sample data
CREATE TABLE #Ranges(AcctID int NOT NULL, DataDate date NOT NULL,
DataValue int NOT NULL,
PRIMARY KEY(AcctID, DataDate));
INSERT #Ranges(AcctID, DataDate, DataValue)
VALUES (1, '1/1/2012', 100),
(1, '2/5/2012', 190),
(1, '3/18/2012', 118),
(1, '6/5/2012', 490),
(1, '6/29/2012', 89),
(2, '1/8/2012', 120),
(2, '3/5/2012', 218);
GO
-- Actual query. See notes from inner to outer queries.
-- Could be collapsed into fewer queries,
-- but I split it out to show the steps
WITH Ranges AS
(SELECT ROW_NUMBER() OVER (PARTITION BY AcctID ORDER BY DataDate) AS rn,
AcctID, DataDate, DataValue
FROM #Ranges)
-- 3. Roll it all up by month and year
SELECT AcctID, [DataYear], [DataMonth], sum(DailyValue) AS [MonthTotal]
FROM (
-- 2. Add in the aux table of values to get a separate row for each
-- specific day with the value for that day
SELECT c.*, dateadd(d, 0, n) AS CalcDate,
year(dateadd(d, 0, n)) AS [DataYear],
month(dateadd(d, 0, n)) AS [DataMonth]
FROM (
-- 1. This query establishes the start and end date of each
-- range, along with the daily average
SELECT a.AcctID, a.DataDate AS [StartDate], b.DataDate AS [EndDate],
datediff(d, a.DataDate, b.DataDate) AS [DateRange],
b.DataValue,
b.DataValue/(1.0 * datediff(d, a.DataDate, b.DataDate)) [DailyValue],
datediff(d, 0, a.DataDate) AS StartDayNum,
datediff(d, 0, b.DataDate) AS [EndDayNum]
FROM Ranges a INNER JOIN
Ranges b ON a.AcctID = b.AcctID AND a.rn = b.rn - 1
) c INNER JOIN
dbo.Nums n ON n BETWEEN c.StartDayNum + 1 AND c.EndDayNum
) d
GROUP BY AcctID, DataYear, DataMonth
ORDER BY AcctID, DataYear, DataMonth
Eddie Wuerch
MCM: SQL
April 26, 2013 at 3:36 pm
Eddie, this looks to be exactly what I'm looking for -- when I run your query as written, though, I get no results returned. Are you missing a step in there somewhere? Am I?
April 26, 2013 at 3:50 pm
Never mind. Noob mistake. My tally table was too small.
April 29, 2013 at 5:00 am
-- Expanding the date range out by months rather than days might be 30x cheaper
;WITH OrderedData AS (
SELECT *,
rn = ROW_NUMBER() OVER(PARTITION BY AccountID ORDER BY DataDate DESC)
FROM #Sample
)
SELECT
a.AccountID,
[Year]= YEAR(x.SliceStart),
[Month]= MONTH(x.SliceStart),
MonthTotal = SUM((z.DaysCovered*1.0/z.TotalDaysCovered) * b.DataValue)
FROM OrderedData a
INNER JOIN OrderedData b ON b.AccountID = a.AccountID AND b.rn = a.rn-1
CROSS APPLY ( -- row generator will cope with a maximum gap of 12 months, add rows for bigger gaps
SELECT TOP(1+DATEDIFF(MONTH, a.DataDate, b.DataDate)) n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) d (n)
ORDER BY n
) t
CROSS APPLY (
SELECT SliceStart = CASE WHEN t.n = 0 THEN DATEADD(DAY,1,a.DataDate)
ELSE DATEADD(MONTH,t.n+DATEDIFF(MONTH,0,a.DataDate),0) END
) x
CROSS APPLY (
SELECT SliceEnd = CASE WHEN DATEDIFF(MONTH, SliceStart, b.DataDate) > 0
THEN DATEADD(DAY,-1,DATEADD(MONTH,t.n+1+DATEDIFF(MONTH,0,a.DataDate),0))
ELSE b.DataDate END
) y
CROSS APPLY (
SELECT DaysCovered = 1+DATEDIFF(DAY,SliceStart,SliceEnd),
TotalDaysCovered = DATEDIFF(DAY,a.DataDate,b.DataDate)
) z
GROUP BY a.AccountID, YEAR(x.SliceStart), MONTH(x.SliceStart)
ORDER BY a.AccountID, YEAR(x.SliceStart), MONTH(x.SliceStart)
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply