October 31, 2011 at 12:07 pm
I'm having trouble with this one, and haven't found ideas on the web. Given a day's worth of nearly continuous (1-second) flow rate data, how would you calculate the maximum flow for a given length of time, say an hour? Conceptually you would go 1 second at a time and get the avg flow rate for the next hour, then simply take the highest avg and convert it to flow. Any ideas on set-based SQL to accomplish this?
October 31, 2011 at 12:44 pm
Can you give a little more detail as to the process you'd like the SQL to perform?
In my head, I'm thinking you'd just grab MAX(flowrate) group by (Hour Dividers), but that seems too easy for what you're looking for, so I'm probably just not awake yet.
October 31, 2011 at 12:54 pm
Group By Hour would work if I were after the maximum avg flowrate for each hour, with hour being between 1:00 and 2:00, 2:00 and 3:00, etc. But my 'hour' is any hour of time - it could be from 1:00:00 to 2:00:00, 1:00:01 to 2:00:01, 1:00:02 to 2:00:02, etc.
October 31, 2011 at 1:00 pm
Have you any sample data and DDL scripts...along with expected results?
It would help us to help you in providing some tested answers.
October 31, 2011 at 1:05 pm
Brian Kier (10/31/2011)
Group By Hour would work if I were after the maximum avg flowrate for each hour, with hour being between 1:00 and 2:00, 2:00 and 3:00, etc. But my 'hour' is any hour of time - it could be from 1:00:00 to 2:00:00, 1:00:01 to 2:00:01, 1:00:02 to 2:00:02, etc.
Not as difficult as you believe, actually, though it's a bit of voodoo if you've never seen it before. If all you're looking for is to set your 'starting moment' via parameter and have it count off hours from there, not a problem. I just want to make sure I understand the requirements, since there's no sample data/result example.
So, to confirm, the requirement is a maximumFlowPerHour with a variable/parameterized starting point?
October 31, 2011 at 1:15 pm
That doesn't sound quite right, though I'm not sure. There doesn't really need to be any parameters. If I assume a table is filled with 86,400 records (1 record per second for 1 day) - each record is time stamped hh:mm:ss (ignore day) - I need the hour of time with the highest average. Recall that when I say "hour of time" I mean any 60-minute period. So the answer might be "The hour between 2:02:14 and 2:03:14 had the highest avg with a value of 6.5 cfs".
October 31, 2011 at 1:18 pm
Brian Kier (10/31/2011)
That doesn't sound quite right, though I'm not sure. There doesn't really need to be any parameters. If I assume a table is filled with 86,400 records (1 record per second for 1 day) - each record is time stamped hh:mm:ss (ignore day) - I need the hour of time with the highest average. Recall that when I say "hour of time" I mean any 60-minute period. So the answer might be "The hour between 2:02:14 and 2:03:14 had the highest avg with a value of 6.5 cfs".
AH! You need to review a rolling window. Is it safe to limit this to a single day for performance sanity?
Edit: Psssst, that's a minute, not an hour, you listed above, btw. 😉
October 31, 2011 at 1:21 pm
Yes, Rolling Window describes it well - I'd never heard of that term. Yes, we can limit it to a day, and yes, I should have put 2:02:14 to 3:02:14.
October 31, 2011 at 2:23 pm
The execution plan for this is some serious ugly. I'm getting ~1 minute runtimes on my PC, but I want to make sure the logic is what you're looking for before I try to nail the performance down and bring in anything significant that'll obfuscate the purpose of the process.
There are a few small comments in the code. Please review and let me know if you need assistance with any of this.
Sample Data build (million row test)
IF OBJECT_ID('tempdb..#tester') IS NOT NULL
DROP TABLE #tester
CREATE TABLE #tester
( TimeOfEntry DATETIME NOT NULL,
FlowRate BIGINT NOT NULL
)
CREATE CLUSTERED INDEX idx_c_tester ON #tester
(TimeOfEntry)
-- Seconds in a day: 60* 60 * 24 = 86400
-- My tally table is a million rows, which will give us ~ 11 days of data.
-- TRUNCATE TABLE #tester
INSERT INTO #tester
SELECT
DATEADD( ss, n, '20110101') AS TimeOfEntry,
ABS( CHECKSUM( NEWID())) AS FlowRate
FROM
tempdb..Tally AS t
--select top 100 * from #tester
--select min(TimeOfEntry), Max(timeofEntry) FROM #tester
--select min(FlowRate), Max(FlowRate) FROM #tester
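-- If you want the random rates in a realistic range rather than the raw
-- CHECKSUM values (which run into the billions), one possible tweak to the
-- SELECT above is this - the 0.0 to 99.9 range is just a guess at plausible
-- cfs values, and FlowRate would need to be DECIMAL rather than BIGINT:
-- ABS( CHECKSUM( NEWID())) % 1000 / 10.0 AS FlowRate -- bounds rates to 0.0 - 99.9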
Query Logic:
DECLARE @DateOfInterest DATETIME
SET @DateOfInterest = '20110103'
;WITH cte AS
(SELECT
distinctList.SecondsOnly,
ca.HourlyAverageFlowRate
FROM
(SELECT DISTINCT
-- Used to remove any millisecond inclusions
-- @DateOfInterest is used because seconds since 1/1/1900 causes overflows.
DATEADD( ss, DATEDIFF( ss, @DateOfInterest, TimeOfEntry) , @DateOfInterest) AS SecondsOnly
FROM
#tester
WHERE
TimeOfEntry >= @DateOfInterest
AND TimeOfEntry < DATEADD( dd, 1, @DateOfInterest)
) AS distinctList
CROSS APPLY
( SELECT
AVG( FlowRate) AS HourlyAverageFlowRate
FROM
#tester
WHERE
TimeOfEntry >= distinctList.SecondsOnly
AND TimeOfEntry < DATEADD( hh, 1, SecondsOnly)
) AS ca
)
-- To see the results from the CTE
-- SELECT * FROM cte ORDER BY SecondsOnly
SELECT
SecondsOnly AS BeginningTime,
DATEADD( hh, 1, SecondsOnly) AS HourEndsAt,
MaxRate AS HighestRate
FROM
(SELECT
MAX( HourlyAverageFlowRate) AS MaxRate
FROM
cte
) AS a
JOIN
cte
ON a.MaxRate = cte.HourlyAverageFlowRate
October 31, 2011 at 9:07 pm
Brian Kier (10/31/2011)
That doesn't sound quite right, though I'm not sure. There doesn't really need to be any parameters. If I assume a table is filled with 86,400 records (1 record per second for 1 day) - each record is time stamped hh:mm:ss (ignore day) - I need the hour of time with the highest average. Recall that when I say "hour of time" I mean any 60-minute period. So the answer might be "The hour between 2:02:14 and 2:03:14 had the highest avg with a value of 6.5 cfs".
Post the CREATE TABLE statement for the table, 10 rows of data, and tell me what the expected min/max flowrates are (I need limits for the test data I'm going to build) as well as the total number of rows you currently have in your table, 'cuz I have a lightning fast idea on how to do this. ;-)
November 1, 2011 at 3:10 am
Heh, I wonder if we're thinking along the same lines, Jeff. I'm currently puzzling out a rolling window mechanic that should only need one (maybe two) passes on the source data via mathematics, based on packaging the rows into multiple simultaneous containers and a CASE'd AVG statement.
Just can't seem to nail it down in my head though.
November 1, 2011 at 3:29 am
Here's a two-pass solution.
Add a new column to your source data - HourlyRate.
Pass 1: Update HourlyRate to be the flow rate for the hour ending with the 'current' row, where HourlyRate is NULL.
Pass 2: Select Max(HourlyRate) within the time period of interest.
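A minimal sketch of those two passes against the #tester table from the earlier post (the HourlyRate column name, the DECIMAL type, and the "hour ending at the current row" window are my assumptions, not tested code):
ALTER TABLE #tester ADD HourlyRate DECIMAL(18,4) NULL
-- Pass 1: average flow for the hour ending at each row
UPDATE t
SET HourlyRate = ca.HourAvg
FROM #tester AS t
CROSS APPLY
( SELECT AVG( CAST( FlowRate AS DECIMAL(18,4))) AS HourAvg
FROM #tester
WHERE TimeOfEntry > DATEADD( hh, -1, t.TimeOfEntry)
AND TimeOfEntry <= t.TimeOfEntry
) AS ca
WHERE t.HourlyRate IS NULL
-- Pass 2: highest hourly rate in the time period of interest
SELECT MAX( HourlyRate) AS MaxHourlyRate
FROM #tester
WHERE TimeOfEntry >= '20110103'
AND TimeOfEntry < '20110104'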
November 1, 2011 at 7:39 am
Evil Kraig F (11/1/2011)
Heh, I wonder if we're thinking along the same lines, Jeff. I'm currently puzzling out a rolling window mechanic that should only need one (maybe two) passes on the source data via mathematics, based on packaging the rows into multiple simultaneous containers and a CASE'd AVG statement. Just can't seem to nail it down in my head though.
I'd have to look for it, but someone wrote a wonderful article on such rolling averages. Instead of doing an average over each trailing hour using a bit of a Triangular Join, you calculate the first SUM and COUNT used to make an average and then do the following for each row using a single update...
Do a single lookup for the "trail off" row and subtract it from the SUM.
You're already at the "leading" row during an UPDATE... add it to the SUM.
Calculate the average using SUM/COUNT.
This can be done in one of two ways... a correlated subquery or Cross Apply (which should be very fast because it's an equality), or an offset self join (which I think might be a bit slower).
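For what it's worth, a rough sketch of the correlated subquery flavor is below. It assumes one row per second, the clustered index on TimeOfEntry from the earlier post, a new HourlySum column, and the usual "quirky update" safety nets (TABLOCKX, MAXDOP 1) - since the update order isn't guaranteed by documentation, treat this as an illustration rather than tested code:
ALTER TABLE #tester ADD HourlySum BIGINT NULL
DECLARE @RunningSum BIGINT
SET @RunningSum = 0
UPDATE t
SET @RunningSum = HourlySum = @RunningSum
+ t.FlowRate -- the "leading" row we're already sitting on
- ISNULL( ( SELECT t2.FlowRate -- the "trail off" row, one hour back
FROM #tester AS t2
WHERE t2.TimeOfEntry = DATEADD( hh, -1, t.TimeOfEntry)), 0)
FROM #tester AS t WITH (TABLOCKX)
OPTION (MAXDOP 1)
-- SUM/COUNT: with one row per second the COUNT is a constant 3600,
-- so the sums only mean anything once a full hour has accumulated.
SELECT TOP 1
DATEADD( hh, -1, TimeOfEntry) AS WindowStart,
TimeOfEntry AS WindowEnd,
HourlySum / 3600.0 AS MaxHourlyAvg
FROM #tester
WHERE TimeOfEntry >= DATEADD( hh, 1, (SELECT MIN( TimeOfEntry) FROM #tester))
ORDER BY HourlySum DESC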
November 1, 2011 at 12:35 pm
Jeff Moden (11/1/2011)
Do a single lookup for the "trail off" row and subtract it from the SUM. You're already at the "leading" row during an UPDATE... add it to the SUM.
Calculate the average using SUM/COUNT.
My personal test bed caused an overflow when I thought about that pattern, but you're right, that'll depend heavily on the base data.
EDIT: Oh, yeah, and I didn't want to fight with ranges inside each of the increments, a simple FIFO push stack wasn't going to easily handle ranges.
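(If the overflow is in the running SUM itself, widening the accumulator might buy enough headroom - just an untested thought:)
DECLARE @RunningSum DECIMAL(38,0) -- instead of BIGINT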
November 1, 2011 at 4:05 pm
Brian Kier (10/31/2011)
yes, Rolling Window describes it well - never heard of that. Yes, we can limit it to a day, and yes, I should have put 2:02:14 to 3:02:14.
Let me ask again... what do you expect for a "normal" min and max instantaneous flow rate? I'd like to make the test data reasonably close to the real thing.