February 11, 2013 at 5:51 am
Hello,
I have searched a number of forums and tried a few examples but cannot get anything to work. I hope i include enough information to be helpful.
Please can someone advise how i can accomplish the following:
==========
existing table with format
RUNID; TIMEFRAME_START; QUEUE_NAME; ATSTART_MAILS; REPLIED_MAILS
1; 2012-02-10 00:00:00.000; QUEUE1; 25; 15
1; 2012-02-10 01:00:00.000; QUEUE1; 23; 14
1; 2012-02-10 02:00:00.000; QUEUE1; 21; 11
>>>>>
1; 2012-02-10 22:00:00.000; QUEUE1; 14; 9
1; 2012-02-10 23:00:00.000; QUEUE1; 19; 17
2; 2012-02-11 00:00:00.000; QUEUE1; 22; 13
2; 2012-02-11 01:00:00.000; QUEUE1; 21; 11
etc, etc, etc
i need to run some SQL daily to output results for the last day in the following format:
TIMEFRAME_START; QUEUE_NAME; ATSTART_MAILS; REPLIED_MAILS
2012-02-10 00:00:00.000; QUEUE1; 25; 15
2012-02-10 01:00:00.000; QUEUE1; 48; 29
2012-02-10 02:00:00.000; QUEUE1; 69; 40
etc, etc, etc
i have tried using CTE with a <= inner join but keep getting huge results (i narrowed it down to 1 queue for easy analysis of results....and will need to add more columns once the base code is sorted)
with data as
(
select TIMEFRAME_START, QUEUE_NAME, ATSTART_MAILS, REPLIED_MAILS from EGTS_OUT_EMAIL_ACTIVITY_HOURLY
WHERE RUNID=(SELECT MAX(RUNID) FROM EGTS_OUT_EMAIL_ACTIVITY_HOURLY)
)
SELECT
t.TIMEFRAME_START, t.queue_name,
(SELECT SUM(atstart_mails) FROM data WHERE data.TIMEFRAME_START<=t.TIMEFRAME_START),
(SELECT sum(replied_mails) FROM data WHERE data.TIMEFRAME_START<=t.TIMEFRAME_START)
FROM EGTS_OUT_EMAIL_ACTIVITY_HOURLY t
where RUNID=(SELECT MAX(RUNID) FROM EGTS_OUT_EMAIL_ACTIVITY_HOURLY)
and t.QUEUE_NAME='5 UK GOODS'
group by t.timeframe_start, t.QUEUE_NAME
order by TIMEFRAME_START asc
i know the subqueries are bad :p
i think its something to do with the <= join not using the RUNID clause somehow but not sure as i added it on both sides....
many thanks
samuel
February 11, 2013 at 5:59 am
Could you please provide DDL and insert sample data script? Tips how to do so are in the link at the bottom of my signature.
February 11, 2013 at 5:59 am
ignore me....the code works!!
im an idiot and forgot to reference properly!!
thanks anyway :p
with data as
(
select TIMEFRAME_START, QUEUE_NAME, ATSTART_MAILS, REPLIED_MAILS from EGTS_OUT_EMAIL_ACTIVITY_HOURLY
WHERE RUNID=(SELECT MAX(RUNID) FROM EGTS_OUT_EMAIL_ACTIVITY_HOURLY)
and QUEUE_NAME='5 UK GOODS'
)
SELECT
t.TIMEFRAME_START, t.queue_name,
(SELECT SUM(atstart_mails) FROM data WHERE data.TIMEFRAME_START<=t.TIMEFRAME_START),
(SELECT sum(replied_mails) FROM data WHERE data.TIMEFRAME_START<=t.TIMEFRAME_START)
FROM EGTS_OUT_EMAIL_ACTIVITY_HOURLY t
where t.RUNID=(SELECT MAX(RUNID) FROM EGTS_OUT_EMAIL_ACTIVITY_HOURLY)
and t.QUEUE_NAME='5 UK GOODS'
group by t.timeframe_start, t.QUEUE_NAME
order by t.TIMEFRAME_START asc
February 11, 2013 at 6:06 am
You might wnat to revisit the code, I would suggest reading this article http://www.sqlservercentral.com/articles/T-SQL/61539/ and then look at a turning this into a Quirky update for the Running total, something like this http://www.sqlservercentral.com/articles/T-SQL/68467/
_________________________________________________________________________
SSC Guide to Posting and Best Practices
February 11, 2013 at 6:12 am
Jason-299789 (2/11/2013)
You might wnat to revisit the code, I would suggest reading this article http://www.sqlservercentral.com/articles/T-SQL/61539/ and then look at a turning this into a Quirky update for the Running total, something like this http://www.sqlservercentral.com/articles/T-SQL/68467/
The OP's query uses triangular joins, usually a no-no because of the potential for huge working sets. However, if partitions are small, it can perform reasonably well. The easiest way to calculate running totals is with a recursive CTE as follows:
;WITH SampleData (RUNID, TIMEFRAME_START, QUEUE_NAME, ATSTART_MAILS, REPLIED_MAILS)AS (
SELECT 1, '2012-02-10 00:00:00.000', 'QUEUE1', 25, 15 UNION ALL
SELECT 1, '2012-02-10 01:00:00.000', 'QUEUE1', 23, 14 UNION ALL
SELECT 1, '2012-02-10 02:00:00.000', 'QUEUE1', 21, 11 UNION ALL
SELECT 1, '2012-02-10 22:00:00.000', 'QUEUE1', 14, 9 UNION ALL
SELECT 1, '2012-02-10 23:00:00.000', 'QUEUE1', 19, 17 UNION ALL
SELECT 2, '2012-02-11 00:00:00.000', 'QUEUE1', 22, 13 UNION ALL
SELECT 2, '2012-02-11 01:00:00.000', 'QUEUE1', 21, 11 )
, SequencedData AS (
SELECT
Seq = ROW_NUMBER() OVER(ORDER BY RUNID, TIMEFRAME_START),
*
FROM SampleData
),
Calculator AS (
SELECT Seq, RUNID, TIMEFRAME_START, QUEUE_NAME, ATSTART_MAILS, REPLIED_MAILS
FROM SequencedData
WHERE Seq = 1
UNION ALL
SELECT nr.Seq, nr.RUNID, nr.TIMEFRAME_START, nr.QUEUE_NAME,
ATSTART_MAILS = CASE WHEN nr.RUNID = lr.RUNID THEN nr.ATSTART_MAILS+lr.ATSTART_MAILS ELSE nr.ATSTART_MAILS END,
REPLIED_MAILS = CASE WHEN nr.RUNID = lr.RUNID THEN nr.REPLIED_MAILS+lr.REPLIED_MAILS ELSE nr.REPLIED_MAILS END
FROM Calculator lr -- last row
INNER JOIN SequencedData nr -- new row
ON nr.Seq = lr.Seq+1
)
SELECT *
FROM Calculator
The fastest way to calculate running totals is the QU method referenced by Jason.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
February 11, 2013 at 6:50 am
ive got over 500 queues and 15 extra data columns to compute also....would those links scale up??
ive got this running in under 20 seconds....ill look into those other methods
many thanks
February 11, 2013 at 6:52 am
Chris,
I've never seen the recursive CTE method before, and looks like a viable option to the quirky update to a point.
_________________________________________________________________________
SSC Guide to Posting and Best Practices
February 11, 2013 at 6:57 am
lilywhites (2/11/2013)
ive got over 500 queues and 15 extra data columns to compute also....would those links scale up??ive got this running in under 20 seconds....ill look into those other methods
many thanks
The QU method scales well and will take about 5 seconds to UPDATE a million rows (on what? Most folks' testing rig). The rCTE method also scales well and takes about 25 seconds to SELECT a million rows (plus network time). There are a number of other methods, all of which are slower. The triangular join method you are using usually scales particularly badly, because as the set size increases, so does the number of rows to be aggregated. In your case the data is partitioned which helps.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
February 11, 2013 at 7:00 am
Jason-299789 (2/11/2013)
Chris,I've never seen the recursive CTE method before, and looks like a viable option to the quirky update to a point.
Hi Jason, it's been subjected to a significant amount of intermittent testing here on ssc, have a Goooogle.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
February 11, 2013 at 7:03 am
ill check out the rCTE method for the final phase as there may well be expansion in the data set in the future....as a proof of concept this will have to do for now 🙁
thanks for your help
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply