August 5, 2013 at 8:48 am
I have a table of values by date. I have a requirement to group and sum them by month (to use to plot on a chart). I'm not really sure how to go about grouping by a range. That is if I have:
01/03/2012 $300
01/12/2012 $250
02/05/2012 $200
02/07/2012 $300
02/15/2012 $400
I need it grouped as two rows
01/2012 $550
02/2012 $900
Thanks for any help.
Sean
August 5, 2013 at 9:03 am
DDL and sample data would have made this a lot easier. 🙂
Basically this is a normal group by situation except that you need to do some date math to group the months together.
This should do it.
create table #Something
(
MyDate datetime,
SomeValue int
)
insert #Something
select '01/03/2012', 300 union all
select '01/12/2012', 250 union all
select '02/05/2012', 200 union all
select '02/07/2012', 300 union all
select '02/15/2012', 400
select dateadd(mm, datediff(mm, 0, MyDate), 0), sum(SomeValue)
from #Something
group by dateadd(mm, datediff(mm, 0, MyDate), 0)
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
August 5, 2013 at 9:06 am
ET DATEFORMAT MDY
SELECT
DateRange = RIGHT('0'+CAST(MONTH([Date]) AS VARCHAR(2)),2)+'/'+
CAST(YEAR([Date]) AS VARCHAR(4)),
SUMAmount = SUM(Amount)
FROM (
SELECT '01/03/2012', $300 UNION ALL
SELECT '01/12/2012', $250 UNION ALL
SELECT '02/05/2012', $200 UNION ALL
SELECT '02/07/2012', $300 UNION ALL
SELECT '02/15/2012', $400
) d ([Date], Amount)
GROUP BY YEAR([Date]), MONTH([Date])
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 6, 2013 at 8:35 am
Thanks for the replies. Out of curiosity, this is the approach I took. Is there an issue with it?
select ROUND(sum(ContractValue),2) ContractValue, MONTH(ContractDate), YEAR(ContractDate)
From Reporting.dbo.PrgSum
Group by YEAR(ContractDate), MONTH(ContractDate)
August 7, 2013 at 8:53 am
Sean Grebey (8/6/2013)
Thanks for the replies. Out of curiosity, this is the approach I took. Is there an issue with it?
select ROUND(sum(ContractValue),2) ContractValue, MONTH(ContractDate), YEAR(ContractDate)
From Reporting.dbo.PrgSum
Group by YEAR(ContractDate), MONTH(ContractDate)
Only that the MONTH() function returns an integer, so if you want to return the period as a string like '01/2012', you'll need to use the code from ChrisM@Work's SELECT statement.
Jason Wolfkill
August 7, 2013 at 9:24 am
wolfkillj (8/7/2013)
Sean Grebey (8/6/2013)
Thanks for the replies. Out of curiosity, this is the approach I took. Is there an issue with it?
select ROUND(sum(ContractValue),2) ContractValue, MONTH(ContractDate), YEAR(ContractDate)
From Reporting.dbo.PrgSum
Group by YEAR(ContractDate), MONTH(ContractDate)
Only that the MONTH() function returns an integer, so if you want to return the period as a string like '01/2012', you'll need to use the code from ChrisM@Work's SELECT statement.
Wouldn't a simple convert be better than multiple functions for that?
SELECT
DateRange = RIGHT(CONVERT(char(10), [Date], 103), 7),
SUMAmount = SUM(Amount)
FROM (
SELECT CONVERT(date, '20120103'), $300 UNION ALL
SELECT '20120112', $250 UNION ALL
SELECT '20120205', $200 UNION ALL
SELECT '20120207', $300 UNION ALL
SELECT '20120215', $400
) d ([Date], Amount)
GROUP BY RIGHT(CONVERT(char(10), [Date], 103), 7)
August 7, 2013 at 9:36 am
Luis Cazares (8/7/2013)
wolfkillj (8/7/2013)
Sean Grebey (8/6/2013)
Thanks for the replies. Out of curiosity, this is the approach I took. Is there an issue with it?
select ROUND(sum(ContractValue),2) ContractValue, MONTH(ContractDate), YEAR(ContractDate)
From Reporting.dbo.PrgSum
Group by YEAR(ContractDate), MONTH(ContractDate)
Only that the MONTH() function returns an integer, so if you want to return the period as a string like '01/2012', you'll need to use the code from ChrisM@Work's SELECT statement.
Wouldn't a simple convert be better than multiple functions for that?
SELECT
DateRange = RIGHT(CONVERT(char(10), [Date], 103), 7),
SUMAmount = SUM(Amount)
FROM (
SELECT CONVERT(date, '20120103'), $300 UNION ALL
SELECT '20120112', $250 UNION ALL
SELECT '20120205', $200 UNION ALL
SELECT '20120207', $300 UNION ALL
SELECT '20120215', $400
) d ([Date], Amount)
GROUP BY RIGHT(CONVERT(char(10), [Date], 103), 7)
The plans for both show a sort (100% of cost) to support the streaming aggregate. Isn't it your turn to scale up a million-row test, Luis? 😉
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 7, 2013 at 11:12 am
After several tests, It became clear that Chris query is 1.5 times faster, at least on my development server. Although, both run in about one second for a million rows. Here's the test code I used. If someone can clean the buffer before each query would be better, but I can't do it in here.
IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL
DROP TABLE #SomeTestTable
;
--===== Declare some obviously named variables
DECLARE @NumberOfRows INT,
@StartDate DATETIME,
@EndDate DATETIME,
@Days INT, --This is still the "range"
@StartValue INT,
@EndValue INT,
@Range INT
;
--===== Preset the variables to known values
SELECT @NumberOfRows = 1000000,
@StartDate = '2010', --Inclusive
@EndDate = '2020', --Exclusive
@Days = DATEDIFF(dd,@StartDate,@EndDate),
@StartValue = 100,
@EndValue = 900,
@Range = @EndValue - @StartValue + 1
;
--===== Create "random constrained" integers within
-- the parameters identified in the variables above.
SELECT TOP (@NumberOfRows)
dDate = ABS(CHECKSUM(NEWID())) % @Days + @StartDate,
Amount = CAST( ROUND( RAND(CHECKSUM(NEWID())) * @Range, 2) + @StartValue AS decimal( 7,2))
INTO #SomeTestTable
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
SET STATISTICS TIME ON
SELECT
DateRange = RIGHT('0'+CAST(MONTH([dDate]) AS VARCHAR(2)),2)+'/'+
CAST(YEAR([dDate]) AS VARCHAR(4)),
SUMAmount = SUM(Amount)
INTO #Dummy
FROM #SomeTestTable
GROUP BY YEAR([dDate]), MONTH([dDate])
SELECT
DateRange = RIGHT(CONVERT(char(10), [dDate], 103), 7),
SUMAmount = SUM(Amount)
INTO #Dummy2
FROM #SomeTestTable
GROUP BY RIGHT(CONVERT(char(10), [dDate], 103), 7)
SET STATISTICS TIME OFF
DROP TABLE #Dummy
DROP TABLE #Dummy2
August 7, 2013 at 12:37 pm
A little challenge:
Assume there is a clustered index on the date column.
Can you create a query that exploits the clustered index to achieve even better performance?
August 7, 2013 at 1:33 pm
Stefan_G (8/7/2013)
A little challenge:Assume there is a clustered index on the date column.
Can you create a query that exploits the clustered index to achieve even better performance?
Back to you, Stefan. Get a streaming aggregate *without* a sort. Should be straightforward if your table is clustered on date 😉
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
August 8, 2013 at 7:46 am
Stefan_G (8/7/2013)
A little challenge:Assume there is a clustered index on the date column.
Can you create a query that exploits the clustered index to achieve even better performance?
Borrowing from Luis' test script;
Create a million-row test table
IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL
DROP TABLE #SomeTestTable;
--===== Declare some obviously named variables
DECLARE @NumberOfRows INT,
@StartDate DATEtime,
@EndDate DATEtime,
@Days DECIMAL(10,2), --This is still the "range"
@StartValue INT,
@EndValue INT,
@Range INT;
--===== Preset the variables to known values
SELECT @NumberOfRows = 1000000,
@StartDate = '2010', --Inclusive
@EndDate = '2020', --Exclusive
@Days = DATEDIFF(dd,@StartDate,@EndDate),
@StartValue = 100,
@EndValue = 900,
@Range = @EndValue - @StartValue + 1;
--===== Create "random constrained" integers within
-- the parameters identified in the variables above.
SELECT TOP (@NumberOfRows)
dDate = DATEADD(DAY,ABS(CHECKSUM(NEWID())) % @Days,@StartDate),
--dDate = DATEADD(minute,ABS(CHECKSUM(NEWID())) % 1440,DATEADD(day,ABS(CHECKSUM(NEWID())) % @Days,@StartDate)),
Amount = CAST(ROUND(RAND(CHECKSUM(NEWID())) * @Range, 2) + @StartValue AS decimal( 7,2))
INTO #SomeTestTable
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2;
CREATE CLUSTERED INDEX ucx_dDate ON #SomeTestTable (dDate)
Set up a test script
--
-- Q1
dbcc freeproccache
dbcc dropcleanbuffers
SELECT
DateRange = RIGHT('0'+CAST(MONTH([dDate]) AS VARCHAR(2)),2)+'/'+
CAST(YEAR([dDate]) AS VARCHAR(4)),
SUMAmount = SUM(Amount)
FROM #SomeTestTable
GROUP BY YEAR([dDate]), MONTH([dDate])
go 10
--
-- Q2
dbcc freeproccache
dbcc dropcleanbuffers
SELECT
DateRange = RIGHT(CONVERT(char(10), [dDate], 103), 7),
SUMAmount = SUM(Amount)
FROM #SomeTestTable
GROUP BY RIGHT(CONVERT(char(10), [dDate], 103), 7)
go 10
Open profiler, restrict to current spid & EventClass = 'SQL:BatchCompleted'. Run the test script. Here are the results:
Query CPU Reads Duration (ms)
Q1 668 3729 408
Q2 1217 3729 689
Notice that the plans for the two queries are very similar and have an equal cost relative to the batch
Both show a hash aggregate. You might expect a streaming aggregate since the table is clustered on date and we’re grouping by date elements, but SQL Server cannot (yet) perform a streaming aggregate based on elements of date. The hash table for the aggregate requires memory and each query has a memory grant of 3392KB.
The sample data set consists of 10 years’ worth of dates each duplicated about 275 times. A high ratio of source rows to result rows suggests a preaggregate step might offer some improvement:
SELECT
dDate = dDate,
SUMAmount = SUM(Amount)
FROM #SomeTestTable
GROUP BY dDate
This returns 3652 rows with a CPU of 340 and duration of about 380ms. A quick look at the plan shows a streaming aggregate as you would expect, and no memory grant. Substituting the table name with this query in my original query gives this jobbie;
SELECT
DateRange = RIGHT('0'+
CAST(MONTH([dDate]) AS VARCHAR(2)),2)+'/'+
CAST(YEAR([dDate]) AS VARCHAR(4)),
SUMAmount = SUM(SUMAmount)
FROM (
SELECT
dDate = dDate,
SUMAmount = SUM(Amount)
FROM #SomeTestTable
GROUP BY dDate
) s
GROUP BY YEAR([dDate]), MONTH([dDate]) with a CPU and duration almost identical to the subquery alone. The 3652 rows from the inner select are processed by a second streaming aggregate requiring a sort and consequent memory grant – this time a more modest 1040KB.
Job's a good'un.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 8, 2013 at 9:03 am
Nice!
My solution looks like this:
;with cte1 as (
-- get all interesting months
select dateadd(month, N, '20100101') FirstDay
from tsqlc_Tally
where N < 120
)
select RIGHT(CONVERT(char(10), FirstDay, 103), 7), SumAmount
from cte1
cross apply (
select sum(Amount) as SumAmount from #SomeTestTable where dDate >= FirstDay and dDate < dateadd(month, 1, FirstDay)
) t
tsqlc_Tally is a tally table
For the random sample data both solutions have almost identical execution times on my machine.
My solution would be better if the date field for example was an actual timestamp with millisecond resolution. This would mean that the your preaggregation would not be very useful.
August 8, 2013 at 9:32 am
That's well sneaky Stefan! This mod should make it quicker, too:
;with cte1 as (
-- get all interesting months
select
FirstDay = dateadd(month, N, '20100101'),
qq = dateadd(month, 1, dateadd(month, N, '20100101'))
from Tally0
where N < 120
)
select RIGHT(CONVERT(char(10), FirstDay, 103), 7), SumAmount
from cte1
cross apply (
select sum(Amount) as SumAmount
from #SomeTestTable
where dDate >= FirstDay
and dDate < qq
) t
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply