Query Help - Sum by Month

Question

Post reply

Query Help - Sum by Month

Sean Grebey

Ten Centuries

Points: 1159
More actions
August 5, 2013 at 8:48 am

#278319

I have a table of values by date. I have a requirement to group and sum them by month (to use to plot on a chart). I'm not really sure how to go about grouping by a range. That is if I have:
01/03/2012 $300
01/12/2012 $250
02/05/2012 $200
02/07/2012 $300
02/15/2012 $400
I need it grouped as two rows
01/2012 $550
02/2012 $900
Thanks for any help.
Sean

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Sean Lange SSC Guru Points: 286590 More actions · Answer 1

DDL and sample data would have made this a lot easier. 🙂

Basically this is a normal group by situation except that you need to do some date math to group the months together.

This should do it.

create table #Something

(

MyDate datetime,

SomeValue int

)

insert #Something

select '01/03/2012', 300 union all

select '01/12/2012', 250 union all

select '02/05/2012', 200 union all

select '02/07/2012', 300 union all

select '02/15/2012', 400

select dateadd(mm, datediff(mm, 0, MyDate), 0), sum(SomeValue)

from #Something

group by dateadd(mm, datediff(mm, 0, MyDate), 0)

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 2

ET DATEFORMAT MDY

SELECT

DateRange = RIGHT('0'+CAST(MONTH([Date]) AS VARCHAR(2)),2)+'/'+

CAST(YEAR([Date]) AS VARCHAR(4)),

SUMAmount = SUM(Amount)

FROM (

SELECT '01/03/2012', $300 UNION ALL

SELECT '01/12/2012', $250 UNION ALL

SELECT '02/05/2012', $200 UNION ALL

SELECT '02/07/2012', $300 UNION ALL

SELECT '02/15/2012', $400

) d ([Date], Amount)

GROUP BY YEAR([Date]), MONTH([Date])

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Sean Grebey Ten Centuries Points: 1159 More actions · Answer 3

Thanks for the replies. Out of curiosity, this is the approach I took. Is there an issue with it?

select ROUND(sum(ContractValue),2) ContractValue, MONTH(ContractDate), YEAR(ContractDate)

From Reporting.dbo.PrgSum

Group by YEAR(ContractDate), MONTH(ContractDate)

Jason Wolfkill SSCrazy Eights Points: 9832 More actions · Answer 4

Sean Grebey (8/6/2013)
Thanks for the replies. Out of curiosity, this is the approach I took. Is there an issue with it?
select ROUND(sum(ContractValue),2) ContractValue, MONTH(ContractDate), YEAR(ContractDate)
From Reporting.dbo.PrgSum
Group by YEAR(ContractDate), MONTH(ContractDate)

Only that the MONTH() function returns an integer, so if you want to return the period as a string like '01/2012', you'll need to use the code from ChrisM@Work's SELECT statement.

Jason Wolfkill

Luis Cazares SSC Guru Points: 183706 More actions · Answer 5

wolfkillj (8/7/2013)
Sean Grebey (8/6/2013)
Thanks for the replies. Out of curiosity, this is the approach I took. Is there an issue with it?
select ROUND(sum(ContractValue),2) ContractValue, MONTH(ContractDate), YEAR(ContractDate)
From Reporting.dbo.PrgSum
Group by YEAR(ContractDate), MONTH(ContractDate)
Only that the MONTH() function returns an integer, so if you want to return the period as a string like '01/2012', you'll need to use the code from ChrisM@Work's SELECT statement.

Wouldn't a simple convert be better than multiple functions for that?

SELECT

DateRange = RIGHT(CONVERT(char(10), [Date], 103), 7),

SUMAmount = SUM(Amount)

FROM (

SELECT CONVERT(date, '20120103'), $300 UNION ALL

SELECT '20120112', $250 UNION ALL

SELECT '20120205', $200 UNION ALL

SELECT '20120207', $300 UNION ALL

SELECT '20120215', $400

) d ([Date], Amount)

GROUP BY RIGHT(CONVERT(char(10), [Date], 103), 7)

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 6

Luis Cazares (8/7/2013)
wolfkillj (8/7/2013)
Sean Grebey (8/6/2013)
Thanks for the replies. Out of curiosity, this is the approach I took. Is there an issue with it?
select ROUND(sum(ContractValue),2) ContractValue, MONTH(ContractDate), YEAR(ContractDate)
From Reporting.dbo.PrgSum
Group by YEAR(ContractDate), MONTH(ContractDate)
Only that the MONTH() function returns an integer, so if you want to return the period as a string like '01/2012', you'll need to use the code from ChrisM@Work's SELECT statement.
Wouldn't a simple convert be better than multiple functions for that?
SELECT
DateRange = RIGHT(CONVERT(char(10), [Date], 103), 7),
SUMAmount = SUM(Amount)
FROM (
SELECT CONVERT(date, '20120103'), $300 UNION ALL
SELECT '20120112', $250 UNION ALL
SELECT '20120205', $200 UNION ALL
SELECT '20120207', $300 UNION ALL
SELECT '20120215', $400
) d ([Date], Amount)
GROUP BY RIGHT(CONVERT(char(10), [Date], 103), 7)

The plans for both show a sort (100% of cost) to support the streaming aggregate. Isn't it your turn to scale up a million-row test, Luis? 😉

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Luis Cazares SSC Guru Points: 183706 More actions · Answer 7

After several tests, It became clear that Chris query is 1.5 times faster, at least on my development server. Although, both run in about one second for a million rows. Here's the test code I used. If someone can clean the buffer before each query would be better, but I can't do it in here.

IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL

DROP TABLE #SomeTestTable

;

--===== Declare some obviously named variables

DECLARE @NumberOfRows INT,

@StartDate DATETIME,

@EndDate DATETIME,

@Days INT, --This is still the "range"

@StartValue INT,

@EndValue INT,

@Range INT

;

--===== Preset the variables to known values

SELECT @NumberOfRows = 1000000,

@StartDate = '2010', --Inclusive

@EndDate = '2020', --Exclusive

@Days = DATEDIFF(dd,@StartDate,@EndDate),

@StartValue = 100,

@EndValue = 900,

@Range = @EndValue - @StartValue + 1

;

--===== Create "random constrained" integers within

-- the parameters identified in the variables above.

SELECT TOP (@NumberOfRows)

dDate = ABS(CHECKSUM(NEWID())) % @Days + @StartDate,

Amount = CAST( ROUND( RAND(CHECKSUM(NEWID())) * @Range, 2) + @StartValue AS decimal( 7,2))

INTO #SomeTestTable

FROM sys.all_columns ac1

CROSS JOIN sys.all_columns ac2

;

SET STATISTICS TIME ON

SELECT

DateRange = RIGHT('0'+CAST(MONTH([dDate]) AS VARCHAR(2)),2)+'/'+

CAST(YEAR([dDate]) AS VARCHAR(4)),

SUMAmount = SUM(Amount)

INTO #Dummy

FROM #SomeTestTable

GROUP BY YEAR([dDate]), MONTH([dDate])

SELECT

DateRange = RIGHT(CONVERT(char(10), [dDate], 103), 7),

SUMAmount = SUM(Amount)

INTO #Dummy2

FROM #SomeTestTable

GROUP BY RIGHT(CONVERT(char(10), [dDate], 103), 7)

SET STATISTICS TIME OFF

DROP TABLE #Dummy

DROP TABLE #Dummy2

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Stefan_G SSCertifiable Points: 6609 More actions · Answer 8

A little challenge:

Assume there is a clustered index on the date column.

Can you create a query that exploits the clustered index to achieve even better performance?

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 9

Stefan_G (8/7/2013)
A little challenge:
Assume there is a clustered index on the date column.
Can you create a query that exploits the clustered index to achieve even better performance?

Back to you, Stefan. Get a streaming aggregate *without* a sort. Should be straightforward if your table is clustered on date 😉

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 10

Stefan_G (8/7/2013)
A little challenge:
Assume there is a clustered index on the date column.
Can you create a query that exploits the clustered index to achieve even better performance?

Borrowing from Luis' test script;

Create a million-row test table

IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL

DROP TABLE #SomeTestTable;

--===== Declare some obviously named variables

DECLARE @NumberOfRows INT,

@StartDate DATEtime,

@EndDate DATEtime,

@Days DECIMAL(10,2), --This is still the "range"

@StartValue INT,

@EndValue INT,

@Range INT;

--===== Preset the variables to known values

SELECT @NumberOfRows = 1000000,

@StartDate = '2010', --Inclusive

@EndDate = '2020', --Exclusive

@Days = DATEDIFF(dd,@StartDate,@EndDate),

@StartValue = 100,

@EndValue = 900,

@Range = @EndValue - @StartValue + 1;

--===== Create "random constrained" integers within

-- the parameters identified in the variables above.

SELECT TOP (@NumberOfRows)

dDate = DATEADD(DAY,ABS(CHECKSUM(NEWID())) % @Days,@StartDate),

--dDate = DATEADD(minute,ABS(CHECKSUM(NEWID())) % 1440,DATEADD(day,ABS(CHECKSUM(NEWID())) % @Days,@StartDate)),

Amount = CAST(ROUND(RAND(CHECKSUM(NEWID())) * @Range, 2) + @StartValue AS decimal( 7,2))

INTO #SomeTestTable

FROM sys.all_columns ac1

CROSS JOIN sys.all_columns ac2;

CREATE CLUSTERED INDEX ucx_dDate ON #SomeTestTable (dDate)

Set up a test script

--

-- Q1

dbcc freeproccache

dbcc dropcleanbuffers

SELECT

DateRange = RIGHT('0'+CAST(MONTH([dDate]) AS VARCHAR(2)),2)+'/'+

CAST(YEAR([dDate]) AS VARCHAR(4)),

SUMAmount = SUM(Amount)

FROM #SomeTestTable

GROUP BY YEAR([dDate]), MONTH([dDate])

go 10

--

-- Q2

dbcc freeproccache

dbcc dropcleanbuffers

SELECT

DateRange = RIGHT(CONVERT(char(10), [dDate], 103), 7),

SUMAmount = SUM(Amount)

FROM #SomeTestTable

GROUP BY RIGHT(CONVERT(char(10), [dDate], 103), 7)

go 10

Open profiler, restrict to current spid & EventClass = 'SQL:BatchCompleted'. Run the test script. Here are the results:

Query CPU Reads Duration (ms)

Q1 668 3729 408

Q2 1217 3729 689

Notice that the plans for the two queries are very similar and have an equal cost relative to the batch

Both show a hash aggregate. You might expect a streaming aggregate since the table is clustered on date and we’re grouping by date elements, but SQL Server cannot (yet) perform a streaming aggregate based on elements of date. The hash table for the aggregate requires memory and each query has a memory grant of 3392KB.

The sample data set consists of 10 years’ worth of dates each duplicated about 275 times. A high ratio of source rows to result rows suggests a preaggregate step might offer some improvement:

SELECT

dDate = dDate,

SUMAmount = SUM(Amount)

FROM #SomeTestTable

GROUP BY dDate

This returns 3652 rows with a CPU of 340 and duration of about 380ms. A quick look at the plan shows a streaming aggregate as you would expect, and no memory grant. Substituting the table name with this query in my original query gives this jobbie;

SELECT

DateRange = RIGHT('0'+

CAST(MONTH([dDate]) AS VARCHAR(2)),2)+'/'+

CAST(YEAR([dDate]) AS VARCHAR(4)),

SUMAmount = SUM(SUMAmount)

FROM (

SELECT

dDate = dDate,

SUMAmount = SUM(Amount)

FROM #SomeTestTable

GROUP BY dDate

) s

GROUP BY YEAR([dDate]), MONTH([dDate]) with a CPU and duration almost identical to the subquery alone. The 3652 rows from the inner select are processed by a second streaming aggregate requiring a sort and consequent memory grant – this time a more modest 1040KB.

Job's a good'un.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Stefan_G SSCertifiable Points: 6609 More actions · Answer 11

Nice!

My solution looks like this:

;with cte1 as (

-- get all interesting months

select dateadd(month, N, '20100101') FirstDay

from tsqlc_Tally

where N < 120

)

select RIGHT(CONVERT(char(10), FirstDay, 103), 7), SumAmount

from cte1

cross apply (

select sum(Amount) as SumAmount from #SomeTestTable where dDate >= FirstDay and dDate < dateadd(month, 1, FirstDay)

) t

tsqlc_Tally is a tally table

For the random sample data both solutions have almost identical execution times on my machine.

My solution would be better if the date field for example was an actual timestamp with millisecond resolution. This would mean that the your preaggregation would not be very useful.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 12

That's well sneaky Stefan! This mod should make it quicker, too:

;with cte1 as (

-- get all interesting months

select

FirstDay = dateadd(month, N, '20100101'),

qq = dateadd(month, 1, dateadd(month, N, '20100101'))

from Tally0

where N < 120

)

select RIGHT(CONVERT(char(10), FirstDay, 103), 7), SumAmount

from cte1

cross apply (

select sum(Amount) as SumAmount

from #SomeTestTable

where dDate >= FirstDay

and dDate < qq

) t

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden