November 5, 2011 at 5:12 am
Hi
Consider this table of account balances:
use tempdb
go
CREATE TABLE balances (
transactionId INT PRIMARY KEY,
dt datetime,
balance money
)
go
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(1,'Sep 5 2011 12:00:00:000AM',43000.0000)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(2,'Sep 5 2011 12:01:00:000AM',42355.0000)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(3,'Sep 9 2011 8:56:00:000AM',42269.0000)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(4,'Sep 9 2011 3:24:14:000PM',42110.7600)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(5,'Sep 12 2011 11:44:33:000AM',42712.7600)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(6,'Sep 15 2011 6:12:01:000PM',41637.7600)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(7,'Sep 16 2011 3:29:06:000PM',40777.7600)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(8,'Sep 19 2011 6:50:00:000PM',42132.2600)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(9,'Sep 20 2011 8:00:00:000PM',42129.6800)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(10,'Sep 21 2011 5:18:26:000PM',43651.8800)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(11,'Sep 21 2011 6:23:00:000PM',44959.5100)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(12,'Sep 21 2011 8:00:00:000PM',44951.3400)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(13,'Sep 25 2011 10:09:01:000PM',45166.3400)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(14,'Sep 26 2011 6:54:26:000AM',46241.3400)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(15,'Sep 26 2011 9:39:15:000AM',47875.3400)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(16,'Sep 27 2011 3:38:33:000PM',49219.5200)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(17,'Sep 29 2011 4:34:18:000PM',52552.0200)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(18,'Oct 3 2011 10:36:54:000AM',53842.0200)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(19,'Oct 4 2011 8:13:08:000PM',50591.2200)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(20,'Oct 10 2011 8:00:00:000PM',50549.0800)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(21,'Oct 11 2011 2:57:22:000PM',50936.0800)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(22,'Oct 11 2011 8:00:00:000PM',50906.8400)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(23,'Oct 13 2011 5:31:54:000PM',53435.2400)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(24,'Oct 17 2011 6:17:58:000PM',52145.2400)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(25,'Oct 17 2011 8:00:00:000PM',52126.3200)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(26,'Oct 18 2011 5:44:22:000PM',56660.6700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(27,'Oct 19 2011 12:13:46:000AM',58436.5700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(28,'Oct 20 2011 9:25:54:000AM',54781.5700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(29,'Oct 20 2011 7:30:42:000PM',56415.5700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(30,'Oct 21 2011 6:28:13:000AM',63725.5700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(31,'Oct 21 2011 1:56:48:000PM',56716.5700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(32,'Oct 21 2011 8:05:57:000PM',54050.5700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(33,'Oct 26 2011 9:16:11:000AM',56458.5700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(34,'Oct 26 2011 10:33:48:000PM',59477.1700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(35,'Oct 27 2011 3:58:23:000AM',53887.1700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(36,'Oct 27 2011 12:29:41:000PM',51565.1700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(37,'Oct 27 2011 1:29:03:000PM',47114.6700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(38,'Nov 1 2011 7:24:00:000AM',41309.6700)
I want to return, for each month, the starting balance, the ending balance and the percent change. It's a little trickier than I first thought because for each month I need to find the balance of the max date of that month and the balance of the max date of the previous month and then calculate the % change.
Any ideas?
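Just to illustrate the "balance at the max date of each month" part (a sketch only, not a working solution): this returns each calendar month's last transaction and its balance; the % change step would then compare consecutive rows.
-- Sketch: one row per calendar month with that month's last transaction
-- date and its balance (assumes the balances table created above).
SELECT b.dt, b.balance
FROM balances AS b
INNER JOIN (
    SELECT MAX(dt) AS maxDt
    FROM balances
    GROUP BY DATEDIFF(mm, 0, dt)
) AS m ON b.dt = m.maxDt
ORDER BY b.dt;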
November 5, 2011 at 7:23 am
Based on your sample data, what would be the expected result?
November 5, 2011 at 7:23 am
Have you looked at this article by Jeff Moden? It might give you enough info to solve your problem.
November 5, 2011 at 7:40 am
bitbucket-25253 (11/5/2011)
Have you looked at this article by Jeff Moden? It might give you enough info to solve your problem.
Ron, I don't think it's a running total scenario. It seems more like using the max date per month. But since I wasn't sure, I asked for the expected result...
November 5, 2011 at 9:20 am
LutzM (11/5/2011)
Based on your sample data, what would be the expected result?
Hi Lutz,
The expected result is
MONTH startBalance endBalance pctChange
============== ============ ========== =========
September 2011 43000.00 52552.02 22.21
October 2011 52552.02 47114.67 -10.35
November 2011 47114.67 41309.67 -12.32
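Just to spell out the arithmetic: each month's pctChange is its ending balance over the previous month's ending balance (the first month uses its own starting balance). For October, for example:
-- October's pctChange from the figures above: September's ending balance
-- is 52552.02, October's ending balance is 47114.67.
SELECT (47114.67 / 52552.02 - 1) * 100;  -- roughly -10.35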
Thanks for looking!
November 5, 2011 at 10:11 am
LutzM (11/5/2011)
bitbucket-25253 (11/5/2011)
Have you looked at this article by Jeff Moden? It might give you enough info to solve your problem.
Ron, I don't think it's a running total scenario. It seems more like using the max date per month. But since I wasn't sure, I asked for the expected result...
Oops on my part. I jumped to an assumption, and well, you know what that makes of me.
November 5, 2011 at 10:18 am
Here's my first stab at it:
WITH BalanceDates
AS
(
SELECT CurMonthEnd, PrevMonthEnd, MIN(dt) AS FirstTranDate, MAX(dt) AS LastTranDate
FROM (
SELECT DATEADD(MM,DATEDIFF(MM, 0, dt), -1) AS PrevMonthEnd,
DATEADD(MM,DATEDIFF(MM, 0, dt)+1, -1) AS CurMonthEnd,
dt
FROM balances
) AS b
GROUP BY CurMonthEnd, PrevMonthEnd
),
MonthlyBalances
AS
(
SELECT CurMonthEnd, PrevMonthEnd, bmin.balance AS StartingBalance, bmax.balance AS EndingBalance
FROM BalanceDates AS b INNER JOIN
balances AS bmin ON b.FirstTranDate = bmin.dt INNER JOIN
balances AS bmax ON b.LastTranDate = bmax.dt
)
SELECT DATENAME(MM,curr.CurMonthEnd) + ' ' + CONVERT(CHAR(4),DATEPART(YYYY,curr.CurMonthEnd)) AS CalendarMonth,
curr.StartingBalance, curr.EndingBalance,
(curr.EndingBalance - COALESCE(prev.EndingBalance,curr.StartingBalance))/COALESCE(prev.EndingBalance,curr.StartingBalance)*100 AS PctChange
FROM MonthlyBalances AS curr LEFT OUTER JOIN
MonthlyBalances AS prev ON curr.PrevMonthEnd = prev.CurMonthEnd
I doubt this is the most efficient way of doing it though, as it results in the balances table being read six times...
EDIT: Forgot to deal with the precision issues when calculating the PctChange column... will post a fix later when I have time; on my way out the door at the moment...
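The gist of the precision problem, for anyone curious, is that dividing one MONEY value by another only keeps four decimal places. Casting to DECIMAL before dividing is one possible fix (a sketch, not necessarily the best approach):
-- Rough illustration of the MONEY division issue, using values from this thread:
DECLARE @EndBal  money = 47114.67,
        @PrevEnd money = 52552.02;
SELECT (@EndBal / @PrevEnd - 1) * 100 AS MoneyMath,   -- MONEY result: only 4 decimal places survive the division
       (CAST(@EndBal AS decimal(19,6)) / CAST(@PrevEnd AS decimal(19,6)) - 1) * 100 AS DecimalMath;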
November 5, 2011 at 10:39 am
Here's a slightly different approach based on the CROSS APPLY method.
One thing making the code more complex is the "interpretation" of the very first balance as the start balance of its month, whereas for all other months the latest value of the prev. month is used...
;
-- build a table including the first value with a date of the prev. month.
WITH cte AS
(
SELECT dt,balance FROM [balances]
UNION ALL
SELECT TOP 1 DATEADD(mm,DATEDIFF(mm,0,dt),0)-1,balance
FROM [balances]
ORDER BY dt
)
SELECT
DATENAME(MM,mnth) + ' ' + CONVERT(CHAR(4),DATEPART(YYYY,mnth)) AS CalendarMonth,
x.startBalance,
x2.endBalance,
(x2.endBalance/x.startBalance -1) *100 AS pctChange
FROM
( -- create a pseudo-calendar table
SELECT TOP 5
DATEADD(mm,-number,DATEADD(mm,DATEDIFF(mm,0,GETDATE()),0)) AS mnth,
DATEADD(mm,-number+1,DATEADD(mm,DATEDIFF(mm,0,GETDATE()),0)) AS nextmnth
FROM master..spt_values
WHERE type ='P'
) cte
-- 1st cross apply to get the start balance
CROSS APPLY
(
SELECT TOP 1 balance AS startBalance
FROM cte
WHERE dt<mnth ORDER BY dt DESC
)x
-- 2nd cross apply to get the end balance
CROSS APPLY
(
SELECT TOP 1 balance AS endBalance
FROM [balances]
WHERE dt<nextmnth ORDER BY dt DESC
)x2
ORDER BY mnth
November 5, 2011 at 10:42 am
I forgot: you might also want to add an index on dt with balance included to improve performance. Otherwise there'll be additional sort operations that should be avoided.
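Something along these lines (the index name is just a placeholder): dt as the key so the TOP 1 ... ORDER BY dt DESC lookups become ordered seeks, with balance included so no extra lookups are needed.
CREATE NONCLUSTERED INDEX IX_balances_dt
    ON balances (dt)
    INCLUDE (balance);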
November 5, 2011 at 12:54 pm
Awesome Lutz, thanks so much for your help once again! Perfect, elegant solution and thanks for the index tip too.
November 6, 2011 at 12:38 am
You folks know me... I'll be one of the first to use a "Pseudo-Calendar" table if it's warranted. I don't believe it is in this case. We also don't need a CTE that adds one row (and, in Lutz's code, it looks like there's no guarantee which row that might be, because there's no MIN and no ORDER BY).
First, let's set up some data for performance testing. A million rows across 12 years should do it... 😉
/******************************************************************************
This section of code builds a shedload of test data and is NOT a part of the
solution. We're just building test data here.
******************************************************************************/
--===== Declare some obviously named variables to control how much data we want
-- and across which years
DECLARE @StartYear DATETIME,
@EndYear DATETIME,
@NextYear DATETIME,
@Rows INT
;
SELECT @StartYear = '2000',
@EndYear = '2011',
@Rows = 1000000,
@NextYear = DATEADD(yy,1,@EndYear)
;
--=============================================================================
-- Create the test data. This is NOT a part of the solution.
-- Usually takes something less than 12 seconds to generate.
--=============================================================================
--===== Conditionally drop the test table to make reruns easier.
IF OBJECT_ID('tempdb..#Balances','U') IS NOT NULL
DROP TABLE #Balances
;
--===== Create the table and populate it on the fly.
-- This builds a table with random dates and amounts from @StartYear
-- through @EndYear
SELECT TOP (@Rows)
TransactionID = IDENTITY(INT,1,1),
Dt = RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,@StartYear,@NextYear) + CAST(@StartYear AS DATETIME),
Balance = CAST(RAND(CHECKSUM(NEWID()))*100 AS MONEY)
INTO #Balances
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2
;
--===== Add a clustered index to the table just to keep it from being a heap
-- and to help with speed a bit by sorting the data.
CREATE CLUSTERED INDEX IX_Balances_Dt
ON #Balances (Dt)
;
ALTER TABLE #Balances
ADD PRIMARY KEY NONCLUSTERED (TransactionID)
;
DO notice, please, that I've made my indexing recommendation in the code. I recommend that the PK on the TransactionID be a NONCLUSTERED PK and that the index on the Dt column be a CLUSTERED index.
I believe Lutz's code was set up to work only with the previous 5 months. I've changed the "5" to 144 to work with 12 years of months, but that's the only change I made. This isn't a slam on Lutz... a lot of folks wouldn't even have been able to come up with a solution no matter how long it took. My main purpose in this is to show that what SQL Server labels as a SCAN is frequently a SEEK in disguise. {EDIT... Holy Moly! Did I ever get THAT backwards. Not sure what was on my mind when I wrote that, but I typed that exactly backwards! My apologies. :blush:} I'll let the code do the rest of the talking...
PRINT '========== Lutz Code 01 ================================================='
SET STATISTICS TIME,IO ON;
WITH cte AS
(
SELECT dt,balance FROM #balances
UNION ALL
SELECT TOP 1 DATEADD(mm,DATEDIFF(mm,0,dt),0)-1,balance
FROM #balances
ORDER BY dt
)
SELECT
DATENAME(MM,mnth) + ' ' + CONVERT(CHAR(4),DATEPART(YYYY,mnth)) AS CalendarMonth,
x.startBalance,
x2.endBalance,
(x2.endBalance/x.startBalance -1) *100 AS pctChange
FROM
( -- create a pseudo-calendar table
SELECT
DATEADD(mm,-number,DATEADD(mm,DATEDIFF(mm,0,GETDATE()),0)) AS mnth,
DATEADD(mm,-number+1,DATEADD(mm,DATEDIFF(mm,0,GETDATE()),0)) AS nextmnth
FROM master..spt_values
WHERE type ='P'
AND Number < 144
) cte
-- 1st cross apply to get the start balance
CROSS APPLY
(
SELECT TOP 1 balance AS startBalance
FROM cte
WHERE dt<mnth ORDER BY dt DESC
)x
-- 2nd cross apply to get the end balance
CROSS APPLY
(
SELECT TOP 1 balance AS endBalance
FROM [#balances]
WHERE dt<nextmnth ORDER BY dt DESC
)x2
ORDER BY mnth
;
SET STATISTICS TIME,IO OFF;
PRINT '========== Moden Code 01 ================================================'
SET STATISTICS TIME,IO ON;
--===== Conditionally drop the test table to make reruns easier.
IF OBJECT_ID('tempdb..#WorkTable','U') IS NOT NULL
DROP TABLE #WorkTable
;
--===== Suppress the auto-display of rowcounts
SET NOCOUNT ON
;
--===== Solve the first part of the problem... get the correct dates
-- and the Balances for those dates. We stuff all of this into
-- a temp table (144 rows) because we want to do a self join later.
-- If we do that on a CTE, the CTE will run twice.
SELECT RowNum = IDENTITY(INT,1,1),
StartDate = dates.StartDate,
Balance = bal.Balance
INTO #WorkTable
FROM (
SELECT MIN(Dt)
FROM #Balances
-- WHERE Dt > 0 --Will turn the scans into seeks but makes no difference. Execution plan lies.
UNION ALL
SELECT MAX(Dt)
FROM #Balances
-- WHERE Dt > 0 --Will turn the scans into seeks but makes no difference. Execution plan lies.
GROUP BY DATEDIFF(mm,0,Dt)
) dates (StartDate)
INNER JOIN #Balances bal
ON dates.StartDate = bal.Dt
ORDER BY dates.StartDate
;
--===== Do an "offset" self-join so se can get starting and ending balances
-- and do the necessary calulations for final display.
SELECT CalendarMonth = DATENAME(mm,hi.StartDate)+' '+DATENAME(yy,hi.StartDate),
StartingBalance = lo.Balance,
EndingBalance = hi.Balance,
PercentChange = (hi.Balance/lo.Balance -1) * 100
FROM #WorkTable lo
INNER JOIN #WorkTable hi
ON lo.RowNum+1 = hi.RowNum
ORDER BY lo.RowNum
SET STATISTICS TIME,IO OFF;
Notice that I violated the rules of "SARGability" with "GROUP BY DATEDIFF(mm,0,Dt)" and it DOES cause an INDEX SCAN. My code should be a lot slower because of that, right? Here are the results... we don't always get this lucky with non-SARGable code but you just don't know until you try on a large amount of data.
========== Lutz Code 01 =================================================
Table '#Balances___________________________________________________________________________________________________________000000000027'.
[font="Arial Black"]Scan count 431, logical reads 254859,[/font] physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. [font="Arial Black"]Scan count 288, logical reads 1149,[/font] physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'spt_values'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
[font="Arial Black"] CPU time = 69860 ms, elapsed time = 73053 ms.[/font]
========== Moden Code 01 ================================================
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table '#Balances___________________________________________________________________________________________________________000000000027'.
[font="Arial Black"] Scan count 146, logical reads 2722,[/font] physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
[font="Arial Black"] CPU time = 1625 ms, elapsed time = 1741 ms.[/font]Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#WorkTable__________________________________________________________________________________________________________000000000029'.
[font="Arial Black"]Scan count 2, logical reads 4,[/font] physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
[font="Arial Black"] CPU time = 0 ms, elapsed time = 19 ms.[/font]
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 6, 2011 at 2:36 am
Jeff Moden (11/6/2011)
My main purpose in this is to show that what SQL Server labels as a SCAN is frequently a SEEK in disguise.
The query plan I get looks like this (just the important part shown):
Perhaps we get different plans; anyway, my guess is you were looking to show that for a MAX or a MIN, a scan of an index might start at one end or the other and just return one row (still a scan, not a seek). Query plans do not lie, in general, but the quality of the interpretation does vary a bit 🙂
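For example, with the clustered index on Dt from Jeff's script, both of these can be answered by an ordered scan that stops after the first row it touches (a sketch, just to illustrate):
-- The plan typically shows a clustered index "scan" plus a TOP, yet only
-- one row is actually read: forward order for MIN, backward for MAX.
SELECT MIN(Dt) FROM #Balances;
SELECT MAX(Dt) FROM #Balances;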
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
November 6, 2011 at 7:28 am
That's great stuff Jeff! I love this forum, I keep learning better ways to do things. 🙂
November 6, 2011 at 4:26 pm
SQL Kiwi (11/6/2011)
Jeff Moden (11/6/2011)
My main purpose in this is to show that what SQL Server labels as a SCAN is frequently a SEEK in disguise.
The query plan I get looks like this (just the important part shown):
Perhaps we get different plans; anyway, my guess is you were looking to show that for a MAX or a MIN, a scan of an index might start at one end or the other and just return one row (still a scan, not a seek). Query plans do not lie, in general, but the quality of the interpretation does vary a bit 🙂
How VERY embarrassing. :blush: Thanks for the catch, Paul. I typed that exactly backwards. The whole purpose of my post was to show that INDEX SCANS, even CLUSTERED INDEX SCANS (the same as table scans, in reality), aren't the proverbial "Boogey-Man" that a lot of folks think they are, and that sometimes they can't, and shouldn't necessarily, be avoided. That'll teach me for trying to write something at 2:38 in the morning.
I went back and corrected the previous post and the following is more of what I was trying to get at.
Let's use DBCC TIMEWARP to go back in time and see what I was talking about and why I say the execution plan lies (or, in Paul's words, can be misinterpreted). Here's the code I originally used to build the data. It has the CLUSTERED index on the "Dt" column, but not the PK constraint/index.
/******************************************************************************
This section of code builds a shedload of test data and is NOT a part of the
solution. We're just building test data here.
******************************************************************************/
--===== Declare some obviously named variables to control how much data we want
-- and across which years
DECLARE @StartYear DATETIME,
@EndYear DATETIME,
@NextYear DATETIME,
@Rows INT
;
SELECT @StartYear = '2000',
@EndYear = '2011',
@Rows = 1000000,
@NextYear = DATEADD(yy,1,@EndYear)
;
--=============================================================================
-- Create the test data. This is NOT a part of the solution.
-- Usually takes something less than 12 seconds to generate.
--=============================================================================
--===== Conditionally drop the test table to make reruns easier.
IF OBJECT_ID('tempdb..#Balances','U') IS NOT NULL
DROP TABLE #Balances
;
--===== Create the table and populate it on the fly.
-- This builds a table with random dates and amounts from @StartYear
-- through @EndYear
SELECT TOP (@Rows)
TransactionID = IDENTITY(INT,1,1),
Dt = RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,@StartYear,@NextYear) + CAST(@StartYear AS DATETIME),
Balance = CAST(RAND(CHECKSUM(NEWID()))*100 AS MONEY)
INTO #Balances
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2
;
--===== Add a clustered index to the table just to keep it from being a heap
-- and to help with speed a bit by sorting the data.
CREATE CLUSTERED INDEX IX_Balances_Dt
ON #Balances (Dt)
;
Then, if we run the first part of the solution code to find the correct dates...
PRINT '========== Moden Code 01 ================================================'
SET STATISTICS TIME,IO ON;
--===== Conditionally drop the test table to make reruns easier.
IF OBJECT_ID('tempdb..#WorkTable','U') IS NOT NULL
DROP TABLE #WorkTable
;
--===== Suppress the auto-display of rowcounts
SET NOCOUNT ON
;
--===== Solve the first part of the problem... get the correct dates
-- and the Balances for those dates. We stuff all of this into
-- a temp table (144 rows) because we want to do a self join later.
-- If we do that on a CTE, the CTE will run twice.
SELECT RowNum = IDENTITY(INT,1,1),
StartDate = dates.StartDate,
Balance = bal.Balance
INTO #WorkTable
FROM (
SELECT MIN(Dt)
FROM #Balances
-- WHERE Dt > 0 --Will turn the scans into seeks but makes no difference. Execution plan lies.
UNION ALL
SELECT MAX(Dt)
FROM #Balances
-- WHERE Dt > 0 --Will turn the scans into seeks but makes no difference. Execution plan lies.
GROUP BY DATEDIFF(mm,0,Dt)
) dates (StartDate)
INNER JOIN #Balances bal
ON dates.StartDate = bal.Dt
ORDER BY dates.StartDate
;
... we get the following execution plan (just the important part shown):
Notice the CLUSTERED INDEX SCAN on the #Balances table in the graphic above. On a million row table, that will normally concern a fair number of people (including me, initially) and that will drive them (including me, initially :blush:) to trying something more sophisticated (which isn't actually a bad thing... gotta know which way is best).
We CAN get the CLUSTERED INDEX SCAN to change to a CLUSTERED INDEX SEEK on the execution plan by simply uncommenting the "WHERE Dt > 0" lines of code...
PRINT '========== Moden Code 01 ================================================'
SET STATISTICS TIME,IO ON;
--===== Conditionally drop the test table to make reruns easier.
IF OBJECT_ID('tempdb..#WorkTable','U') IS NOT NULL
DROP TABLE #WorkTable
;
--===== Suppress the auto-display of rowcounts
SET NOCOUNT ON
;
--===== Solve the first part of the problem... get the correct dates
-- and the Balances for those dates. We stuff all of this into
-- a temp table (144 rows) because we want to do a self join later.
-- If we do that on a CTE, the CTE will run twice.
SELECT RowNum = IDENTITY(INT,1,1),
StartDate = dates.StartDate,
Balance = bal.Balance
INTO #WorkTable
FROM (
SELECT MIN(Dt)
FROM #Balances
WHERE Dt > 0 --Will turn the scans into seeks but makes no difference. Execution plan lies.
UNION ALL
SELECT MAX(Dt)
FROM #Balances
WHERE Dt > 0 --Will turn the scans into seeks but makes no difference. Execution plan lies.
GROUP BY DATEDIFF(mm,0,Dt)
) dates (StartDate)
INNER JOIN #Balances bal
ON dates.StartDate = bal.Dt
ORDER BY dates.StartDate
;
That will give us a CLUSTERED INDEX SEEK instead of the scan we got before.
But, is it really a SEEK? Yeah... according to SQL Server, it did a SEEK. But only to find the "first" row with a date greater than "0" (1900-01-01) and then it reverts to a SCAN behind the scenes. It still has to read all 1 million rows to do the job. If you look at the properties of the SEEK, you can see it uses RANGE SCAN instead of PREFIX like it does on the other INDEX SEEK for the #Balances table.
That's what I meant by the "execution plan lies" and as Paul said... it's not normally that the execution plan that lies, it's how some interpret it.
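If you want to see it for yourself, comparing the I/O of the two variants tells the story (a quick sketch against the #Balances table built above):
SET STATISTICS IO ON;
-- Shows up as a clustered index SCAN in the plan:
SELECT MAX(Dt) FROM #Balances GROUP BY DATEDIFF(mm,0,Dt);
-- Shows up as a clustered index SEEK in the plan, but it still has to touch
-- every row, so the logical reads come out essentially the same:
SELECT MAX(Dt) FROM #Balances WHERE Dt > 0 GROUP BY DATEDIFF(mm,0,Dt);
SET STATISTICS IO OFF;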
--Jeff Moden
Change is inevitable... Change for the better is not.
November 6, 2011 at 4:31 pm
JonFox (11/6/2011)
That's great stuff Jeff! I love this forum, I keep learning better ways to do things. 🙂
Thanks Jon, but I really have to stop typing at 2 in the morning... I typed what I wanted to say exactly backwards. :blush: To make matters worse, I added the PK as a "final thought" and that made it impossible for anyone to follow what I was saying even if I had managed to type it correctly.
--Jeff Moden
Change is inevitable... Change for the better is not.