Grouping on month

  • Hi

    Consider this table of account balances:

use tempdb
go

CREATE TABLE balances (
    transactionId INT PRIMARY KEY,
    dt datetime,
    balance money
)
go

INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(1,'Sep 5 2011 12:00:00:000AM',43000.0000)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(2,'Sep 5 2011 12:01:00:000AM',42355.0000)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(3,'Sep 9 2011 8:56:00:000AM',42269.0000)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(4,'Sep 9 2011 3:24:14:000PM',42110.7600)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(5,'Sep 12 2011 11:44:33:000AM',42712.7600)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(6,'Sep 15 2011 6:12:01:000PM',41637.7600)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(7,'Sep 16 2011 3:29:06:000PM',40777.7600)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(8,'Sep 19 2011 6:50:00:000PM',42132.2600)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(9,'Sep 20 2011 8:00:00:000PM',42129.6800)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(10,'Sep 21 2011 5:18:26:000PM',43651.8800)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(11,'Sep 21 2011 6:23:00:000PM',44959.5100)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(12,'Sep 21 2011 8:00:00:000PM',44951.3400)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(13,'Sep 25 2011 10:09:01:000PM',45166.3400)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(14,'Sep 26 2011 6:54:26:000AM',46241.3400)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(15,'Sep 26 2011 9:39:15:000AM',47875.3400)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(16,'Sep 27 2011 3:38:33:000PM',49219.5200)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(17,'Sep 29 2011 4:34:18:000PM',52552.0200)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(18,'Oct 3 2011 10:36:54:000AM',53842.0200)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(19,'Oct 4 2011 8:13:08:000PM',50591.2200)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(20,'Oct 10 2011 8:00:00:000PM',50549.0800)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(21,'Oct 11 2011 2:57:22:000PM',50936.0800)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(22,'Oct 11 2011 8:00:00:000PM',50906.8400)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(23,'Oct 13 2011 5:31:54:000PM',53435.2400)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(24,'Oct 17 2011 6:17:58:000PM',52145.2400)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(25,'Oct 17 2011 8:00:00:000PM',52126.3200)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(26,'Oct 18 2011 5:44:22:000PM',56660.6700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(27,'Oct 19 2011 12:13:46:000AM',58436.5700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(28,'Oct 20 2011 9:25:54:000AM',54781.5700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(29,'Oct 20 2011 7:30:42:000PM',56415.5700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(30,'Oct 21 2011 6:28:13:000AM',63725.5700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(31,'Oct 21 2011 1:56:48:000PM',56716.5700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(32,'Oct 21 2011 8:05:57:000PM',54050.5700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(33,'Oct 26 2011 9:16:11:000AM',56458.5700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(34,'Oct 26 2011 10:33:48:000PM',59477.1700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(35,'Oct 27 2011 3:58:23:000AM',53887.1700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(36,'Oct 27 2011 12:29:41:000PM',51565.1700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(37,'Oct 27 2011 1:29:03:000PM',47114.6700)
INSERT INTO [balances] ([transactionId],[dt],[balance])VALUES(38,'Nov 1 2011 7:24:00:000AM',41309.6700)

I want to return, for each month, the starting balance, the ending balance, and the percent change. It's a little trickier than I first thought: for each month I need to find the balance at that month's max date and the balance at the previous month's max date, and then calculate the percent change.

    Any ideas?
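Since T-SQL can't be run right here in the thread, here's a quick sanity check of that logic in portable SQL via Python's sqlite3. This is a sketch under assumptions: ISO date strings, a cut-down copy of the sample data (only the month-boundary rows), and SQLite window functions standing in for the T-SQL constructs.

```python
import sqlite3

# A cut-down version of the sample data (ISO dates so sqlite can sort them).
rows = [
    (1,  '2011-09-05 00:00:00', 43000.00),
    (17, '2011-09-29 16:34:18', 52552.02),
    (18, '2011-10-03 10:36:54', 53842.02),
    (37, '2011-10-27 13:29:03', 47114.67),
    (38, '2011-11-01 07:24:00', 41309.67),
]

con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE balances (transactionId INTEGER PRIMARY KEY, dt TEXT, balance REAL)")
con.executemany("INSERT INTO balances VALUES (?, ?, ?)", rows)

# Closing balance per month = balance at the month's MAX(dt);
# percent change compares it to the previous month's closing balance.
sql = """
WITH monthly AS (
    SELECT strftime('%Y-%m', dt) AS mnth, balance,
           ROW_NUMBER() OVER (PARTITION BY strftime('%Y-%m', dt)
                              ORDER BY dt DESC) AS rn
    FROM balances
)
SELECT cur.mnth, prev.balance AS startBalance, cur.balance AS endBalance,
       ROUND((cur.balance / prev.balance - 1) * 100, 2) AS pctChange
FROM monthly AS cur
JOIN monthly AS prev
  ON prev.mnth = strftime('%Y-%m', date(cur.mnth || '-01', '-1 month'))
 AND prev.rn = 1
WHERE cur.rn = 1
ORDER BY cur.mnth
"""
out = list(con.execute(sql))
for row in out:
    print(row)
```

Note that September drops out here because there is no prior month to take a closing balance from; that's exactly the wrinkle described above, and the solutions below handle it by seeding an extra row or taking the overall minimum date.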

  • Based on your sample data, what would be the expected result?



    Lutz
    A pessimist is an optimist with experience.

How to get fast answers to your question
How to post performance related questions
Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

Have you looked at this article by Jeff Moden? It might give you enough info to solve your problem.

    http://www.sqlservercentral.com/articles/T-SQL/68467/

    If everything seems to be going well, you have obviously overlooked something.

    Ron

Please help us, help you - before posting a question please read
Before posting a performance problem please read

  • bitbucket-25253 (11/5/2011)


    Have you looked at this article by Jeff Moden. It might give you enough info to solve your problem.

    http://www.sqlservercentral.com/articles/T-SQL/68467/

Ron, I don't think it's a running total scenario. It seems more like using the max date per month. But since I wasn't sure, I asked for the expected result...




  • LutzM (11/5/2011)


    Based on your sample data, what would be the expected result?

    Hi Lutz,

    The expected result is

MONTH           startBalance  endBalance  pctChange
==============  ============  ==========  =========
September 2011      43000.00    52552.02      22.21
October 2011        52552.02    47114.67     -10.35
November 2011       47114.67    41309.67     -12.32

    Thanks for looking!
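For anyone checking the math: pctChange is just endBalance over the prior month's closing balance, minus one (September uses its own first balance as the start, since there is no August row). A throwaway arithmetic check, no SQL involved:

```python
# Each tuple: (month, startBalance, endBalance) from the expected result above.
expected = [
    ('September 2011', 43000.00, 52552.02),
    ('October 2011',   52552.02, 47114.67),
    ('November 2011',  47114.67, 41309.67),
]

# Percent change = (end / start - 1) * 100, rounded to two decimals.
for month, start, end in expected:
    print(month, round((end / start - 1) * 100, 2))
```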

  • LutzM (11/5/2011)


    bitbucket-25253 (11/5/2011)


    Have you looked at this article by Jeff Moden. It might give you enough info to solve your problem.

    http://www.sqlservercentral.com/articles/T-SQL/68467/

Ron, I don't think it's a running total scenario. It seems more like using the max date per month. But since I wasn't sure, I asked for the expected result...

Oops on my part. I jumped to an assumption, and well, you know what that makes of me.


  • Here's my first stab at it:

WITH BalanceDates AS
(
    SELECT CurMonthEnd, PrevMonthEnd, MIN(dt) AS FirstTranDate, MAX(dt) AS LastTranDate
    FROM (
            SELECT DATEADD(MM, DATEDIFF(MM, 0, dt), -1) AS PrevMonthEnd,
                   DATEADD(MM, DATEDIFF(MM, 0, dt) + 1, -1) AS CurMonthEnd,
                   dt
            FROM balances
         ) AS b
    GROUP BY CurMonthEnd, PrevMonthEnd
),
MonthlyBalances AS
(
    SELECT CurMonthEnd, PrevMonthEnd, bmin.balance AS StartingBalance, bmax.balance AS EndingBalance
    FROM BalanceDates AS b
         INNER JOIN balances AS bmin ON b.FirstTranDate = bmin.dt
         INNER JOIN balances AS bmax ON b.LastTranDate = bmax.dt
)
SELECT DATENAME(MM, curr.CurMonthEnd) + ' ' + CONVERT(CHAR(4), DATEPART(YYYY, curr.CurMonthEnd)) AS CalendarMonth,
       curr.StartingBalance, curr.EndingBalance,
       (curr.EndingBalance - COALESCE(prev.EndingBalance, curr.StartingBalance))
           / COALESCE(prev.EndingBalance, curr.StartingBalance) * 100 AS PctChange
FROM MonthlyBalances AS curr
     LEFT OUTER JOIN MonthlyBalances AS prev ON curr.PrevMonthEnd = prev.CurMonthEnd

    I doubt this is the most efficient way of doing it though, as it results in the balances table being read six times...

    EDIT: Forgot to deal with the precision issues when calculating the PctChange column...will post fix later when I have time, on my way out the door at the moment...

  • Here's a slightly different approach based on the CROSS APPLY method.

One thing making the code more complex is the "interpretation" of the very first balance as the start balance of its month, whereas for all other months the latest value of the previous month is used...

;
-- build a table including the first value with a date of the prev. month.
WITH cte AS
(
    SELECT dt, balance FROM [balances]
    UNION ALL
    SELECT TOP 1 DATEADD(mm, DATEDIFF(mm, 0, dt), 0) - 1, balance
    FROM [balances]
    ORDER BY dt
)
SELECT
    DATENAME(MM, mnth) + ' ' + CONVERT(CHAR(4), DATEPART(YYYY, mnth)) AS CalendarMonth,
    x.startBalance,
    x2.endBalance,
    (x2.endBalance / x.startBalance - 1) * 100 AS pctChange
FROM
    ( -- create a pseudo-calendar table
        SELECT TOP 5
            DATEADD(mm, -number, DATEADD(mm, DATEDIFF(mm, 0, GETDATE()), 0)) AS mnth,
            DATEADD(mm, -number + 1, DATEADD(mm, DATEDIFF(mm, 0, GETDATE()), 0)) AS nextmnth
        FROM master..spt_values
        WHERE type = 'P'
    ) cte
    -- 1st cross apply to get the start balance
    CROSS APPLY
    (
        SELECT TOP 1 balance AS startBalance
        FROM cte
        WHERE dt < mnth
        ORDER BY dt DESC
    ) x
    -- 2nd cross apply to get the end balance
    CROSS APPLY
    (
        SELECT TOP 1 balance AS endBalance
        FROM [balances]
        WHERE dt < nextmnth
        ORDER BY dt DESC
    ) x2
ORDER BY mnth
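For what it's worth, the CROSS APPLY idea can be mimicked outside SQL Server too. SQLite (via Python's sqlite3) has no CROSS APPLY, so each "TOP 1 ... ORDER BY dt DESC" becomes a correlated "ORDER BY dt DESC LIMIT 1" lookup per month boundary. A minimal sketch on a trimmed copy of the sample data (the boundaries are hard-coded here instead of coming from a pseudo-calendar):

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE balances (dt TEXT, balance REAL)")
con.executemany("INSERT INTO balances VALUES (?, ?)", [
    ('2011-09-05 00:00:00', 43000.00),   # first transaction ever
    ('2011-09-29 16:34:18', 52552.02),   # last September balance
    ('2011-10-27 13:29:03', 47114.67),   # last October balance
    ('2011-11-01 07:24:00', 41309.67),   # last November balance
])

# For each (month start, next month start) pair: the start balance is the
# last balance strictly before the boundary, the end balance is the last
# balance before the next boundary -- the two CROSS APPLYs above.
out = []
for mnth, nextmnth in [('2011-10-01', '2011-11-01'),
                       ('2011-11-01', '2011-12-01')]:
    start, end = con.execute("""
        SELECT (SELECT balance FROM balances WHERE dt < ? ORDER BY dt DESC LIMIT 1),
               (SELECT balance FROM balances WHERE dt < ? ORDER BY dt DESC LIMIT 1)
    """, (mnth, nextmnth)).fetchone()
    out.append((mnth, start, end, round((end / start - 1) * 100, 2)))

for row in out:
    print(row)
```

The September boundary is left out of the sketch on purpose: without the extra UNION ALL row that backdates the first balance into the previous month, there is no balance before the first month's start.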




  • I forgot: you might also want to add an index on dt with balance included to improve performance. Otherwise there'll be additional sort operations that should be avoided.
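A rough illustration of why that index helps, using SQLite via Python as a stand-in (an assumption on two counts: SQLite has no INCLUDE clause, so a composite index on (dt, balance) plays the "included column" role, and the data here is made up):

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE balances (transactionId INTEGER PRIMARY KEY, dt TEXT, balance REAL)")
con.executemany("INSERT INTO balances VALUES (?, ?, ?)",
                [(i, f'2011-09-{i:02d} 12:00:00', 1000.0 + i) for i in range(1, 29)])

# No INCLUDE in SQLite: a composite index on (dt, balance) stands in for
# "index on dt with balance included".
con.execute("CREATE INDEX ix_balances_dt ON balances (dt, balance)")

# The "TOP 1 ... ORDER BY dt DESC" pattern can now be answered from the
# index alone, with no separate sort step in the plan.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT balance FROM balances ORDER BY dt DESC LIMIT 1"
).fetchall()
for step in plan:
    print(step)
```

The plan should come straight off the index (no temp B-tree for the ORDER BY), which is the extra sort operation being avoided.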




  • Awesome Lutz, thanks so much for your help once again! Perfect, elegant solution and thanks for the index tip too.

You folks know me... I'll be one of the first to use a "Pseudo-Calendar" table if it's warranted. I don't believe it is in this case. We also don't need a CTE that adds one row (and it looks like there's no guarantee which row that might be in Lutz's code, because there's no MIN and no ORDER BY).

First, let's set up some data for performance testing. A million rows across 12 years should do it... 😉

/******************************************************************************
 This section of code builds a shedload of test data and is NOT a part of the
 solution. We're just building test data here.
******************************************************************************/
--===== Declare some obviously named variables to control how much data we want
     -- and across which years
DECLARE @StartYear DATETIME,
        @EndYear   DATETIME,
        @NextYear  DATETIME,
        @Rows      INT
;
 SELECT @StartYear = '2000',
        @EndYear   = '2011',
        @Rows      = 1000000,
        @NextYear  = DATEADD(yy,1,@EndYear)
;
--=============================================================================
--      Create the test data. This is NOT a part of the solution.
--      Usually takes something less than 12 seconds to generate.
--=============================================================================
--===== Conditionally drop the test table to make reruns easier.
     IF OBJECT_ID('tempdb..#Balances','U') IS NOT NULL
        DROP TABLE #Balances
;
--===== Create the table and populate it on the fly.
     -- This builds a table with random dates and amounts from @StartYear
     -- through @EndYear
 SELECT TOP (@Rows)
        TransactionID = IDENTITY(INT,1,1),
        Dt      = RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,@StartYear,@NextYear) + CAST(@StartYear AS DATETIME),
        Balance = CAST(RAND(CHECKSUM(NEWID()))*100 AS MONEY)
   INTO #Balances
   FROM master.sys.all_columns ac1
  CROSS JOIN master.sys.all_columns ac2
;
--===== Add a clustered index to the table just to keep it from being a heap
     -- and to help with speed a bit by sorting the data.
 CREATE CLUSTERED INDEX IX_Balances_Dt
     ON #Balances (Dt)
;
  ALTER TABLE #Balances
    ADD PRIMARY KEY NONCLUSTERED (TransactionID)
;

Do notice, please, that I've made my indexing recommendation in the code: I recommend that the PK on TransactionID be a NONCLUSTERED PK and that the index on the Dt column be a CLUSTERED index.

I believe Lutz's code was set up to work only with the previous 5 months. I've changed the "5" to 144 to work with 12 years of months, but that's the only change I made. This isn't a slam on Lutz... a lot of folks wouldn't even have been able to come up with a solution no matter how long it took. My main purpose in this is to show that what SQL Server labels as a SCAN is frequently a SEEK in disguise. {EDIT... Holy Moly! Did I ever get THAT backwards. Not sure what was on my mind when I wrote that, but I typed it exactly backwards! My apologies. :blush:} I'll let the code do the rest of the talking...

PRINT '========== Lutz Code 01 ================================================='
SET STATISTICS TIME,IO ON;
WITH cte AS
(
    SELECT dt, balance FROM #balances
    UNION ALL
    SELECT TOP 1 DATEADD(mm, DATEDIFF(mm, 0, dt), 0) - 1, balance
    FROM #balances
    ORDER BY dt
)
SELECT
    DATENAME(MM, mnth) + ' ' + CONVERT(CHAR(4), DATEPART(YYYY, mnth)) AS CalendarMonth,
    x.startBalance,
    x2.endBalance,
    (x2.endBalance / x.startBalance - 1) * 100 AS pctChange
FROM
    ( -- create a pseudo-calendar table
        SELECT
            DATEADD(mm, -number, DATEADD(mm, DATEDIFF(mm, 0, GETDATE()), 0)) AS mnth,
            DATEADD(mm, -number + 1, DATEADD(mm, DATEDIFF(mm, 0, GETDATE()), 0)) AS nextmnth
        FROM master..spt_values
        WHERE type = 'P'
          AND Number < 144
    ) cte
    -- 1st cross apply to get the start balance
    CROSS APPLY
    (
        SELECT TOP 1 balance AS startBalance
        FROM cte
        WHERE dt < mnth
        ORDER BY dt DESC
    ) x
    -- 2nd cross apply to get the end balance
    CROSS APPLY
    (
        SELECT TOP 1 balance AS endBalance
        FROM [#balances]
        WHERE dt < nextmnth
        ORDER BY dt DESC
    ) x2
ORDER BY mnth
;
SET STATISTICS TIME,IO OFF;

PRINT '========== Moden Code 01 ================================================'
SET STATISTICS TIME,IO ON;
--===== Conditionally drop the test table to make reruns easier.
     IF OBJECT_ID('tempdb..#WorkTable','U') IS NOT NULL
        DROP TABLE #WorkTable
;
--===== Suppress the auto-display of rowcounts
    SET NOCOUNT ON
;
--===== Solve the first part of the problem... get the correct dates
     -- and the Balances for those dates. We stuff all of this into
     -- a temp table (144 rows) because we want to do a self join later.
     -- If we do that on a CTE, the CTE will run twice.
 SELECT RowNum    = IDENTITY(INT,1,1),
        StartDate = dates.StartDate,
        Balance   = bal.Balance
   INTO #WorkTable
   FROM (
         SELECT MIN(Dt)
           FROM #Balances
          -- WHERE Dt > 0 --Will turn the scans into seeks but makes no difference. Execution plan lies.
          UNION ALL
         SELECT MAX(Dt)
           FROM #Balances
          -- WHERE Dt > 0 --Will turn the scans into seeks but makes no difference. Execution plan lies.
          GROUP BY DATEDIFF(mm,0,Dt)
        ) dates (StartDate)
  INNER JOIN #Balances bal
     ON dates.StartDate = bal.Dt
  ORDER BY dates.StartDate
;
--===== Do an "offset" self-join so we can get starting and ending balances
     -- and do the necessary calculations for final display.
 SELECT CalendarMonth   = DATENAME(mm,hi.StartDate)+' '+DATENAME(yy,hi.StartDate),
        StartingBalance = lo.Balance,
        EndingBalance   = hi.Balance,
        PercentChange   = (hi.Balance/lo.Balance -1) * 100
   FROM #WorkTable lo
  INNER JOIN #WorkTable hi
     ON lo.RowNum+1 = hi.RowNum
  ORDER BY lo.RowNum
;
SET STATISTICS TIME,IO OFF;

Notice that I violated the rules of "SARGability" with "GROUP BY DATEDIFF(mm,0,Dt)" and it DOES cause an INDEX SCAN. My code should be a lot slower because of that, right? Here are the results... we don't always get this lucky with non-SARGable code, but you just don't know until you try on a large amount of data.

    ========== Lutz Code 01 =================================================

    Table '#Balances___________________________________________________________________________________________________________000000000027'.

Scan count 431, logical reads 254859, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 288, logical reads 1149, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'spt_values'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

CPU time = 69860 ms, elapsed time = 73053 ms.

    ========== Moden Code 01 ================================================

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 2 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table '#Balances___________________________________________________________________________________________________________000000000027'.

Scan count 146, logical reads 2722, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

CPU time = 1625 ms, elapsed time = 1741 ms.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#WorkTable__________________________________________________________________________________________________________000000000029'.

Scan count 2, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 19 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.
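Jeff's two steps (pick the balance at each month's extreme date, then pair adjacent months with the offset self-join on lo.RowNum + 1 = hi.RowNum) can be paraphrased outside SQL to confirm the arithmetic. This is a Python sketch of the idea, not his code; the four rows are the month-boundary rows from the original sample data:

```python
from datetime import datetime

rows = [
    (datetime(2011, 9, 5, 0, 0),     43000.00),
    (datetime(2011, 9, 29, 16, 34),  52552.02),
    (datetime(2011, 10, 27, 13, 29), 47114.67),
    (datetime(2011, 11, 1, 7, 24),   41309.67),
]

# Step 1: GROUP BY DATEDIFF(mm, 0, Dt) is just grouping on a month serial
# number; keep the balance at each month's latest (MAX) date.
closing = {}
for dt, balance in rows:
    key = dt.year * 12 + dt.month          # month serial, like DATEDIFF(mm,0,Dt)
    if key not in closing or dt > closing[key][0]:
        closing[key] = (dt, balance)

# Jeff's dates subquery also UNIONs in the overall MIN(Dt), so the first
# month gets an opening balance.
ordered = [min(rows)[1]] + [closing[k][1] for k in sorted(closing)]

# Step 2: the offset self-join (lo.RowNum + 1 = hi.RowNum) becomes pairing
# consecutive entries of the ordered list.
out = [round((hi / lo - 1) * 100, 2) for lo, hi in zip(ordered, ordered[1:])]
print(out)
```

The three percentages match the expected result posted earlier (September, October, November), which is a nice cross-check that the month-serial grouping plus offset join really is the same computation.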

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/6/2011)


    My main purpose in this is to show that what SQL Server labels as a SCAN is frequently a SEEK in disguise.

    The query plan I get looks like this (just the important part shown):

Perhaps we get different plans; anyway, my guess is you were looking to show that for a MAX or a MIN, a scan of an index might start at one end or the other and just return one row (still a scan, not a seek). Query plans do not lie, in general, but the quality of the interpretation does vary a bit 🙂

  • That's great stuff Jeff! I love this forum, I keep learning better ways to do things. 🙂

  • SQL Kiwi (11/6/2011)


    Jeff Moden (11/6/2011)


    My main purpose in this is to show that what SQL Server labels as a SCAN is frequently a SEEK in disguise.

    The query plan I get looks like this (just the important part shown):

    Perhaps we get different plans, anyway my guess is you were looking to show that for a MAX or a MIN, a scan of an index might start at one end or the other and just return one row (still a scan, not a seek). Query plans do not lie, in general, but the quality of the interpretation does vary a bit 🙂

How VERY embarrassing. :blush: Thanks for the catch, Paul. I typed that exactly backwards. The whole purpose of my post was to show that INDEX SCANs, even CLUSTERED INDEX SCANs (the same as table scans, in reality), aren't the proverbial "Boogey-Man" a lot of folks think they are, and that sometimes they can't, and shouldn't necessarily, be avoided. That'll teach me for trying to write something at 2:38 in the morning.

    I went back and corrected the previous post and the following is more of what I was trying to get at.

    Let's use DBCC TIMEWARP to go back in time and see what I was talking about and why I say the execution plan lies (or, in Paul's words, can be misinterpreted). Here's the code I originally used to build the data. It has the CLUSTERED index on the "Dt" column, but not the PK constraint/index.

/******************************************************************************
 This section of code builds a shedload of test data and is NOT a part of the
 solution. We're just building test data here.
******************************************************************************/
--===== Declare some obviously named variables to control how much data we want
     -- and across which years
DECLARE @StartYear DATETIME,
        @EndYear   DATETIME,
        @NextYear  DATETIME,
        @Rows      INT
;
 SELECT @StartYear = '2000',
        @EndYear   = '2011',
        @Rows      = 1000000,
        @NextYear  = DATEADD(yy,1,@EndYear)
;
--=============================================================================
--      Create the test data. This is NOT a part of the solution.
--      Usually takes something less than 12 seconds to generate.
--=============================================================================
--===== Conditionally drop the test table to make reruns easier.
     IF OBJECT_ID('tempdb..#Balances','U') IS NOT NULL
        DROP TABLE #Balances
;
--===== Create the table and populate it on the fly.
     -- This builds a table with random dates and amounts from @StartYear
     -- through @EndYear
 SELECT TOP (@Rows)
        TransactionID = IDENTITY(INT,1,1),
        Dt      = RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,@StartYear,@NextYear) + CAST(@StartYear AS DATETIME),
        Balance = CAST(RAND(CHECKSUM(NEWID()))*100 AS MONEY)
   INTO #Balances
   FROM master.sys.all_columns ac1
  CROSS JOIN master.sys.all_columns ac2
;
--===== Add a clustered index to the table just to keep it from being a heap
     -- and to help with speed a bit by sorting the data.
 CREATE CLUSTERED INDEX IX_Balances_Dt
     ON #Balances (Dt)
;

    Then, if we run the first part of the solution code to find the correct dates...

PRINT '========== Moden Code 01 ================================================'
SET STATISTICS TIME,IO ON;
--===== Conditionally drop the test table to make reruns easier.
     IF OBJECT_ID('tempdb..#WorkTable','U') IS NOT NULL
        DROP TABLE #WorkTable
;
--===== Suppress the auto-display of rowcounts
    SET NOCOUNT ON
;
--===== Solve the first part of the problem... get the correct dates
     -- and the Balances for those dates. We stuff all of this into
     -- a temp table (144 rows) because we want to do a self join later.
     -- If we do that on a CTE, the CTE will run twice.
 SELECT RowNum    = IDENTITY(INT,1,1),
        StartDate = dates.StartDate,
        Balance   = bal.Balance
   INTO #WorkTable
   FROM (
         SELECT MIN(Dt)
           FROM #Balances
          -- WHERE Dt > 0 --Will turn the scans into seeks but makes no difference. Execution plan lies.
          UNION ALL
         SELECT MAX(Dt)
           FROM #Balances
          -- WHERE Dt > 0 --Will turn the scans into seeks but makes no difference. Execution plan lies.
          GROUP BY DATEDIFF(mm,0,Dt)
        ) dates (StartDate)
  INNER JOIN #Balances bal
     ON dates.StartDate = bal.Dt
  ORDER BY dates.StartDate
;

    .. we get the following execution plan (just the important part shown):

    Notice the CLUSTERED INDEX SCAN on the #Balances table in the graphic above. On a million row table, that will normally concern a fair number of people (including me, initially) and that will drive them (including me, initially :blush:) to trying something more sophisticated (which isn't actually a bad thing... gotta know which way is best).

    We CAN get the CLUSTERED INDEX SCAN to change to a CLUSTERED INDEX SEEK on the execution plan by simply uncommenting the "WHERE Dt > 0" lines of code...

PRINT '========== Moden Code 01 ================================================'
SET STATISTICS TIME,IO ON;
--===== Conditionally drop the test table to make reruns easier.
     IF OBJECT_ID('tempdb..#WorkTable','U') IS NOT NULL
        DROP TABLE #WorkTable
;
--===== Suppress the auto-display of rowcounts
    SET NOCOUNT ON
;
--===== Solve the first part of the problem... get the correct dates
     -- and the Balances for those dates. We stuff all of this into
     -- a temp table (144 rows) because we want to do a self join later.
     -- If we do that on a CTE, the CTE will run twice.
 SELECT RowNum    = IDENTITY(INT,1,1),
        StartDate = dates.StartDate,
        Balance   = bal.Balance
   INTO #WorkTable
   FROM (
         SELECT MIN(Dt)
           FROM #Balances
          WHERE Dt > 0 --Will turn the scans into seeks but makes no difference. Execution plan lies.
          UNION ALL
         SELECT MAX(Dt)
           FROM #Balances
          WHERE Dt > 0 --Will turn the scans into seeks but makes no difference. Execution plan lies.
          GROUP BY DATEDIFF(mm,0,Dt)
        ) dates (StartDate)
  INNER JOIN #Balances bal
     ON dates.StartDate = bal.Dt
  ORDER BY dates.StartDate
;

    That will give us a CLUSTERED INDEX SEEK instead of the scan we got before.

But is it really a SEEK? Yeah... according to SQL Server, it did a SEEK, but only to find the "first" row with a date greater than "0" (1900-01-01); then it reverts to a SCAN behind the scenes. It still has to read all 1 million rows to do the job. If you look at the properties of the SEEK, you can see it uses RANGE SCAN instead of PREFIX like it does on the other INDEX SEEK for the #Balances table.

That's what I meant by "the execution plan lies" and, as Paul said... it's not normally the execution plan that lies, it's how some interpret it.

    --Jeff Moden



  • JonFox (11/6/2011)


    That's great stuff Jeff! I love this forum, I keep learning better ways to do things. 🙂

    Thanks Jon, but I have to really stop typing at 2 in the morning... I typed what I wanted to say exactly backwards. :blush: To make matters worse, I added the PK as a "final thought" and that made it impossible for anyone to follow what I was saying even if I had managed to type it correctly.

    --Jeff Moden


