February 25, 2011 at 4:39 am
Hi,
My apologies if this isn't the correct forum to post to...
I'm trying to write a query which fetches the latest budgets from a table, which I have solved, but not sure if its the best way to go about it. To give you some background, the table has 2 key columns which identify a project (ACCOUNT_CODE and SECONDARY_CODE), 2 date columns which identify the year and period (YEAR and PERIOD), a budget revision number which gets incremented every time a budget is revised (REVISION_NUMBER) and the actual value (VALUE).
In my opinion the best way of getting latest budget for each period would be to use the MAX windowed function in the where clause, but this isn't allowed.
SELECTb.ACCOUNT_CODE,
b.SECONDARY_CODE,
SUM(b.VALUE) BUDGET
FROMBUDGETS b
WHEREb.RECORD_TYPE = 'C' /* Costing Budget */ AND
b.YEAR = 0 /* Current Year */ AND
b.REVISION_NUMBER = MAX(b.REVISION_NUMBER) OVER (PARTITION BY b.ACCOUNT_CODE, b.SECONDARY_CODE, b.PERIOD)
GROUP BY b.ACCOUNT_CODE,
b.SECONDARY_CODE
This throws the error "Windowed functions can only appear in the SELECT or ORDER BY clauses".
So instead I wrote a sub-query
SELECTa.ACCOUNT_CODE,
a.SECONDARY_CODE,
SUM(a.VALUE) BUDGET
FROM (
SELECTb.ACCOUNT_CODE,
b.SECONDARY_CODE,
b.PERIOD,
b.VALUE,
b.REVISION_NUMBER,
MAX(b.REVISION_NUMBER) OVER (PARTITION BY b.ACCOUNT_CODE, b.SECONDARY_CODE, b.PERIOD) MAX_REVISION_NUMBER
FROMBUDGETS b WITH (NOLOCK)
WHEREb.RECORD_TYPE = 'C' /* Costing Budget */ AND
b.YEAR = 0 /* Current Year */
) a
WHEREa.REVISION_NUMBER = a.MAX_REVISION_NUMBER
GROUP BY a.ACCOUNT_CODE,
a.SECONDARY_CODE
This gets the desired result, however not sure if anyone would do it a different / better way?
Thanks
February 25, 2011 at 8:51 am
I would like to take a shot at this one...can you please provide the DDL for your tables and some DML to build some test data?
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 25, 2011 at 9:54 am
Sure, here you go:
CREATE TABLE [BUDGETS](
[ACCOUNT_CODE] [varchar](25) NULL,
[SECONDARY_CODE] [varchar](25) NULL,
[RECORD_TYPE] [varchar](2) NULL,
[PERIOD] [tinyint] NULL,
[YEAR] [int] NULL,
[VALUE] [float] NULL,
[REVISION_NUMBER] [smallint] NULL)
INSERT INTO BUDGETS VALUES ('AB12345','12345','C',1,0,1000.00,0)
GO
INSERT INTO BUDGETS VALUES ('AB12345','12345','C',2,0,2000.00,0)
GO
INSERT INTO BUDGETS VALUES ('AB12345','12345','C',3,0,1500.00,0)
GO
INSERT INTO BUDGETS VALUES ('AB12345','12345','C',4,0,1750.00,0)
GO
INSERT INTOBUDGETS VALUES ('AB12345','98765','C',1,0,250.00,0)
GO
INSERT INTO BUDGETS VALUES ('AB12345','98765','C',2,0,495.00,0)
GO
INSERT INTO BUDGETS VALUES ('AB12345','98765','C',3,0,539.00,0)
GO
INSERT INTO BUDGETS VALUES ('AB12345','12345','C',1,0,1299.00,1)
GO
INSERT INTO BUDGETS VALUES ('AB12345','12345','C',2,0,1750.00,1)
GO
INSERT INTO BUDGETS VALUES ('AB12345','12345','C',1,0,999.00,2)
GO
INSERT INTO BUDGETS VALUES ('XY67890','33333','C',1,0,123.00,0)
GO
INSERT INTO BUDGETS VALUES ('XY67890','33333','C',2,0,439.00,0)
GO
INSERT INTO BUDGETS VALUEs ('XY67890','33333','C',2,0,749.00,1)
GO
The period represents the financial month, the record type is a single character switch but in my case I'm only interested in 'C' records. The year is also a switch, agian, I'm only interested in '0' records.
The result set should look like:
ACCOUNT_CODE | SECONDARY_CODE | BUDGET
----------------------------------------------
AB12345 | 12345 | 5999
AB12345 | 98765 | 1284
XY67890 | 33333 | 872
February 25, 2011 at 10:33 am
;
WITH CTE AS
(
SELECT *,
RN = ROW_NUMBER() OVER (PARTITION BY [ACCOUNT_CODE], [SECONDARY_CODE], [PERIOD] ORDER BY [REVISION_NUMBER] DESC)
FROM #Budgets
)
SELECT Account_Code, Secondary_Code, SUM(VALUE)
FROM CTE
WHERE RN = 1
GROUP BY Account_Code, Secondary_Code
You might want to check out this article on SQL Server Ranking Functions[/url]
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
February 25, 2011 at 10:56 am
Hi WayneS,
Thanks for your reply. I did try Row_Number but got the same message about window functions. What I didn't know was the WITH syntax, essentially is this just creating the sub-query before hand and passing it to the second SELECT statement? If so, does this have any performance benefits over the sub-query I wrote?
Thanks
February 25, 2011 at 11:23 am
That's pretty cool Wayne...that's why I wanted "in" on this post. Running an estimated execution plan with the sub-query method using MAX()/OVER PARTITION BY and the CTE (common table expression) using the ROW_NUMBER()/OVER PARTITION BY method result in a essentially a split cost (51% / 49% respectfully) however when I run the queries with IO stats on I see something interesting with respect to the # of scans. It appears that the ROW_NUMBER()/OVER PARTITION BY method is a little cheaper.
Simon, could you try this on the actual dataset and let us know if it performs any better in the large?
SET STATISTICS IO OFF
GO
-- DO NOT RUN THESE ON A PRODUCTION SYSTEM !!!!
CHECKPOINT
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
GO
SET STATISTICS IO ON
GO
SELECT a.ACCOUNT_CODE,
a.SECONDARY_CODE,
SUM(a.VALUE) BUDGET
FROM (SELECT b.ACCOUNT_CODE,
b.SECONDARY_CODE,
b.PERIOD,
b.VALUE,
b.REVISION_NUMBER,
MAX(b.REVISION_NUMBER) OVER (PARTITION BY b.ACCOUNT_CODE, b.SECONDARY_CODE, b.PERIOD) MAX_REVISION_NUMBER
FROM BUDGETS b WITH (NOLOCK)
WHERE b.RECORD_TYPE = 'C' /* Costing Budget */
AND b.YEAR = 0 /* Current Year */
) a
WHERE a.REVISION_NUMBER = a.MAX_REVISION_NUMBER
GROUP BY a.ACCOUNT_CODE,
a.SECONDARY_CODE ;
go
SET STATISTICS IO OFF
GO
-- DO NOT RUN THESE ON A PRODUCTION SYSTEM !!!!
CHECKPOINT
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
GO
SET STATISTICS IO ON
GO
WITH CTE
AS (SELECT *,
RN = ROW_NUMBER() OVER (PARTITION BY [ACCOUNT_CODE], [SECONDARY_CODE], [PERIOD] ORDER BY [REVISION_NUMBER] DESC)
FROM Budgets
)
SELECT Account_Code,
Secondary_Code,
SUM(VALUE)
FROM CTE
WHERE RN = 1
GROUP BY Account_Code,
Secondary_Code ;
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
(3 row(s) affected)
Table 'Worktable'. Scan count 3, logical reads 63, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'BUDGETS'. Scan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
(3 row(s) affected)
Table 'BUDGETS'. Scan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 25, 2011 at 11:54 am
Simon Davidson (2/25/2011)
Hi WayneS,Thanks for your reply. I did try Row_Number but got the same message about window functions. What I didn't know was the WITH syntax, essentially is this just creating the sub-query before hand and passing it to the second SELECT statement? If so, does this have any performance benefits over the sub-query I wrote?
Thanks
Simon,
Yes, for most cases (excluding a recursive CTE), you can think of a CTE as a "pre-defined sub-query".
opc.three posted a test to run to show which would be more efficient. If you think about the two different methods, I would think that the MAX() would be a little bit slower, since it has to deal with being able to handle different data types (the string '14' is > '123'; but the number 123 is > 14, etc.) Running a performance test against data similar to what you will be running against is the only way to tell for sure.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
February 27, 2011 at 12:50 am
Simon Davidson (2/25/2011)
If so, does this have any performance benefits over the sub-query I wrote?
Yes. The MAX() OVER(...) construction has to spool results because the group maximum is applied to every row in the group. The spool is called a Common SubExpression Spool. I wrote about them here:
As far as the CTE is concerned, in this case, Wayne could have written his code like this:
SELECT Account_Code,
Secondary_Code,
SUM(VALUE)
FROM (
SELECT *,
RN = ROW_NUMBER() OVER
(
PARTITION BY [ACCOUNT_CODE], [SECONDARY_CODE], [PERIOD]
ORDER BY [REVISION_NUMBER] DESC
)
FROM Budgets
) AS CTE
WHERE RN = 1
GROUP BY
Account_Code,
Secondary_Code
;
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
February 27, 2011 at 3:00 am
Hi guys,
Thanks for your responses. Unfortunately the query is for a system that won't be going Live till April. I'm expecting to get some historic data to popular the table some time in March, when I do I'll run those performance benchmarks and update you.
Thanks
Sent from my HTC
February 27, 2011 at 10:16 am
Simon Davidson (2/27/2011)
Hi guys,Thanks for your responses. Unfortunately the query is for a system that won't be going Live till April. I'm expecting to get some historic data to popular the table some time in March, when I do I'll run those performance benchmarks and update you.
Thanks
Don't wait... build a million rows of test data now. It's not that hard and it doesn't take that long.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply