Windowed Function Where Clause

Question

Windowed Function Where Clause

Simon Davidson

Old Hand

Points: 322
More actions
February 25, 2011 at 4:39 am

#388935

Hi,
My apologies if this isn't the correct forum to post to...
I'm trying to write a query which fetches the latest budgets from a table, which I have solved, but not sure if its the best way to go about it. To give you some background, the table has 2 key columns which identify a project (ACCOUNT_CODE and SECONDARY_CODE), 2 date columns which identify the year and period (YEAR and PERIOD), a budget revision number which gets incremented every time a budget is revised (REVISION_NUMBER) and the actual value (VALUE).
In my opinion the best way of getting latest budget for each period would be to use the MAX windowed function in the where clause, but this isn't allowed.
SELECTb.ACCOUNT_CODE,
b.SECONDARY_CODE,
SUM(b.VALUE) BUDGET
FROMBUDGETS b
WHEREb.RECORD_TYPE = 'C' /* Costing Budget */ AND
b.YEAR = 0 /* Current Year */ AND
b.REVISION_NUMBER = MAX(b.REVISION_NUMBER) OVER (PARTITION BY b.ACCOUNT_CODE, b.SECONDARY_CODE, b.PERIOD)
GROUP BY b.ACCOUNT_CODE,
b.SECONDARY_CODE
This throws the error "Windowed functions can only appear in the SELECT or ORDER BY clauses".
So instead I wrote a sub-query
SELECTa.ACCOUNT_CODE,
a.SECONDARY_CODE,
SUM(a.VALUE) BUDGET
FROM (
SELECTb.ACCOUNT_CODE,
b.SECONDARY_CODE,
b.PERIOD,
b.VALUE,
b.REVISION_NUMBER,
MAX(b.REVISION_NUMBER) OVER (PARTITION BY b.ACCOUNT_CODE, b.SECONDARY_CODE, b.PERIOD) MAX_REVISION_NUMBER
FROMBUDGETS b WITH (NOLOCK)
WHEREb.RECORD_TYPE = 'C' /* Costing Budget */ AND
b.YEAR = 0 /* Current Year */
) a
WHEREa.REVISION_NUMBER = a.MAX_REVISION_NUMBER
GROUP BY a.ACCOUNT_CODE,
a.SECONDARY_CODE
This gets the desired result, however not sure if anyone would do it a different / better way?
Thanks

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Orlando Colamatteo SSC Guru Points: 182279 More actions · Answer 1

I would like to take a shot at this one...can you please provide the DDL for your tables and some DML to build some test data?

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Simon Davidson Old Hand Points: 322 More actions · Answer 2

Sure, here you go:

CREATE TABLE [BUDGETS](

[ACCOUNT_CODE] [varchar](25) NULL,

[SECONDARY_CODE] [varchar](25) NULL,

[RECORD_TYPE] [varchar](2) NULL,

[PERIOD] [tinyint] NULL,

[YEAR] [int] NULL,

[VALUE] [float] NULL,

[REVISION_NUMBER] [smallint] NULL)

INSERT INTO BUDGETS VALUES ('AB12345','12345','C',1,0,1000.00,0)

GO

INSERT INTO BUDGETS VALUES ('AB12345','12345','C',2,0,2000.00,0)

GO

INSERT INTO BUDGETS VALUES ('AB12345','12345','C',3,0,1500.00,0)

GO

INSERT INTO BUDGETS VALUES ('AB12345','12345','C',4,0,1750.00,0)

GO

INSERT INTOBUDGETS VALUES ('AB12345','98765','C',1,0,250.00,0)

GO

INSERT INTO BUDGETS VALUES ('AB12345','98765','C',2,0,495.00,0)

GO

INSERT INTO BUDGETS VALUES ('AB12345','98765','C',3,0,539.00,0)

GO

INSERT INTO BUDGETS VALUES ('AB12345','12345','C',1,0,1299.00,1)

GO

INSERT INTO BUDGETS VALUES ('AB12345','12345','C',2,0,1750.00,1)

GO

INSERT INTO BUDGETS VALUES ('AB12345','12345','C',1,0,999.00,2)

GO

INSERT INTO BUDGETS VALUES ('XY67890','33333','C',1,0,123.00,0)

GO

INSERT INTO BUDGETS VALUES ('XY67890','33333','C',2,0,439.00,0)

GO

INSERT INTO BUDGETS VALUEs ('XY67890','33333','C',2,0,749.00,1)

GO

The period represents the financial month, the record type is a single character switch but in my case I'm only interested in 'C' records. The year is also a switch, agian, I'm only interested in '0' records.

The result set should look like:

ACCOUNT_CODE | SECONDARY_CODE | BUDGET

----------------------------------------------

AB12345 | 12345 | 5999

AB12345 | 98765 | 1284

XY67890 | 33333 | 872

WayneS SSC Guru Points: 95460 More actions · Answer 3

;

WITH CTE AS

(

SELECT *,

RN = ROW_NUMBER() OVER (PARTITION BY [ACCOUNT_CODE], [SECONDARY_CODE], [PERIOD] ORDER BY [REVISION_NUMBER] DESC)

FROM #Budgets

)

SELECT Account_Code, Secondary_Code, SUM(VALUE)

FROM CTE

WHERE RN = 1

GROUP BY Account_Code, Secondary_Code

You might want to check out this article on SQL Server Ranking Functions[/url]

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

Simon Davidson Old Hand Points: 322 More actions · Answer 4

Hi WayneS,

Thanks for your reply. I did try Row_Number but got the same message about window functions. What I didn't know was the WITH syntax, essentially is this just creating the sub-query before hand and passing it to the second SELECT statement? If so, does this have any performance benefits over the sub-query I wrote?

Thanks

Orlando Colamatteo SSC Guru Points: 182279 More actions · Answer 5

That's pretty cool Wayne...that's why I wanted "in" on this post. Running an estimated execution plan with the sub-query method using MAX()/OVER PARTITION BY and the CTE (common table expression) using the ROW_NUMBER()/OVER PARTITION BY method result in a essentially a split cost (51% / 49% respectfully) however when I run the queries with IO stats on I see something interesting with respect to the # of scans. It appears that the ROW_NUMBER()/OVER PARTITION BY method is a little cheaper.

Simon, could you try this on the actual dataset and let us know if it performs any better in the large?

SET STATISTICS IO OFF

GO

-- DO NOT RUN THESE ON A PRODUCTION SYSTEM !!!!

CHECKPOINT

DBCC FREEPROCCACHE

DBCC DROPCLEANBUFFERS

GO

SET STATISTICS IO ON

GO

SELECT a.ACCOUNT_CODE,

a.SECONDARY_CODE,

SUM(a.VALUE) BUDGET

FROM (SELECT b.ACCOUNT_CODE,

b.SECONDARY_CODE,

b.PERIOD,

b.VALUE,

b.REVISION_NUMBER,

MAX(b.REVISION_NUMBER) OVER (PARTITION BY b.ACCOUNT_CODE, b.SECONDARY_CODE, b.PERIOD) MAX_REVISION_NUMBER

FROM BUDGETS b WITH (NOLOCK)

WHERE b.RECORD_TYPE = 'C' /* Costing Budget */

AND b.YEAR = 0 /* Current Year */

) a

WHERE a.REVISION_NUMBER = a.MAX_REVISION_NUMBER

GROUP BY a.ACCOUNT_CODE,

a.SECONDARY_CODE ;

go

SET STATISTICS IO OFF

GO

-- DO NOT RUN THESE ON A PRODUCTION SYSTEM !!!!

CHECKPOINT

DBCC FREEPROCCACHE

DBCC DROPCLEANBUFFERS

GO

SET STATISTICS IO ON

GO

WITH CTE

AS (SELECT *,

RN = ROW_NUMBER() OVER (PARTITION BY [ACCOUNT_CODE], [SECONDARY_CODE], [PERIOD] ORDER BY [REVISION_NUMBER] DESC)

FROM Budgets

)

SELECT Account_Code,

Secondary_Code,

SUM(VALUE)

FROM CTE

WHERE RN = 1

GROUP BY Account_Code,

Secondary_Code ;

DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
(3 row(s) affected)
Table 'Worktable'. Scan count 3, logical reads 63, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'BUDGETS'. Scan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
(3 row(s) affected)
Table 'BUDGETS'. Scan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

WayneS SSC Guru Points: 95460 More actions · Answer 6

Simon Davidson (2/25/2011)
Hi WayneS,
Thanks for your reply. I did try Row_Number but got the same message about window functions. What I didn't know was the WITH syntax, essentially is this just creating the sub-query before hand and passing it to the second SELECT statement? If so, does this have any performance benefits over the sub-query I wrote?
Thanks

Simon,

Yes, for most cases (excluding a recursive CTE), you can think of a CTE as a "pre-defined sub-query".

opc.three posted a test to run to show which would be more efficient. If you think about the two different methods, I would think that the MAX() would be a little bit slower, since it has to deal with being able to handle different data types (the string '14' is > '123'; but the number 123 is > 14, etc.) Running a performance test against data similar to what you will be running against is the only way to tell for sure.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

Paul White SSC Guru Points: 150467 More actions · Answer 7

Simon Davidson (2/25/2011)
If so, does this have any performance benefits over the sub-query I wrote?

Yes. The MAX() OVER(...) construction has to spool results because the group maximum is applied to every row in the group. The spool is called a Common SubExpression Spool. I wrote about them here:

http://sqlblog.com/blogs/paul_white/archive/2010/07/28/partitioning-and-the-common-subexpression-spool.aspx

As far as the CTE is concerned, in this case, Wayne could have written his code like this:

SELECT Account_Code,

Secondary_Code,

SUM(VALUE)

FROM (

SELECT *,

RN = ROW_NUMBER() OVER

(

PARTITION BY [ACCOUNT_CODE], [SECONDARY_CODE], [PERIOD]

ORDER BY [REVISION_NUMBER] DESC

)

FROM Budgets

) AS CTE

WHERE RN = 1

GROUP BY

Account_Code,

Secondary_Code

;

Paul

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Simon Davidson Old Hand Points: 322 More actions · Answer 8

Hi guys,

Thanks for your responses. Unfortunately the query is for a system that won't be going Live till April. I'm expecting to get some historic data to popular the table some time in March, when I do I'll run those performance benchmarks and update you.

Thanks

Sent from my HTC

Jeff Moden SSC Guru Points: 1004372 More actions · Answer 9

Simon Davidson (2/27/2011)
Hi guys,
Thanks for your responses. Unfortunately the query is for a system that won't be going Live till April. I'm expecting to get some historic data to popular the table some time in March, when I do I'll run those performance benchmarks and update you.
Thanks

Don't wait... build a million rows of test data now. It's not that hard and it doesn't take that long.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)