Windowed Function Where Clause

  • Hi,

    My apologies if this isn't the correct forum to post to...

    I'm trying to write a query which fetches the latest budgets from a table, which I have solved, but not sure if its the best way to go about it. To give you some background, the table has 2 key columns which identify a project (ACCOUNT_CODE and SECONDARY_CODE), 2 date columns which identify the year and period (YEAR and PERIOD), a budget revision number which gets incremented every time a budget is revised (REVISION_NUMBER) and the actual value (VALUE).

    In my opinion the best way of getting latest budget for each period would be to use the MAX windowed function in the where clause, but this isn't allowed.

    SELECTb.ACCOUNT_CODE,

    b.SECONDARY_CODE,

    SUM(b.VALUE) BUDGET

    FROMBUDGETS b

    WHEREb.RECORD_TYPE = 'C' /* Costing Budget */ AND

    b.YEAR = 0 /* Current Year */ AND

    b.REVISION_NUMBER = MAX(b.REVISION_NUMBER) OVER (PARTITION BY b.ACCOUNT_CODE, b.SECONDARY_CODE, b.PERIOD)

    GROUP BY b.ACCOUNT_CODE,

    b.SECONDARY_CODE

    This throws the error "Windowed functions can only appear in the SELECT or ORDER BY clauses".

    So instead I wrote a sub-query

    SELECTa.ACCOUNT_CODE,

    a.SECONDARY_CODE,

    SUM(a.VALUE) BUDGET

    FROM (

    SELECTb.ACCOUNT_CODE,

    b.SECONDARY_CODE,

    b.PERIOD,

    b.VALUE,

    b.REVISION_NUMBER,

    MAX(b.REVISION_NUMBER) OVER (PARTITION BY b.ACCOUNT_CODE, b.SECONDARY_CODE, b.PERIOD) MAX_REVISION_NUMBER

    FROMBUDGETS b WITH (NOLOCK)

    WHEREb.RECORD_TYPE = 'C' /* Costing Budget */ AND

    b.YEAR = 0 /* Current Year */

    ) a

    WHEREa.REVISION_NUMBER = a.MAX_REVISION_NUMBER

    GROUP BY a.ACCOUNT_CODE,

    a.SECONDARY_CODE

    This gets the desired result, however not sure if anyone would do it a different / better way?

    Thanks

  • I would like to take a shot at this one...can you please provide the DDL for your tables and some DML to build some test data?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Sure, here you go:

    CREATE TABLE [BUDGETS](

    [ACCOUNT_CODE] [varchar](25) NULL,

    [SECONDARY_CODE] [varchar](25) NULL,

    [RECORD_TYPE] [varchar](2) NULL,

    [PERIOD] [tinyint] NULL,

    [YEAR] [int] NULL,

    [VALUE] [float] NULL,

    [REVISION_NUMBER] [smallint] NULL)

    INSERT INTO BUDGETS VALUES ('AB12345','12345','C',1,0,1000.00,0)

    GO

    INSERT INTO BUDGETS VALUES ('AB12345','12345','C',2,0,2000.00,0)

    GO

    INSERT INTO BUDGETS VALUES ('AB12345','12345','C',3,0,1500.00,0)

    GO

    INSERT INTO BUDGETS VALUES ('AB12345','12345','C',4,0,1750.00,0)

    GO

    INSERT INTOBUDGETS VALUES ('AB12345','98765','C',1,0,250.00,0)

    GO

    INSERT INTO BUDGETS VALUES ('AB12345','98765','C',2,0,495.00,0)

    GO

    INSERT INTO BUDGETS VALUES ('AB12345','98765','C',3,0,539.00,0)

    GO

    INSERT INTO BUDGETS VALUES ('AB12345','12345','C',1,0,1299.00,1)

    GO

    INSERT INTO BUDGETS VALUES ('AB12345','12345','C',2,0,1750.00,1)

    GO

    INSERT INTO BUDGETS VALUES ('AB12345','12345','C',1,0,999.00,2)

    GO

    INSERT INTO BUDGETS VALUES ('XY67890','33333','C',1,0,123.00,0)

    GO

    INSERT INTO BUDGETS VALUES ('XY67890','33333','C',2,0,439.00,0)

    GO

    INSERT INTO BUDGETS VALUEs ('XY67890','33333','C',2,0,749.00,1)

    GO

    The period represents the financial month, the record type is a single character switch but in my case I'm only interested in 'C' records. The year is also a switch, agian, I'm only interested in '0' records.

    The result set should look like:

    ACCOUNT_CODE | SECONDARY_CODE | BUDGET

    ----------------------------------------------

    AB12345 | 12345 | 5999

    AB12345 | 98765 | 1284

    XY67890 | 33333 | 872

  • ;

    WITH CTE AS

    (

    SELECT *,

    RN = ROW_NUMBER() OVER (PARTITION BY [ACCOUNT_CODE], [SECONDARY_CODE], [PERIOD] ORDER BY [REVISION_NUMBER] DESC)

    FROM #Budgets

    )

    SELECT Account_Code, Secondary_Code, SUM(VALUE)

    FROM CTE

    WHERE RN = 1

    GROUP BY Account_Code, Secondary_Code

    You might want to check out this article on SQL Server Ranking Functions[/url]

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Hi WayneS,

    Thanks for your reply. I did try Row_Number but got the same message about window functions. What I didn't know was the WITH syntax, essentially is this just creating the sub-query before hand and passing it to the second SELECT statement? If so, does this have any performance benefits over the sub-query I wrote?

    Thanks

  • That's pretty cool Wayne...that's why I wanted "in" on this post. Running an estimated execution plan with the sub-query method using MAX()/OVER PARTITION BY and the CTE (common table expression) using the ROW_NUMBER()/OVER PARTITION BY method result in a essentially a split cost (51% / 49% respectfully) however when I run the queries with IO stats on I see something interesting with respect to the # of scans. It appears that the ROW_NUMBER()/OVER PARTITION BY method is a little cheaper.

    Simon, could you try this on the actual dataset and let us know if it performs any better in the large?

    SET STATISTICS IO OFF

    GO

    -- DO NOT RUN THESE ON A PRODUCTION SYSTEM !!!!

    CHECKPOINT

    DBCC FREEPROCCACHE

    DBCC DROPCLEANBUFFERS

    GO

    SET STATISTICS IO ON

    GO

    SELECT a.ACCOUNT_CODE,

    a.SECONDARY_CODE,

    SUM(a.VALUE) BUDGET

    FROM (SELECT b.ACCOUNT_CODE,

    b.SECONDARY_CODE,

    b.PERIOD,

    b.VALUE,

    b.REVISION_NUMBER,

    MAX(b.REVISION_NUMBER) OVER (PARTITION BY b.ACCOUNT_CODE, b.SECONDARY_CODE, b.PERIOD) MAX_REVISION_NUMBER

    FROM BUDGETS b WITH (NOLOCK)

    WHERE b.RECORD_TYPE = 'C' /* Costing Budget */

    AND b.YEAR = 0 /* Current Year */

    ) a

    WHERE a.REVISION_NUMBER = a.MAX_REVISION_NUMBER

    GROUP BY a.ACCOUNT_CODE,

    a.SECONDARY_CODE ;

    go

    SET STATISTICS IO OFF

    GO

    -- DO NOT RUN THESE ON A PRODUCTION SYSTEM !!!!

    CHECKPOINT

    DBCC FREEPROCCACHE

    DBCC DROPCLEANBUFFERS

    GO

    SET STATISTICS IO ON

    GO

    WITH CTE

    AS (SELECT *,

    RN = ROW_NUMBER() OVER (PARTITION BY [ACCOUNT_CODE], [SECONDARY_CODE], [PERIOD] ORDER BY [REVISION_NUMBER] DESC)

    FROM Budgets

    )

    SELECT Account_Code,

    Secondary_Code,

    SUM(VALUE)

    FROM CTE

    WHERE RN = 1

    GROUP BY Account_Code,

    Secondary_Code ;

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    (3 row(s) affected)

    Table 'Worktable'. Scan count 3, logical reads 63, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'BUDGETS'. Scan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    (3 row(s) affected)

    Table 'BUDGETS'. Scan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Simon Davidson (2/25/2011)


    Hi WayneS,

    Thanks for your reply. I did try Row_Number but got the same message about window functions. What I didn't know was the WITH syntax, essentially is this just creating the sub-query before hand and passing it to the second SELECT statement? If so, does this have any performance benefits over the sub-query I wrote?

    Thanks

    Simon,

    Yes, for most cases (excluding a recursive CTE), you can think of a CTE as a "pre-defined sub-query".

    opc.three posted a test to run to show which would be more efficient. If you think about the two different methods, I would think that the MAX() would be a little bit slower, since it has to deal with being able to handle different data types (the string '14' is > '123'; but the number 123 is > 14, etc.) Running a performance test against data similar to what you will be running against is the only way to tell for sure.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Simon Davidson (2/25/2011)


    If so, does this have any performance benefits over the sub-query I wrote?

    Yes. The MAX() OVER(...) construction has to spool results because the group maximum is applied to every row in the group. The spool is called a Common SubExpression Spool. I wrote about them here:

    http://sqlblog.com/blogs/paul_white/archive/2010/07/28/partitioning-and-the-common-subexpression-spool.aspx

    As far as the CTE is concerned, in this case, Wayne could have written his code like this:

    SELECT Account_Code,

    Secondary_Code,

    SUM(VALUE)

    FROM (

    SELECT *,

    RN = ROW_NUMBER() OVER

    (

    PARTITION BY [ACCOUNT_CODE], [SECONDARY_CODE], [PERIOD]

    ORDER BY [REVISION_NUMBER] DESC

    )

    FROM Budgets

    ) AS CTE

    WHERE RN = 1

    GROUP BY

    Account_Code,

    Secondary_Code

    ;

    Paul

  • Hi guys,

    Thanks for your responses. Unfortunately the query is for a system that won't be going Live till April. I'm expecting to get some historic data to popular the table some time in March, when I do I'll run those performance benchmarks and update you.

    Thanks

    Sent from my HTC

  • Simon Davidson (2/27/2011)


    Hi guys,

    Thanks for your responses. Unfortunately the query is for a system that won't be going Live till April. I'm expecting to get some historic data to popular the table some time in March, when I do I'll run those performance benchmarks and update you.

    Thanks

    Don't wait... build a million rows of test data now. It's not that hard and it doesn't take that long.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply