OVER clause with Aggregate function - Wierd

  • Hi All,

    Can somebody please explain me why aren't we getting proper MAX(DOJ) value in Weird_Maxgroup Column in the below query ? Similar thing happens with MIN() function as well. :crazy:

    Is there something that I am missing here ?

    ;WITH CTE AS

    (

    SELECT 1 AS ID, 10000 AS SALARY,GETDATE() AS DOJ

    UNION ALL

    SELECT 1, 13000,GETDATE() - 10

    UNION ALL

    SELECT 3, 190000,GETDATE() + 10

    UNION ALL

    SELECT 2, 140000,GETDATE() - 87

    UNION ALL

    SELECT 2, 120000,GETDATE() - 12

    UNION ALL

    SELECT 3, 110000,GETDATE() - 56

    UNION ALL

    SELECT 1, 10000,GETDATE() + 61

    )

    SELECT

    ID

    ,SALARY

    ,DOJ

    ,MAX(DOJ) OVER (PARTITION BY ID) AS MAXGROUP

    ,MAX(DOJ) OVER (PARTITION BY ID ORDER BY DOJ) AS Weird_MaxGroup

    ,MAX(DOJ) OVER (PARTITION BY ID ORDER BY DOJ DESC) AS MAXGROUP2

    ,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DOJ DESC) AS NUM

    FROM CTE

  • This works, though my understanding of exactly what the ORDER BY 'does' in an OVER clause is a bit hazy:

    with CTE

    as (select 1 as ID

    ,10000 as SALARY

    ,getdate() as DOJ

    union all

    select 1

    ,13000

    ,getdate() - 10

    union all

    select 3

    ,190000

    ,getdate() + 10

    union all

    select 2

    ,140000

    ,getdate() - 87

    union all

    select 2

    ,120000

    ,getdate() - 12

    union all

    select 3

    ,110000

    ,getdate() - 56

    union all

    select 1

    ,10000

    ,getdate() + 61

    )

    select ID

    ,SALARY

    ,DOJ

    ,max(DOJ) over (partition by ID) as MAXGROUP

    ,max(DOJ) over (partition by ID order by DOJ RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as Weird_MaxGroup

    ,max(DOJ) over (partition by ID order by DOJ desc) as MAXGROUP2

    ,row_number() over (partition by ID order by DOJ desc) as NUM

    from CTE;

    I also fixed your weird spelling of 'weird' :w00t:

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • It works, Thank you Phil. But still it makes me wonder why did ordering DOJ in ascending order changed the output of the column there ? Can you please elaborate on that ?

    Spelling corrected:-D

  • SQLCJ (7/2/2015)


    It works, Thank you Phil. But still it makes me wonder why did ordering DOJ in ascending order changed the output of the column there ? Can you please elaborate on that ?

    Spelling corrected:-D

    Not much! I found this in BOL:

    If ORDER BY is not specified entire partition is used for a window frame.

    Which implies that if ORDER BY is specified, the entire partition is not used.

    This, I suppose, is happening here.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • May be.. Anyway Thank you once again.

  • Run this tweak to observe a less irregular usage πŸ˜‰

    ;WITH CTE AS

    (

    SELECT 1 AS ID, 10000 AS SALARY,GETDATE() AS DOJ

    UNION ALL

    SELECT 1, 13000,GETDATE() - 10

    UNION ALL

    SELECT 3, 190000,GETDATE() + 10

    UNION ALL

    SELECT 2, 140000,GETDATE() - 87

    UNION ALL

    SELECT 2, 120000,GETDATE() - 12

    UNION ALL

    SELECT 3, 110000,GETDATE() - 56

    UNION ALL

    SELECT 1, 10000,GETDATE() + 61

    )

    SELECT

    ID

    ,SALARY

    ,DOJ

    ,MAX(DOJ) OVER (PARTITION BY ID) AS MAXGROUP

    ,MAX(DOJ) OVER (PARTITION BY ID ORDER BY DOJ) AS Weird_MaxGroup

    ,MAX(DOJ) OVER (PARTITION BY ID ORDER BY DOJ DESC) AS MAXGROUP2

    ,SUM(YEAR(DOJ)) OVER (PARTITION BY ID ORDER BY DOJ) AS SUMGroup

    ,SUM(YEAR(DOJ)) OVER (PARTITION BY ID ORDER BY DOJ DESC) AS SUMGroupDESC

    ,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DOJ DESC) AS NUM

    FROM CTE

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • If you don't specify the ROWS or RANGE, it uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so the current row is always the max value in that range with that sort. Try specifying ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (7/2/2015)


    If you don't specify the ROWS or RANGE, it uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so the current row is always the max value in that range with that sort. Try specifying ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING.

    Drew

    Pretty much what I suggested. I did not find these 'defaults' in the documentation though, thanks for posting.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil Parkin (7/2/2015)


    drew.allen (7/2/2015)


    If you don't specify the ROWS or RANGE, it uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so the current row is always the max value in that range with that sort. Try specifying ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING.

    Drew

    Pretty much what I suggested. I did not find these 'defaults' in the documentation though, thanks for posting.

    Yes, I didn't look at your code before I had posted this, and, from the comments, it didn't appear that a definitive answer had been supplied.

    It's under the "General Remarks" at https://msdn.microsoft.com/en-us/library/ms189461.aspx

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (7/2/2015)


    If you don't specify the ROWS or RANGE, it uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so the current row is always the max value in that range with that sort. Try specifying ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING.

    Drew

    I know I've seen that mentioned in BOL somewhere, but I can't recall where, exactly. Effectively, providing an OVER clause for MIN or MAX functions turns them into what I'll call MIN_SO_FAR or MAX_SO_FAR within the specified partition, and based on the specified order, and SUM or AVG into RUNNING_TOTAL and AVG_SO_FAR, again based on the partitions specified and the order. ROWS or RANGE specifications are just modifications to the size of the partition (or window), which can be different for each and every record in the resultset.

    Steve (aka sgmunson) πŸ™‚ πŸ™‚ πŸ™‚
    Rent Servers for Income (picks and shovels strategy)

  • drew.allen (7/2/2015)


    Phil Parkin (7/2/2015)


    drew.allen (7/2/2015)


    If you don't specify the ROWS or RANGE, it uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so the current row is always the max value in that range with that sort. Try specifying ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING.

    Drew

    Pretty much what I suggested. I did not find these 'defaults' in the documentation though, thanks for posting.

    Yes, I didn't look at your code before I had posted this, and, from the comments, it didn't appear that a definitive answer had been supplied.

    It's under the "General Remarks" at https://msdn.microsoft.com/en-us/library/ms189461.aspx

    Drew

    Thanks. I looked exactly there this morning, but it was pre-coffee and clearly didn't sink in!

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply