Getting first and last value for every group.

  • Luis,

    Based on the sample data, it makes no sense to order by [BaseYear] inside a partition because the combination does not identify uniquely a row so the enumeration will be non-deterministic.

    Here is the output I get using your suggestion:

    userIdStartValueEndValueRecords

    125000252005

    225050262004

    353504535046

    Notice the value of [EndValueRecords] for userId equal 3. That is not the expected value.

  • I noted the same thing when I was testing the code and mention it in my original post. I thought that the sample data might be wrong and posted the code anyway including the notes (which were quoted by David).

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • If there was a mistake in the posting for the sample data and the combination of (userId, BaseYear) identifies uniquely each row then your solution is viable.

    We could also use the functions FIRST_VALUE / LASTVALUE but to get a unique row per partition (final result) we will need to add the DISTINCT keyword because we do not have ordered set functions yet in T-SQL.

  • Eirikur Eiriksson (5/30/2014)


    Since this is a 2012 thread, how about the window functions?

    😎

    create table #sampledata

    (

    userId int,

    BaseYear int,

    TotalSales float

    );

    insert into #sampledata

    select 1,2008,25000 union

    select 1,2009,22500 union

    select 1,2010,23400 union

    select 1,2011,26700 union

    select 1,2012,25200 union

    select 2,2008,25050 union

    select 2,2009,22510 union

    select 2,2010,23470 union

    select 2,2011,26200 union

    select 3,2008,53504 union

    select 3,2008,43504 union

    select 3,2008,25504 union

    select 3,2008,23404 union

    select 3,2008,25504 union

    select 3,2008,23804 ; -- << Pet peeve alert! Semicolons are statement terminators not statement begininators.

    WITH S_LIST AS

    (

    SELECT

    ROW_NUMBER() OVER

    (

    PARTITION BY SD.userId

    ORDER BY (SELECT NULL)

    ) AS RID

    ,SD.userId

    ,FIRST_VALUE(SD.BaseYear) OVER

    (

    PARTITION BY SD.userId

    ORDER BY SD.TotalSales ASC

    ROWS BETWEEN UNBOUNDED PRECEDING

    AND UNBOUNDED FOLLOWING

    ) AS FIRST_BaseYear

    ,LAST_VALUE(SD.BaseYear) OVER

    (

    PARTITION BY SD.userId

    ORDER BY SD.TotalSales ASC

    ROWS BETWEEN UNBOUNDED PRECEDING

    AND UNBOUNDED FOLLOWING

    ) AS LAST_BaseYear

    ,FIRST_VALUE(SD.TotalSales) OVER

    (

    PARTITION BY SD.userId

    ORDER BY SD.BaseYear ASC

    ROWS BETWEEN UNBOUNDED PRECEDING

    AND UNBOUNDED FOLLOWING

    ) AS FIRST_TotalSales

    ,LAST_VALUE(SD.TotalSales) OVER

    (

    PARTITION BY SD.userId

    ORDER BY SD.BaseYear ASC

    ROWS BETWEEN UNBOUNDED PRECEDING

    AND UNBOUNDED FOLLOWING

    ) AS LAST_TotalSales

    FROM #sampledata SD

    )

    SELECT

    *

    FROM S_LIST SL

    WHERE SL.RID = 1;

    drop table #sampledata;

    Results

    RID userId FIRST_BaseYear LAST_BaseYear FIRST_TotalSales LAST_TotalSales

    ---- ------- -------------- ------------- ----------------- ----------------

    1 1 2009 2011 25000 25200

    1 2 2009 2011 25050 26200

    1 3 2008 2008 23404 53504

    < rant >

    Please note the pet peeve alert above.

    **** Also, not picking on Eirikur as David's code has the same issue. ****

    < /rant >

  • Lynn Pettis (6/3/2014)


    Eirikur Eiriksson (5/30/2014)


    Since this is a 2012 thread, how about the window functions?

    😎

    create table #sampledata

    (

    userId int,

    BaseYear int,

    TotalSales float

    );

    insert into #sampledata

    select 1,2008,25000 union

    select 1,2009,22500 union

    select 1,2010,23400 union

    select 1,2011,26700 union

    select 1,2012,25200 union

    select 2,2008,25050 union

    select 2,2009,22510 union

    select 2,2010,23470 union

    select 2,2011,26200 union

    select 3,2008,53504 union

    select 3,2008,43504 union

    select 3,2008,25504 union

    select 3,2008,23404 union

    select 3,2008,25504 union

    select 3,2008,23804 ; -- << Pet peeve alert! Semicolons are statement terminators not statement begininators.

    WITH S_LIST AS

    (

    SELECT

    ROW_NUMBER() OVER

    (

    PARTITION BY SD.userId

    ORDER BY (SELECT NULL)

    ) AS RID

    ,SD.userId

    ,FIRST_VALUE(SD.BaseYear) OVER

    (

    PARTITION BY SD.userId

    ORDER BY SD.TotalSales ASC

    ROWS BETWEEN UNBOUNDED PRECEDING

    AND UNBOUNDED FOLLOWING

    ) AS FIRST_BaseYear

    ,LAST_VALUE(SD.BaseYear) OVER

    (

    PARTITION BY SD.userId

    ORDER BY SD.TotalSales ASC

    ROWS BETWEEN UNBOUNDED PRECEDING

    AND UNBOUNDED FOLLOWING

    ) AS LAST_BaseYear

    ,FIRST_VALUE(SD.TotalSales) OVER

    (

    PARTITION BY SD.userId

    ORDER BY SD.BaseYear ASC

    ROWS BETWEEN UNBOUNDED PRECEDING

    AND UNBOUNDED FOLLOWING

    ) AS FIRST_TotalSales

    ,LAST_VALUE(SD.TotalSales) OVER

    (

    PARTITION BY SD.userId

    ORDER BY SD.BaseYear ASC

    ROWS BETWEEN UNBOUNDED PRECEDING

    AND UNBOUNDED FOLLOWING

    ) AS LAST_TotalSales

    FROM #sampledata SD

    )

    SELECT

    *

    FROM S_LIST SL

    WHERE SL.RID = 1;

    drop table #sampledata;

    Results

    RID userId FIRST_BaseYear LAST_BaseYear FIRST_TotalSales LAST_TotalSales

    ---- ------- -------------- ------------- ----------------- ----------------

    1 1 2009 2011 25000 25200

    1 2 2009 2011 25050 26200

    1 3 2008 2008 23404 53504

    < rant >

    Please note the pet peeve alert above.

    **** Also, not picking on Eirikur as David's code has the same issue. ****

    < /rant >

    Call it carefully guarded vigilant watchful belt-and-braces approach:hehe:

    😎

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply