Aggregation with every N number of records grouped

  • I would like to generate an output which shows me the average value of Col2 for every 5 rows and the starting number of the 5 row set (i.e. Col1).

    For an example, the output of my query should be 3 rows as follows:

    StartingNumber | AvgValue

    1 | 9.2

    5 | 186.6

    10 | 284.25

    If there is no 5th row for the last set, then just average the ones that are left (in this case 4 records).

    I have a table with the following model and data:

    --===== If the test table already exists, drop it

    IF OBJECT_ID('TempDB..#Table_1','U') IS NOT NULL

    DROP TABLE #Table_1

    --===== Create the test table with

    CREATE TABLE #Table_1

    (

    [Col1] [int] NULL,

    [Col2] [int] NULL

    )

    --===== Insert the test data into the test table

    INSERT INTO #Table_1

    (Col1, Col2)

    SELECT '1','1' UNION ALL

    SELECT '2','5' UNION ALL

    SELECT '3','2' UNION ALL

    SELECT '4','6' UNION ALL

    SELECT '5','32' UNION ALL

    SELECT '6','56' UNION ALL

    SELECT '7','2' UNION ALL

    SELECT '8','798' UNION ALL

    SELECT '9','43' UNION ALL

    SELECT '10','34' UNION ALL

    SELECT '11','563' UNION ALL

    SELECT '12','23' UNION ALL

    SELECT '13','465' UNION ALL

    SELECT '14','86'

    Make everything as simple as possible, but not simpler.
    Albert Einstein

  • Maybe this?

    SELECT MIN(Col1) AS StartingNumber,

    AVG(Col2*1.0) AS AvgValue

    FROM dbo.Table_1

    GROUP BY (Col1-1) / 5

    ORDER BY MIN(Col1)

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • My question was to get the aggregation for every 5 rows of data, no matter how many rows there are in the table. If I have 503 records in my table, I would respectively have 101 records with aggregated data (where the 101st is just an aggregation of the last 2).

    /I have to improve on my question posting technique, and I will. /

    I just edited the original post to give a better look, according to the posting standards.

    Thank you for your reply.

    Make everything as simple as possible, but not simpler.
    Albert Einstein

  • --===== If the test table already exists, drop it

    IF OBJECT_ID('TempDB..#Table_1','U') IS NOT NULL

    DROP TABLE #Table_1

    --===== Create the test table with

    CREATE TABLE #Table_1

    (

    [Col1] [int] IDENTITY(1,1),

    [Col2] [int] NULL

    )

    declare @i INT

    SELECT @i = 1

    WHILE(@i<2)

    BEGIN

    --===== Insert the test data into the test table

    INSERT INTO #Table_1

    (Col2)

    SELECT '1' UNION ALL

    SELECT '5' UNION ALL

    SELECT '2' UNION ALL

    SELECT '6' UNION ALL

    SELECT '32' UNION ALL

    SELECT '56' UNION ALL

    SELECT '2' UNION ALL

    SELECT '798' UNION ALL

    SELECT '43' UNION ALL

    SELECT '34' UNION ALL

    SELECT '563' UNION ALL

    SELECT '23' UNION ALL

    SELECT '465' UNION ALL

    SELECT '86'

    select @i = @i + 1

    END

    SELECT

    MIN(COL1) AS ID,

    CAST( SUM (Col2)/SUM(1.0) AS DECIMAL(18,2))

    FROM #Table_1

    GROUP BY

    (Col1-1) / 5

    Regards,
    Mitesh OSwal
    +918698619998

  • you can use NTILE function with argument equals to CEILING of count of records in the table divided by number specifying number of records in the group.

    WITH NTileTable AS (

    SELECT

    NTile(CONVERT(int, (SELECT CEILING(COUNT(1) / 5.0) FROM #Table_1))) OVER(ORDER BY Col1) NTileCol,

    Col1,

    Col2

    FROM #Table_1

    )

    SELECT

    NTileCol,

    Min(Col1) As StartCol,

    AVG(Col2) AS AVGCol2

    FROM NTileTable

    GROUP BY NTileCol

  • sibir1us (12/10/2009)


    My question was to get the aggregation for every 5 rows of data, no matter how many rows there are in the table. If I have 503 records in my table, I would respectively have 101 records with aggregated data (where the 101st is just an aggregation of the last 2).

    /I have to improve on my question posting technique, and I will. /

    I just edited the original post to give a better look, according to the posting standards.

    Thank you for your reply.

    This should show you how it's done, and give you a result:

    CREATE TABLE #Table_1

    (

    [Col1] [int] NULL,

    [Col2] [int] NULL

    )

    --===== Insert the test data into the test table

    INSERT INTO #Table_1

    (Col1, Col2)

    SELECT '1','1' UNION ALL

    SELECT '2','5' UNION ALL

    SELECT '3','2' UNION ALL

    SELECT '4','6' UNION ALL

    SELECT '5','32' UNION ALL

    SELECT '6','56' UNION ALL

    SELECT '7','2' UNION ALL

    SELECT '8','798' UNION ALL

    SELECT '9','43' UNION ALL

    SELECT '10','34' UNION ALL

    SELECT '11','563' UNION ALL

    SELECT '12','23' UNION ALL

    SELECT '13','465' UNION ALL

    SELECT '14','86'

    SELECT MIN(Col1), SUM(Col2), COUNT(*)

    FROM (

    SELECT *, CAST((Col1-1)/5 AS INT) AS [Rank]

    FROM #Table_1

    ) d

    GROUP BY [Rank]

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply