aggregate by a given period

  • Hello!

    Please, consider the following code

    DECLARE @SNH TABLE
    (
    cntDT DATETIME,
    cntQ varchar(10),
    cntL int
    )
    INSERT INTO @SNH (cntDT, cntQ, cntL)
    VALUES ('2001-04-04 10:00:00', 'Queue01', 3),
    ('2001-04-05 10:03:00', 'Queue01', 1),
    ('2001-04-05 10:06:00', 'Queue01', 1),
    ('2001-04-05 10:07:00', 'Queue01', 1),
    ('2001-04-05 10:11:00', 'Queue01', 1),
    ('2001-04-05 10:12:00', 'Queue01', 10)

    I'd like to have the table @SNH grouped in a predefined period of time, let say 5 minutes

    To have the following table

    chrome_aohYkn9D4y

    My idea is to have @SNH inner joined with a base table @dtb

    | cntDTB |

    | 2001-04-04 10:00:00 |

    | 2001-04-04 10:05:00 |

    | 2001-04-04 10:10:00 |

    I guess, smth like the following code should help

    SELECT @DTB.cntDTB, @SNH.cntQ, sum(@SNH.cntQ)
    FROM @SNH as S, @DTB as B
    ON S.cntDTS BETWEEN B.cntDTB AND (B.cntDTB+1)

    but I have 2 issues

    Unfortunately, I don't know how to dynamically create @dtb and how to address the next @SNH row

    Would someone please advise on how to proceed?

    • This topic was modified 2 years, 5 months ago by  nkat.
  • It sounds like you just need to use a numbers table or tally function to generate numbers, & use that to generate table of datetimes using DATEADD.

    Not sure what you mean by "how to address the next @SNH row"?

    Do you really want to select everything where S.cntDTS is between B.cntDTB and B.cntDTB plus one day (that's what adding one to a datetime does)?

    Or do you want everything for a five minute span as the rest of the question seems to indicate? That would be something like:

    S.cntDTS >= B.cntDTB and s.cntDTS < DATEADD(minute,5,B.cntDTB)

    You join syntax is wrong:

    SELECT @DTB.cntDTB, @SNH.cntQ, sum(@SNH.cntQ) 
    FROM @SNH as S, @DTB as B ON S.cntDTS BETWEEN B.cntDTB AND (B.cntDTB+1);

    should be more like this (ignoring that you probably don't really want to add a day to B.cntDTB):

    SELECT @DTB.cntDTB, @SNH.cntQ, sum(@SNH.cntQ) 
    FROM @SNH as S
    -- using ANSI join syntax instead of old comma-delimited list syntax
    INNER JOIN @DTB as B ON S.cntDTS BETWEEN B.cntDTB AND (B.cntDTB+1) ;

     

  • e.g.,

    DECLARE @SNH TABLE
    (
    cntDT DATETIME,
    cntQ varchar(10),
    cntL int
    );

    INSERT INTO @SNH (cntDT, cntQ, cntL)
    VALUES ('2001-04-04 10:00:00', 'Queue01', 3),
    ('2001-04-05 10:03:00', 'Queue01', 1),
    ('2001-04-05 10:06:00', 'Queue01', 1),
    ('2001-04-05 10:07:00', 'Queue01', 1),
    ('2001-04-05 10:11:00', 'Queue01', 1),
    ('2001-04-05 10:12:00', 'Queue01', 10)

    DECLARE @startTime DATETIME = (SELECT MIN(cntDT) FROM @SNH); -- This should be rounded down in real use.
    DECLARE @endTime DATETIME = '2001-04-06'; -- This could be derived from data too.

    WITH tally AS
    (SELECT DATEADD(MINUTE, Number * 5, @startTime) AS cntDT
    FROM util.Numbers
    WHERE DATEADD(MINUTE, Number * 5, @startTime) >= @startTime
    AND DATEADD(MINUTE, Number * 5, @startTime) < @endTime
    )
    SELECT tally.cntDT AS cntDTS, DATEADD(MINUTE,5,tally.cntDT) AS cntDTE,snh.cntQ, sum(snh.cntL)
    FROM @SNH snh INNER JOIN tally ON snh.cntDT >= tally.cntDT AND snh.cntDT < DATEADD(MINUTE,5,tally.cntDT)
    GROUP BY
    tally.cntDT,
    snh.cntQ
    ORDER BY
    tally.cntDT,
    snh.cntQ;

    My numbers table is util.Numbers w/ column name Number -- 0 to some big number. The CTE allows predefining the start & end  datetime columns.

  • I think something like this will do.  And you can change the "INNER JOIN" to "LEFT OUTER JOIN" if you want to show all times ranges for every Q and show cntC as 0 if there were no values for that time.

    /* test data setup *********************************/
    DECLARE @endDate datetime
    DECLARE @interval_minutes smallint
    DECLARE @startDate datetime

    SET @startDate = '2001-04-04 10:00:00'
    SET @endDate = '2001-04-05 10:00:00'
    SET @interval_minutes = 5

    DROP TABLE IF EXISTS #dateRanges;
    CREATE TABLE #dateRanges ( startDate datetime NOT NULL, endDate datetime NOT NULL, PRIMARY KEY ( startDate, endDate ) );

    ;WITH
    cte_tally10 AS (
    SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS numbers(number)
    ),
    cte_tally1000 AS (
    SELECT 0 AS number FROM cte_tally10 c1 CROSS JOIN cte_tally10 c2 CROSS JOIN cte_tally10 c3
    ),
    cte_tally1M AS (
    SELECT 0 AS number UNION ALL
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS number FROM cte_tally1000 c1 CROSS JOIN cte_tally1000 c2
    )
    INSERT INTO #dateRanges ( startDate, endDate )
    SELECT
    DATEADD(MINUTE, t.number * @interval_minutes, @startDate) AS startDate,
    DATEADD(MINUTE, (t.number + 1) * @interval_minutes, @startDate) AS endDate
    FROM cte_tally1M t
    WHERE t.number BETWEEN 0 AND DATEDIFF(MINUTE, @startDate, @endDate) / @interval_minutes
    ORDER BY startDate, endDate


    /* actual query code ********************************/
    ;WITH cte_distinct_Qs AS (
    SELECT DISTINCT cntQ
    FROM #SNH
    )
    SELECT
    dr.startDate AS cntDTS,
    dr.endDate AS cntDE,
    cdq.cntQ,
    SUM(snh.cntL) AS cntC
    FROM #dateRanges dr
    CROSS JOIN cte_distinct_Qs cdq
    INNER JOIN @SNH snh ON snh.cntDT >= dr.startDate AND snh.cntDT < dr.endDate AND snh.cntQ = cdq.cntQ
    GROUP BY dr.startDate, dr.endDate, cdq.cntQ
    ORDER BY dr.startDate, dr.endDate, cdq.cntQ

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Thank you everybody so much and especially @ScottPletcher for the invaluable input. I studied your approach and learned a lot from it!

    Please, consider the following code

    /* input *********************************/DECLARE @SNH TABLE
    (
    cntDT DATETIME,
    cntQ varchar(10),
    cntL int
    );

    INSERT INTO @SNH (cntDT, cntQ, cntL)
    VALUES ('2001-04-04 10:00:00', 'Queue01', 3),
    ('2001-04-05 10:03:00', 'Queue01', 1),
    ('2001-04-05 10:06:00', 'Queue01', 1),
    ('2001-04-05 10:07:00', 'Queue01', 1),
    ('2001-04-05 10:11:00', 'Queue01', 1),
    ('2001-04-05 10:12:00', 'Queue01', 10)
    /* test data setup *********************************/DECLARE @endDate datetime
    DECLARE @interval_minutes smallint
    DECLARE @startDate datetime

    SET @startDate = '2001-04-04 10:00:00'
    SET @endDate = '2001-04-05 10:00:00'
    SET @interval_minutes = 5

    DROP TABLE IF EXISTS #dateRanges;
    CREATE TABLE #dateRanges ( startDate datetime NOT NULL, endDate datetime NOT NULL, PRIMARY KEY ( startDate, endDate ) );

    ;WITH
    cte_tally10 AS (
    SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS numbers(number)
    ),
    cte_tally1000 AS (
    SELECT 0 AS number FROM cte_tally10 c1 CROSS JOIN cte_tally10 c2 CROSS JOIN cte_tally10 c3
    ),
    cte_tally1M AS (
    SELECT 0 AS number UNION ALL
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS number FROM cte_tally1000 c1 CROSS JOIN cte_tally1000 c2
    )
    INSERT INTO #dateRanges ( startDate, endDate )
    SELECT
    DATEADD(MINUTE, t.number * (@interval_minutes+ABS(CHECKSUM(NewId())) % 60), @startDate) AS startDate,
    DATEADD(MINUTE, (t.number + 1) * @interval_minutes, @startDate) AS endDate
    FROM cte_tally1M t
    WHERE t.number BETWEEN 0 AND DATEDIFF(MINUTE, @startDate, @endDate) / @interval_minutes
    ORDER BY startDate, endDate


    /* actual query code ********************************/;WITH cte_distinct_Qs AS (
    SELECT DISTINCT cntQ
    FROM @SNH
    )

    SELECT TOP 4
    dr.startDate AS cntDTS,
    cdq.cntQ,
    ABS(CHECKSUM(NewId())) % 10 as cntL
    FROM #dateRanges dr
    CROSS JOIN cte_distinct_Qs cdq


    --SELECT
    -- dr.startDate AS cntDTS,
    -- dr.endDate AS cntDE,
    -- cdq.cntQ,
    -- SUM(snh.cntL) AS cntC
    --FROM #dateRanges dr
    --CROSS JOIN cte_distinct_Qs cdq
    --INNER JOIN @SNH snh ON snh.cntDT >= dr.startDate AND snh.cntDT < dr.endDate AND snh.cntQ = cdq.cntQ
    --GROUP BY dr.startDate, dr.endDate, cdq.cntQ
    --ORDER BY dr.startDate, dr.endDate, cdq.cntQ

    Would you please tell how to get from it a resulting table, for the result to have the sum of the two consecutive rows?

    The resulting table, therefore, should have just  3 rows and SUM over neighboring [cntL] values

    • This reply was modified 2 years, 5 months ago by  nkat.
  • just to update and close the problem

    LEAD(return_value ,offset [,default]) 
    OVER (
    [PARTITION BY partition_expression, ... ]
    ORDER BY sort_expression [ASC | DESC], ...
    )

    is the answer I was looking for

     

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply