average value before and after a point in time **without** using a union all

  • Hello,

    I seek a way to show daily data with a 30 days average before an event occurred and the 30 day average after the event occurred.  The only way I can think to do this is with a union all statement where I union the before part and the after part.  Is there a better way?  


    SELECT
     facility_id
    ,record_date
    ,soda_gallons_sold
    ,last_facility_repair_date
    ,DATEDIFF(d,last_facility_repair_date, record_date) AS 'day_number'
    ,AVG(soda_gallons) OVER (PARTITION BY facility_id) AS 'average_soda_gallons_sold'
    FROM soda_volume_daily
    WHERE DATEDIFF(d,last_facility_repair_date, record_date) between -30 AND -1

    UNION ALL 

    SELECT
     facility_id
    ,record_date
    ,soda_gallons_sold
    ,last_facility_repair_date
    ,DATEDIFF(d,last_facility_repair_date, record_date) AS 'day_number'
    ,AVG(soda_gallons) OVER (PARTITION BY facility_id) AS 'average_soda_gallons_sold'
    FROM soda_volume_daily
    WHERE DATEDIFF(d,last_facility_repair_date, record_date) between 0 AND 30

  • That doesn't look like T-SQL. There's no GROUP BY and AVERAGE isn't a function in SQL Server, it's AVG. Maybe you've posted on the wrong website? SQL Server Central is, unsurprisingly, a site based on Microsoft's SQL Server. I'd hazard a guess this is MySQL, which some of the users do have experience with, but you might find better traction on a website for the RDBMS you're using.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Unfortunately TSQL doesn't support intervals for the RANGE clause of Window functions, so you pretty much have to resort to correlated subqueries. Something a bit like:

    Drop Table If Exists #Test
    Create Table #Test
    (
     EventID Int Identity (1,1),
     EventDate Date Not Null,
     Sales Int Not Null
    )
    Insert Into #Test Values
    ('2018-01-01', 23),('2018-01-05', 9),('2018-01-06', 23),('2018-01-10', 43),('2018-02-01', 32),
    ('2018-02-05', 64),('2018-02-06', 21),('2018-02-10', 31),('2018-02-20', 11),('2018-02-24', 19),
    ('2018-02-25', 41),('2018-02-28', 91),('2018-03-01', 11),('2018-03-07', 1),('2018-03-24', 29)
    Select * 
    -- Averages of values
    ,(Select Avg(Sales) From #Test T2 Where DateDiff(Day, T2.EventDate, T.EventDate) Between -31 And -1) As Prev30
    ,(Select Avg(Sales) From #Test T2 Where DateDiff(Day, T2.EventDate, T.EventDate) Between 1 And 31) As Next30
    -- Or daily averages
    ,(Select Sum(Sales)/30 From #Test T2 Where DateDiff(Day, T2.EventDate, T.EventDate) Between -31 And -1) As Prev30Daily
    ,(Select Sum(Sales)/30 From #Test T2 Where DateDiff(Day, T2.EventDate, T.EventDate) Between 1 And 31) As Next30Daily
    From #Test T

  • I'm not sure if you need a CASE expression in the AVG function or in the PARTITION BY clause. It depends of what you actually need. Without sample data and expected results, it's hard to tell.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • "AVERAGE" isn't a SQL Server function, afaik.  It seems easy enough to get the result in SQL Server, but I'm not sure that would help you.

    Do you want a solution for SQL Server / T-SQL or for some other rdbms?

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Assuming that you really are looking for a T-SQL version, I believe that this gives you what you are looking for.

    SELECT
        facility_id,
        record_date,
        soda_gallons_sold,
        last_facility_repair_date,
        d.day_number
        AVG(soda_gallons) OVER(PARTITION BY facility_id, g.grp)
    FROM soda_volume_daily
    CROSS APPLY (VALUES(DATEDIFF(DAY, last_facility_repair_date, record_date))) d(day_number)
    CROSS APPLY (VALUES(CASE WHEN d.day_number < 0 THEN -1 ELSE 1 END)) g(grp)  -- define groups of days
    WHERE d.day_number BETWEEN -30 AND 30

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • magical dev - Wednesday, August 15, 2018 2:36 PM

    Hello,

    I seek a way to show daily data with a 30 days average before an event occurred and the 30 day average after the event occurred.  The only way I can think to do this is with a union all statement where I union the before part and the after part.  Is there a better way?  


    SELECT
     facility_id
    ,record_date
    ,soda_gallons_sold
    ,last_facility_repair_date
    ,DATEDIFF(d,last_facility_repair_date, record_date) AS 'day_number'
    ,AVERAGE(soda_gallons) OVER (PARTITION BY facility_id) AS 'average_soda_gallons_sold'
    FROM soda_volume_daily
    WHERE DATEDIFF(d,last_facility_repair_date, record_date) between -30 AND -1

    UNION ALL 

    SELECT
     facility_id
    ,record_date
    ,soda_gallons_sold
    ,last_facility_repair_date
    ,DATEDIFF(d,last_facility_repair_date, record_date) AS 'day_number'
    ,AVERAGE(soda_gallons) OVER (PARTITION BY facility_id) AS 'average_soda_gallons_sold'
    FROM soda_volume_daily
    WHERE DATEDIFF(d,last_facility_repair_date, record_date) between 0 AND 30

    Without sample data and the expected result, I cannot fully test the code.
    That said, I believe that this will give you the average sales 30 days either side of any record.
    SELECT
      facility_id
    , record_date
    , soda_gallons_sold
    , last_facility_repair_date
    , day_number        = DATEDIFF(d, last_facility_repair_date, record_date)
    , average_soda_gallons_sold_before = AVG(soda_gallons_sold) OVER (PARTITION BY facility_id 
                           ORDER BY record_date
                           ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING )
    , average_soda_gallons_sold_after = AVG(soda_gallons_sold) OVER (PARTITION BY facility_id 
                           ORDER BY record_date
                           ROWS BETWEEN 1 FOLLOWING AND 30 FOLLOWING )
    FROM soda_volume_daily

  • It's worth noting that the "best" solution here depends greatly on if you are guaranteed to have exactly 1 row for every single day with no "gaps" or whether you may have multiple rows per day and/or gaps in the date ranges.

  • DesNorton - Thursday, August 16, 2018 11:56 AM

    Without sample data and the expected result, I cannot fully test the code.
    That said, I believe that this will give you the average sales 30 days either side of any record.
    SELECT
      facility_id
    , record_date
    , soda_gallons_sold
    , last_facility_repair_date
    , day_number        = DATEDIFF(d, last_facility_repair_date, record_date)
    , average_soda_gallons_sold_before = AVG(soda_gallons_sold) OVER (PARTITION BY facility_id 
                           ORDER BY record_date
                           ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING )
    , average_soda_gallons_sold_after = AVG(soda_gallons_sold) OVER (PARTITION BY facility_id 
                           ORDER BY record_date
                           ROWS BETWEEN 1 FOLLOWING AND 30 FOLLOWING )
    FROM soda_volume_daily

    I would say that a solution like that would require a different approach because this one isn't guaranteed as Andy mentioned.

    SELECT
       facility_id
    , record_date
    , soda_gallons_sold
    , last_facility_repair_date
    , day_number   = DATEDIFF(d, last_facility_repair_date, record_date)
    , average_soda_gallons_sold_before = AVG(CASE WHEN last_facility_repair_date > record_date THEN soda_gallons_sold END)
                                           OVER (PARTITION BY facility_id ORDER BY record_date )
    , average_soda_gallons_sold_after = AVG(CASE WHEN last_facility_repair_date <= record_date THEN soda_gallons_sold END)
                                           OVER (PARTITION BY facility_id ORDER BY record_date)
    FROM soda_volume_daily

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • [Of course the wrong post got quoted when I got around to actually posting this.  I've removed the quote.]

    I think that you might be making an unwarranted assumption that it has to be relative to the current row.  The OP made it sound like the original query did what he wanted, but he was looking for a way to avoid the UNION.  The original query does not reference records with respect to the current row.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • andycadley - Thursday, August 16, 2018 12:02 PM

    It's worth noting that the "best" solution here depends greatly on if you are guaranteed to have exactly 1 row for every single day with no "gaps" or whether you may have multiple rows per day and/or gaps in the date ranges.

    Given the name of the table, it sounds like this is a completely reasonable assumption.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Thursday, August 16, 2018 12:40 PM

    andycadley - Thursday, August 16, 2018 12:02 PM

    It's worth noting that the "best" solution here depends greatly on if you are guaranteed to have exactly 1 row for every single day with no "gaps" or whether you may have multiple rows per day and/or gaps in the date ranges.

    Given the name of the table, it sounds like this is a completely reasonable assumption.

    Drew

    Normally I wouldn't disagree but I've seen plenty of "daily" tables where the "oh, well we don't sell on Sundays/Bank Holidays/Xmas so they're blank" rules apply. It's always worth being clear on the assumptions being made.

  • Thom A - Wednesday, August 15, 2018 3:05 PM

    That doesn't look like T-SQL. There's no GROUP BY and AVERAGE isn't a function in SQL Server, it's AVG. Maybe you've posted on the wrong website? SQL Server Central is, unsurprisingly, a site based on Microsoft's SQL Server. I'd hazard a guess this is MySQL, which some of the users do have experience with, but you might find better traction on a website for the RDBMS you're using.

    I updated AVERAGE to AVG.  You will need to excuse my typo,  now it is TSQL.

    fyi

    you don't use a group by clause with windows functions.

  • ScottPletcher - Thursday, August 16, 2018 10:08 AM

    "AVERAGE" isn't a SQL Server function, afaik.  It seems easy enough to get the result in SQL Server, but I'm not sure that would help you.

    Do you want a solution for SQL Server / T-SQL or for some other rdbms?

    sorry, i goofed and used average instead of avg.  I am working in TSQL.

  • drew.allen - Thursday, August 16, 2018 11:50 AM

    Assuming that you really are looking for a T-SQL version, I believe that this gives you what you are looking for.

    SELECT
        facility_id,
        record_date,
        soda_gallons_sold,
        last_facility_repair_date,
        d.day_number
        AVG(soda_gallons) OVER(PARTITION BY facility_id, g.grp)
    FROM soda_volume_daily
    CROSS APPLY (VALUES(DATEDIFF(DAY, last_facility_repair_date, record_date))) d(day_number)
    CROSS APPLY (VALUES(CASE WHEN d.day_number < 0 THEN -1 ELSE 1 END)) g(grp)  -- define groups of days
    WHERE d.day_number BETWEEN -30 AND 30

    Drew

    Yes, this is the one.  beautiful.  now the window can be parameterized.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply