Need help with sql query

  • Hi team,

    I am currently working on weather data to see if there is rain in past 3 hours. I have data for 10 years.

    The sample data is presented below

    ID datetime rain

    1 2016/01/01 01:00:00 0

    1 2016/01/01 03:00:00 8

    1 2016/01/01 05:00:00 6

    1 2016/01/01 06:00:00 0

    2 2016/01/01 08:00:00 6

    2 2016/01/01 10:00:00 0

    2 2016/01/01 15:00:00 0

    3 2016/01/01 16:00:00 0

    3 2016/01/01 20:00:00 10

    3 2016/01/01 21:00:00 3

    4 2016/01/01 23:00:00 4

    4 2016/01/02 01:00:00 5

    4 2016/01/02 05:00:00 0

    I need to have a column to see if there is any rain in past 3 hours. if there is rain it should show '1' else '0'.

    the output should look this

    0

    1

    1

    1

    1

    1

    0

    0

    1

    1

    1

    1

    0

    I have hourly data for 10 years. Any help is appreciated.

    Thanks in Advance.

  • Quick suggestion, look up the SIGN function

    😎

    ;WITH SAMPLE_DATA(ID,datetime,rain) AS

    (SELECT ID,CONVERT(DATETIME,datetime,112),rain FROM (VALUES

    (1,'20160101 01:00:00', 0)

    ,(1,'20160101 03:00:00', 8)

    ,(1,'20160101 05:00:00', 6)

    ,(1,'20160101 06:00:00', 0)

    ,(2,'20160101 08:00:00', 6)

    ,(2,'20160101 10:00:00', 0)

    ,(2,'20160101 15:00:00', 0)

    ,(3,'20160101 16:00:00', 0)

    ,(3,'20160101 20:00:00',10)

    ,(3,'20160101 21:00:00', 3)

    ,(4,'20160101 23:00:00', 4)

    ,(4,'20160102 01:00:00', 5)

    ,(4,'20160102 05:00:00', 0)

    ) AS X(ID,datetime,rain))

    SELECT

    SD.ID

    ,SD.datetime

    ,SD.rain

    ,SIGN(SD.rain) AS RAIN_FLAG

    FROM SAMPLE_DATA SD;

    Output

    ID datetime rain RAIN_FLAG

    ----------- ----------------------- ----------- -----------

    1 2016-01-01 01:00:00.000 0 0

    1 2016-01-01 03:00:00.000 8 1

    1 2016-01-01 05:00:00.000 6 1

    1 2016-01-01 06:00:00.000 0 0

    2 2016-01-01 08:00:00.000 6 1

    2 2016-01-01 10:00:00.000 0 0

    2 2016-01-01 15:00:00.000 0 0

    3 2016-01-01 16:00:00.000 0 0

    3 2016-01-01 20:00:00.000 10 1

    3 2016-01-01 21:00:00.000 3 1

    4 2016-01-01 23:00:00.000 4 1

    4 2016-01-02 01:00:00.000 5 1

    4 2016-01-02 05:00:00.000 0 0

  • Thank you very much for the reply. It works to check if there is rain in that hour. However, i also need to check previous 2 hours for each ID.

  • Missed the three hours earlier, here is an improved query

    😎

    ;WITH SAMPLE_DATA(ID,datetime,rain) AS

    (SELECT ID,CONVERT(DATETIME,datetime,112),rain FROM (VALUES

    (1,'20160101 01:00:00', 0)

    ,(1,'20160101 03:00:00', 8)

    ,(1,'20160101 05:00:00', 6)

    ,(1,'20160101 06:00:00', 0)

    ,(2,'20160101 08:00:00', 6)

    ,(2,'20160101 10:00:00', 0)

    ,(2,'20160101 15:00:00', 0)

    ,(3,'20160101 16:00:00', 0)

    ,(3,'20160101 20:00:00',10)

    ,(3,'20160101 21:00:00', 3)

    ,(4,'20160101 23:00:00', 4)

    ,(4,'20160102 01:00:00', 5)

    ,(4,'20160102 05:00:00', 0)

    ) AS X(ID,datetime,rain))

    SELECT

    SD.ID

    ,SD.datetime

    -- Check if the earlier row was within the three hours

    ,SIGN(SIGN(CASE WHEN DATEDIFF(HOUR,LAG(SD.datetime,1) OVER

    (

    ORDER BY SD.datetime ASC

    ),SD.datetime) <= 3 THEN LAG(SD.rain,1)OVER

    (

    ORDER BY SD.datetime ASC

    )

    ELSE 0 END)

    -- add the earlier value to the current flag

    + SIGN(SD.rain)) AS RAIN_FLAG

    ,SD.rain

    FROM SAMPLE_DATA SD;

    Output

    ID datetime RAIN_FLAG rain

    ----------- ----------------------- ----------- -----------

    1 2016-01-01 01:00:00.000 0 0

    1 2016-01-01 03:00:00.000 1 8

    1 2016-01-01 05:00:00.000 1 6

    1 2016-01-01 06:00:00.000 1 0

    2 2016-01-01 08:00:00.000 1 6

    2 2016-01-01 10:00:00.000 1 0

    2 2016-01-01 15:00:00.000 0 0

    3 2016-01-01 16:00:00.000 0 0

    3 2016-01-01 20:00:00.000 1 10

    3 2016-01-01 21:00:00.000 1 3

    4 2016-01-01 23:00:00.000 1 4

    4 2016-01-02 01:00:00.000 1 5

    4 2016-01-02 05:00:00.000 0 0

  • Dear Eirikur Eiriksson,

    Thank you very much. The query example you have given worked.

  • Careful with LAG its only looking at the previous record for the 3 hr window. Your data could be confusing and given your dataset Eirikur solution does work, but you say you are taking readings every hour, it wont work under that paradigm.

    The solution I posted for you on the 2008 section will work either way.

    declare @tmpWeatherData table (ID int, RecDate Datetime, RainAMT int)

    INSERT INTO @tmpWeatherData (ID,RecDate, RainAMT)

    VALUES

    (1,'2016/01/01 01:00:00',0),(1,'2016/01/01 03:00:00',8),(1,'2016/01/01 05:00:00',6),(1,'2016/01/01 06:00:00',0),(2,'2016/01/01 08:00:00',6)

    ,(2,'2016/01/01 10:00:00',0),(2,'2016/01/01 15:00:00',0),(3,'2016/01/01 16:00:00',0),(3,'2016/01/01 20:00:00',10),(3,'2016/01/01 21:00:00',3)

    ,(4,'2016/01/01 23:00:00',4),(4,'2016/01/02 01:00:00',5),(4,'2016/01/02 05:00:00',0)

    SELECT

    t1.ID,t1.RecDate,t1.RainAMT,

    CASE

    WHEN t2.RainAMT > 0 THEN 1

    ELSE 0

    END as RainPrev3Hrs

    FROM @tmpWeatherData t1

    OUTER APPLY (SELECT SUM(RainAMT) as RainAMT FROM @tmpWeatherData WHERE RecDate BETWEEN DateAdd(hh,-3,t1.RecDate) and t1.RecDate ) t2

    ORDER BY t1.ID,t1.RecDate

  • Here is the final query I have used with a small modification with yours since I have hourly data. Once again, Thank You very much.

    SELECT

    SD.ID

    ,SD.datetime

    -- Check if the earlier two rows were within the three hours

    ,SIGN(SIGN(CASE WHEN DATEDIFF(HOUR,LAG(SD.datetime,1) OVER

    (

    Partition by SD.ID ORDER BY SD.datetime ASC

    ),SD.datetime) <= 3 THEN LAG(SD.rain,1)OVER

    (

    ORDER BY SD.datetime ASC

    )

    ELSE 0 END)

    + SIGN(CASE WHEN DATEDIFF(HOUR,LAG(SD.datetime,2) OVER

    (

    Partition by SD.ID ORDER BY SD.datetime ASC

    ),SD.datetime) <= 3 THEN LAG(SD.rain,2)OVER

    (

    ORDER BY SD.datetime ASC

    )

    ELSE 0 END)

    -- add the earlier value to the current flag

    + SIGN(SD.rain)) AS RAIN_FLAG

    ,SD.rain

    FROM SAMPLE_DATA SD;

  • I'm just going to make my final argument against using the LAG function here, in this case.

    What if you start getting data at 30 min intervals? The LAG (and LEAD for that matter) function is/are great for not having to join to the same table to get differing data for comparisons etc... but it gets very messy (which is where your SQL is headed) when you have to look at RANGE of values.

    Just my opinion, best of luck Venkat, and again welcome to the forums!

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply