How to find a set of sequential values within a list

  • Hello all,

    I have a dataset with lat/lon data for vehicles tracking their location every few minutes. It also reflects an estimated speed of travel and the time the record was received/recorded in the database.  A small sample of the data for one vehicle:

    CREATE TABLE #TESTDATA

    (

    VEHICLEID INT

    ,LAT INT

    ,LON INT

    ,MSGDATE INT

    ,MSGTIME INT

    ,SPEED REAL

    ,CRETIME INT

    )

    INSERT INTO #TESTDATA

    VALUES ('1801','41772921','-72613410','20190402','26367','0','26385')

    ,('1801','41772947','-72613442','20190402','26382','0','26402')
    ,('1801','41772963','-72613460','20190402','26419','0','26434')
    ,('1801','41772630','-72613604','20190402','27047','0','27083')
    ,('1801','41772629','-72613594','20190402','27039','0','27044')
    ,('1801','41771403','-72613389','20190402','27144','3.58','27156')
    ,('1801','41772045','-72602795','20190402','27233','7.6','27254')
    ,('1801','41734458','-72602456','20190402','27325','0','27334')
    ,('1801','41734440','-72602450','20190402','27771','0','27785')
    ,('1801','41734763','-72601016','20190402','27821','8.49','27826')

    (NOTE: Time values are in "seconds after midnight" so I do a conversion to see the actual time value:  right('0' + ltrim(str(@time/3600)),2) + ':' + right('0' + ltrim(str((@time%3600)/60)),2)   )

    I'm trying to identify, for each vehicle, each instance where there is 0 speed and the lat/lon has changed only minimally (less than .5 mi).  I need to see the min msgtime for the sequence and the max msgtime for the sequence so I can calculate how many minutes the vehicle has had 0 speed and minimal movement.  All the values are in cretime order (the time each record was created in the database).  For the example above, I'd be looking for the first record and the 5th record to establish the first sequence of 0 speed and minimal change in travel, then I'd grab the msgtime for both records and do the calculation for minutes. Then again for the 8th and 9th record.  Eventually I'll be looking to pull only those cases with 0 (or averaged speed < 5mph) and minimal change in travel for at least 30 minutes. And of course, this would need to be done for multiple vehicles a day for a date range.

    How the heck would I go about doing this?

    I thought I would need to use lead/lag to find next/previous events to do comparisons, but I'm not sure if that's going in the right direction for the final output I'm looking for.

    Can anyone offer any help?  I'm not the strongest in T-SQL so I'm hoping to get some insight on how to do this.

    Thanks!

    • This topic was modified 4 years, 7 months ago by  Dave Convery.
  • Not a full solution, but see if this helps

     

    WITH Src AS (
    SELECT VEHICLEID,LAT,LON,MSGDATE,MSGTIME,SPEED,CRETIME,
    DATEADD(SECOND,MSGTIME,CAST(CAST(MSGDATE AS VARCHAR(8)) AS DATETIME)) AS MSGDATETIME,
    ROW_NUMBER() OVER(PARTITION BY VEHICLEID ORDER BY MSGDATE,MSGTIME) AS rn
    FROM #TESTDATA
    ),
    GrpSrc AS (
    SELECT t1.VEHICLEID,
    t1.rn AS rnStart,
    t1.MSGDATETIME AS MSGDATETIMEStart,
    t2.rn AS rnEnd,
    t2.MSGDATETIME AS MSGDATETIMEEnd,
    t1.rn - ROW_NUMBER() OVER(PARTITION BY t1.VEHICLEID ORDER BY t1.rn) AS rnGrp
    FROM Src t1
    INNER JOIN Src t2 ON t2.rn = t1.rn+1
    AND t2.VEHICLEID = t1.VEHICLEID
    AND t1.SPEED = 0
    AND t2.SPEED = 0
    AND (SQUARE(t2.LAT-t1.LAT) + SQUARE(t2.LON-t1.LON)) < SQUARE(500)
    )
    SELECT VEHICLEID,
    MIN(rnStart) AS rnStart,
    MAX(rnEnd) AS rnEnd,
    DATEDIFF(minute,MIN(MSGDATETIMEStart),MAX(MSGDATETIMEEnd)) AS MinutesDiff
    FROM GrpSrc
    GROUP BY VEHICLEID, rnGrp
    ORDER BY VEHICLEID, rnGrp;

     

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Thanks Mark!

    I tried to run this but I'm getting an error "an expression of non-Boolean type specified in a context where a condition is expected, near ';' " right at the part with the SQUARE(t2.LAT-t1.LAT) + SQUARE(t2.LON-t1.LON) ) &lt; SQUARE(500)

    I'm not too familiar or I'd try to fix it myself.  Know what the problem might be?

  • Looks like your browser or something else has changed the less than sign into the HTML code "&lt;".

    John

  • Never mind! I fat fingered it.

    I got it to run, now I'll see what I have...

     

    Thank you so  much! Glad I now have something to start with.  Very much appreciated!

  • Mark, you are the man.

    I found a snippet from another query referencing lat/lon comparisons and made a slight adjustment:

    Instead of AND (SQUARE(t2.LAT-t1.LAT) + SQUARE(t2.LON-t1.LON)) > SQUARE(500)

    I used AND (ABS(t2.Lon-t1.Lon) > .5 or ABS(t2.Lat-t1.Lat) > .5)

    That seems to find what I'm looking for, for the most part.

    I'm amazed at how little code was necessary to get this data.

    Thank you again for your help with this!

  • Hello again,

    In running through the code for this which Mark provided above (thank you, Mark!), I've discovered that the Min/Max minute calculations between idle periods sometimes excludes records because they're different by seconds. I've been trying to make adjustments to the code on my own to handle these cases, but it's not right.  Thinking perhaps if the times initially are converted to a HH:mm format to begin with, rather than the full date time which includes seconds, it will fix this issue, but I need to hear from some experts out there.

    For example, the final output of the code above (FROM GrpSrc above) looks a little something like this, for one vehicle:

    CREATE TABLE #OUTPUT

    (

    VEHICLEID INT

    ,LDATE INT

    ,RNSTART INT

    ,MSGDATETIMESTART SMALLDATETIME

    ,RNEND INT

    ,MSGDATETIMEEND SMALLDATETIME

    ,MINSDIFF INT

    ,RNGRP INT

    )

    INSERT INTO #OUTPUT

    VALUES ('1719','20190609','199','2019-06-08 18:20:15.000','201','2019-06-08 18:21:43.000','1','155')

    ,('1719','20190609','206','2019-06-08 18:26:42.000','209','2019-06-08 18:34:00.000','8','160')

    ,('1719','20190609','210','2019-06-08 18:34:01.000','225','2019-06-08 18:57:18.000','23','161')

    ,('1719','20190609','239','2019-06-08 19:08:49.000','242','2019-06-08 19:12:48.000','4','175')

    This is almost exactly what I was looking for.  But on the second and third records, the second record ends at 18:34:00 and the next/third record starts at 18:34:01, which separates them out into two separate records, which means they don't get summed together (when they actually should be since the vehicle still hasn't moved).  Because of this, the report is not picking up the full time the vehicle is idle.  I'm seeing some that are up to a minute difference (which is a difference, true, but a margin of error looks like is in order here somehow).

    What could fix this?   Am I delusional thinking it could be as easy as changing the time format to exclude seconds or something? I'm thinking that wouldn't catch those which are truly 1 min different from one record to the next, but I guess it would be something.

    Anyone have any thoughts on this?

     

  • Check out the following articles by Itzik Ben-Gan:

    Packing Intervals which discusses how to ignore gaps up to a certain length and

    New Solutions to the Packing Intervals Problem which updates the previous solution to use windowed functions, but doesn't discuss how to ignore gaps.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thank you, Drew.  That's a very well written article.  I didn't even know the term for this was Packing Intervals, so learn something new every day.

    Of course, I tried using the code, word-for-word, with my table values but it's not working.  The example provided in the second link was so close to my dataset it should have been relatively simple to use for my purposes, but I just can't get it to work.  I'm using SQL v.17 and my code is:

    SELECT *,
    MAX(MSGDATETIMEEnd) OVER(PARTITION BY rnGrp
    ORDER BY MSGDATETIMEStart, MSGDATETIMEEnd, MDTID
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prvend
    FROM #GrpSrc;

    If you look at the example on the second link, you'll see it's exactly the same syntax.  The only difference is I'm pulling the data from a temp table vs a physical table.  I'm beating my head against the wall trying to understand what is wrong with it.  All I'm getting is "Incorrect syntax near 'ROWS' ".

    Can anyone see what I'm not seeing?

  • If you're not using at least SQL Server 2012, the feature/syntax doesn't exist prior to that.  You say you're using v17 but what is that for? SSMS?  That's not what controls which syntax is available.  You have to have at least SQL Serv 2012.

    • This reply was modified 5 years, 6 months ago by  Jeff Moden.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • *Palm to face.

    Sorry I missed that one.

    I'm just having a really difficult time trying to use the code originally provided, and tweaking it with the links also provided (both extremely helpful, thank you!), and getting desired results.

    At this point, I've identified all the periods of idle time for each MDTID, but there are cases where the periods are consecutive, meaning one idle period could be from 730 to 800 and the next one is 801 to 815.  (I don't know why they split to begin with but I'm trying to work with it).  I've created a "flagged" column for those cases where the next record is <= 1 minute later than the previous one.  But I don't know how to sum all of those idle minutes together properly, essentially merging the records with are consecutive and adding up/calculating the total idle time for the whole period.

    Below is sample data with what I have now:

    CREATE TABLE #OUTPUT

    (
    ROW_ID INT
    ,MDTID INT
    ,MSGDATE INT
    ,MSGDATETIMESTART SMALLDATETIME
    ,MSGDATETIMEEND SMALLDATETIME
    ,IDLEMINUTES INT
    ,FLAGGED INT
    )

    INSERT INTO #OUTPUT

    VALUES ('1','1515','20190616','8:35:00','8:37:00','2','0')
    ,('2','1515','20190616','8:37:00','8:42:00','5','1')
    ,('3','1515','20190616','8:47:00','8:49:00','2','0')
    ,('4','1515','20190616','8:53:00','9:03:00','2','0')
    ,('5','1515','20190616','13:43:00','13:47:00','4','0')
    ,('6','1515','20190616','14:10:00','14:13:00','3','0')
    ,('7','1515','20190616','14:24:00','14:27:00','3','0')
    ,('8','1515','20190616','14:28:00','14:28:00','0','1')
    ,('9','1515','20190616','14:29:00','16:23:00','114','1')
    ,('1','1516','20190616','7:10:00','7:13:00','3','0')
    ,('2','1516','20190616','7:25:00','7:29:00','4','0')
    ,('3','1516','20190616','7:46:00','7:52:00','6','0')
    ,('4','1516','20190616','8:08:00','8:30:00','22','0')
    ,('5','1516','20190616','8:31:00','9:14:00','43','1')
    ,('6','1516','20190616','9:18:00','9:24:00','6','0')
    ,('6','1516','20190616','9:28:00','9:31:00','3','0')

    The records with the Flagged = 1 are found to be consecutive and should have their IdleMinutes summed together with the IdleMinutes of the previous record.  But if there are two (or more) consecutive records with Flagged = 1 then IdleMinutes for all 3 should be summed together (record previous to 1 flagged, + both records with flagged = 1).  In the example data, record #2 with 5 minutes of idle has a flag = 1, so it should be combined with the 2 minutes of idle time from record 1.  Essentially combining the idle time on record 1 between 835 to 842 on record 2.  Same with records 8 and 9 which are the next ones with flagged = 0.  The total time from 1424 to 1623 should be the total idle time ...on one record, not 3, so we'll end up with fewer records by the end of this.

    I looked at an example for "packing" here https://technology.amis.nl/2014/06/08/sql-challenge-packing-time-intervals-and-merging-valid-time-periods/ and while I got the code to work, it didn't work right.    Here's what I came up with on that example

    ;with chairpeople as
    (select MDTID
    , MSGDATETIMEStart
    , MSGDATETIMEEnd
    , IdleMinutes
    , Flagged
    , Row_ID rnk
    from #output)

    ,packing (mdtid, starttime, endtime, lvl) as
    ( select MDTID, MSGDATETIMEStart, MSGDATETIMEEnd, 1
    from chairpeople
    where flagged = 0
    union all
    select p.mdtid, p.starttime, c.MSGDATETIMEEnd, lvl+1
    from chairpeople c
    join
    packing p
    on (c.MDTID = p.mdtid and c.rnk = p.lvl+1)
    where c.flagged =1
    )
    , packed as
    ( select MDTID, starttime, max(endtime) date_to
    from packing
    group
    by MDTID, starttime
    )

    select *
    from packed
    order
    by MDTID,starttime

    I was just trying to start off with confirming whether I could even identify the right records and hadn't yet added the actual IdleMinutes into it, but it seems like the flagged records just disappeared entirely so I must have missed something on this one.  I realize the packing will merge/remove records like I want, but I wasn't sure how to proceed, or if this was even the best way to proceed at all.

    Regardless of which way it should be done, I'm just looking for a way to consolidate records in the example output, merging the time values from one record to another only if they're <= 1 min apart.  Otherwise just show the idleminutes for the period as they are.

    Hoping I'm making sense -- I've been at this for several days now, hours and hours of trying different things that haven't worked. Doesn't help that I'm on SQL 2008 so can't use lead/lag or any of those fancy window functions.

    Can anyone please help?

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply