Excluding records that took place within 5 minutes

  • I am really stuck on this and am probably missing the obvious solution:

    I have a table like this:

    DROP TABLE TST_Events

    CREATE TABLE TST_Events

    (

    ID INT NULL,

    UserID INT NULL,

    Location VARCHAR(50) NULL,

    LocationDate DATETIME NULL

    )

    GO

    INSERT INTO TST_Events

    VALUES(1, 123, 'Columbus', '2011-09-05 13:05:20.883')

    INSERT INTO TST_Events

    VALUES(2, 123, 'Columbus', '2011-09-05 13:05:04.337')

    INSERT INTO TST_Events

    VALUES(3, 123, 'Columbus', '2011-09-05 13:05:42.290')

    INSERT INTO TST_Events

    VALUES(4, 123, 'New York', '2011-10-05 23:40:57.923')

    INSERT INTO TST_Events

    VALUES(5, 123, 'Atlanta', '2011-08-14 21:36:15.557')

    INSERT INTO TST_Events

    VALUES(6, 123, 'Atlanta', '2011-08-14 21:36:28.913')

    I need to populate a table with a records that exclude any duplicate data which occurred within a 5 minute window. If there are multiple records just grab the top one. So for example my out put should be the following:

    1 123 Columbus 2011-09-05 13:05:20.883

    4 123 New York 2011-10-05 23:40:57.923

    5 123 Atlanta 2011-08-14 21:36:15.557

    For the life of me I cannot figure this one out, any help or advice would be greatly appreciated.

  • Please read the post linked in my signature. This will help us help you. 😛

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • Excellent advice. Thanks for the suggestion I edited my original post to include the table code and data.

    Thanks

  • SELECT

    Segmented.ID,

    Segmented.UserID,

    Segmented.Location,

    Segmented.LocationDate

    FROM

    (

    SELECT

    *,

    rn = ROW_NUMBER() OVER (

    PARTITION BY TimeSlot.Segment

    ORDER BY te.LocationDate)

    FROM dbo.TST_Events AS te

    CROSS APPLY

    (

    SELECT

    (

    (

    DATEPART(MINUTE, LocationDate) * 60 +

    DATEPART(SECOND, LocationDate)

    ) / 300

    )

    ) AS TimeSlot (Segment)

    ) AS Segmented

    WHERE

    rn = 1

  • So what happens in the case of identical rows that chain longer than 5 minutes?

    i.e.

    123,‘Columbus’,‘1/5/2012 8:00 PM’

    123,‘Columbus’,‘1/5/2012 8:04 PM’

    123,‘Columbus’,‘1/5/2012 8:08 PM’

    123,‘Columbus’,‘1/5/2012 8:12 PM’

    They all occur within 5 minutes of each other, but not within 5 minutes of the first?

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • Jason Selburg (1/16/2012)


    So what happens in the case of identical rows that chain longer than 5 minutes?

    I took it to mean that we should divide the day into 5-minute slots and pick the first from each. Might be wrong, but hey.

  • SQL Kiwi (1/16/2012)


    SELECT

    Segmented.ID,

    Segmented.UserID,

    Segmented.Location,

    Segmented.LocationDate

    FROM

    (

    SELECT

    *,

    rn = ROW_NUMBER() OVER (

    PARTITION BY TimeSlot.Segment

    ORDER BY te.LocationDate)

    FROM dbo.TST_Events AS te

    CROSS APPLY

    (

    SELECT

    (

    (

    DATEPART(MINUTE, LocationDate) * 60 +

    DATEPART(SECOND, LocationDate)

    ) / 300

    )

    ) AS TimeSlot (Segment)

    ) AS Segmented

    WHERE

    rn = 1

    Nice work Paul. However, you'll need to add in the hour. If the same record comeins during the same minute of a different hour, you're excluding it.

    i.e. Add in a record for 123, 'New York', '2011-10-05 20:40:57.923' .. (same miute, but diff. hour)

    SELECT

    *

    FROM

    (

    SELECT

    *,

    rn = ROW_NUMBER() OVER (

    PARTITION BY TimeSlot.Segment

    ORDER BY te.LocationDate)

    FROM dbo.TST_Events AS te

    CROSS APPLY

    (

    SELECT

    (

    (

    DATEPART(HOUR, LocationDate) * 360 +

    DATEPART(MINUTE, LocationDate) * 60 +

    DATEPART(SECOND, LocationDate)

    ) / 300

    )

    ) AS TimeSlot (Segment)

    ) AS Segmented

    WHERE

    rn = 1

    But all in all, still much nicer than the road I was going down. 🙂

    [EDITED] My sample entry was based off my data. Edited to work with OP's data.

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • This is awesome! Thank everyone so much for the help. I tried it against my sample data and it worked perfectly. When I ran it against a full data set it only returns about 41 records while there are about 138,000 in the table and not that many 5 minute collisions to account for the numbers. Would I need to include some further segments since my data covers from December 2010 until now? I'm just getting the hang of this coding since we have just upgraded to 2008 a month ago.

    Thanks Again

  • good catch, we need to also include the date. One way to do this is to get the number of days since 1/1/1900

    SELECT

    *

    FROM

    (

    SELECT

    *,

    rn = ROW_NUMBER() OVER (

    PARTITION BY TimeSlot.Segment

    ORDER BY te.LocationDate)

    FROM dbo.TST_Events AS te

    CROSS APPLY

    (

    SELECT

    (

    (DATEDIFF(DAY,'1/1/1900', LocationDate) * 8640 +

    DATEPART(HOUR, LocationDate) * 360 +

    DATEPART(MINUTE, LocationDate) * 60 +

    DATEPART(SECOND, LocationDate)

    ) / 300

    )

    ) AS TimeSlot (Segment)

    ) AS Segmented

    WHERE

    rn = 1

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply