January 16, 2012 at 12:52 pm
I am really stuck on this and am probably missing the obvious solution:
I have a table like this:
DROP TABLE TST_Events
CREATE TABLE TST_Events
(
ID INT NULL,
UserID INT NULL,
Location VARCHAR(50) NULL,
LocationDate DATETIME NULL
)
GO
INSERT INTO TST_Events
VALUES(1, 123, 'Columbus', '2011-09-05 13:05:20.883')
INSERT INTO TST_Events
VALUES(2, 123, 'Columbus', '2011-09-05 13:05:04.337')
INSERT INTO TST_Events
VALUES(3, 123, 'Columbus', '2011-09-05 13:05:42.290')
INSERT INTO TST_Events
VALUES(4, 123, 'New York', '2011-10-05 23:40:57.923')
INSERT INTO TST_Events
VALUES(5, 123, 'Atlanta', '2011-08-14 21:36:15.557')
INSERT INTO TST_Events
VALUES(6, 123, 'Atlanta', '2011-08-14 21:36:28.913')
I need to populate a table with a records that exclude any duplicate data which occurred within a 5 minute window. If there are multiple records just grab the top one. So for example my out put should be the following:
1 123 Columbus 2011-09-05 13:05:20.883
4 123 New York 2011-10-05 23:40:57.923
5 123 Atlanta 2011-08-14 21:36:15.557
For the life of me I cannot figure this one out, any help or advice would be greatly appreciated.
January 16, 2012 at 1:01 pm
Please read the post linked in my signature. This will help us help you. 😛
______________________________________________________________________
Personal Motto: Why push the envelope when you can just open it?
If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.
Jason L. SelburgJanuary 16, 2012 at 1:15 pm
Excellent advice. Thanks for the suggestion I edited my original post to include the table code and data.
Thanks
January 16, 2012 at 1:25 pm
SELECT
Segmented.ID,
Segmented.UserID,
Segmented.Location,
Segmented.LocationDate
FROM
(
SELECT
*,
rn = ROW_NUMBER() OVER (
PARTITION BY TimeSlot.Segment
ORDER BY te.LocationDate)
FROM dbo.TST_Events AS te
CROSS APPLY
(
SELECT
(
(
DATEPART(MINUTE, LocationDate) * 60 +
DATEPART(SECOND, LocationDate)
) / 300
)
) AS TimeSlot (Segment)
) AS Segmented
WHERE
rn = 1
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
January 16, 2012 at 1:26 pm
So what happens in the case of identical rows that chain longer than 5 minutes?
i.e.
123,‘Columbus’,‘1/5/2012 8:00 PM’
123,‘Columbus’,‘1/5/2012 8:04 PM’
123,‘Columbus’,‘1/5/2012 8:08 PM’
123,‘Columbus’,‘1/5/2012 8:12 PM’
They all occur within 5 minutes of each other, but not within 5 minutes of the first?
______________________________________________________________________
Personal Motto: Why push the envelope when you can just open it?
If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.
Jason L. SelburgJanuary 16, 2012 at 1:28 pm
Jason Selburg (1/16/2012)
So what happens in the case of identical rows that chain longer than 5 minutes?
I took it to mean that we should divide the day into 5-minute slots and pick the first from each. Might be wrong, but hey.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
January 16, 2012 at 1:38 pm
SQL Kiwi (1/16/2012)
SELECT
Segmented.ID,
Segmented.UserID,
Segmented.Location,
Segmented.LocationDate
FROM
(
SELECT
*,
rn = ROW_NUMBER() OVER (
PARTITION BY TimeSlot.Segment
ORDER BY te.LocationDate)
FROM dbo.TST_Events AS te
CROSS APPLY
(
SELECT
(
(
DATEPART(MINUTE, LocationDate) * 60 +
DATEPART(SECOND, LocationDate)
) / 300
)
) AS TimeSlot (Segment)
) AS Segmented
WHERE
rn = 1
Nice work Paul. However, you'll need to add in the hour. If the same record comeins during the same minute of a different hour, you're excluding it.
i.e. Add in a record for 123, 'New York', '2011-10-05 20:40:57.923' .. (same miute, but diff. hour)
SELECT
*
FROM
(
SELECT
*,
rn = ROW_NUMBER() OVER (
PARTITION BY TimeSlot.Segment
ORDER BY te.LocationDate)
FROM dbo.TST_Events AS te
CROSS APPLY
(
SELECT
(
(
DATEPART(HOUR, LocationDate) * 360 +
DATEPART(MINUTE, LocationDate) * 60 +
DATEPART(SECOND, LocationDate)
) / 300
)
) AS TimeSlot (Segment)
) AS Segmented
WHERE
rn = 1
But all in all, still much nicer than the road I was going down. 🙂
[EDITED] My sample entry was based off my data. Edited to work with OP's data.
______________________________________________________________________
Personal Motto: Why push the envelope when you can just open it?
If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.
Jason L. SelburgJanuary 16, 2012 at 1:55 pm
This is awesome! Thank everyone so much for the help. I tried it against my sample data and it worked perfectly. When I ran it against a full data set it only returns about 41 records while there are about 138,000 in the table and not that many 5 minute collisions to account for the numbers. Would I need to include some further segments since my data covers from December 2010 until now? I'm just getting the hang of this coding since we have just upgraded to 2008 a month ago.
Thanks Again
January 16, 2012 at 2:07 pm
good catch, we need to also include the date. One way to do this is to get the number of days since 1/1/1900
SELECT
*
FROM
(
SELECT
*,
rn = ROW_NUMBER() OVER (
PARTITION BY TimeSlot.Segment
ORDER BY te.LocationDate)
FROM dbo.TST_Events AS te
CROSS APPLY
(
SELECT
(
(DATEDIFF(DAY,'1/1/1900', LocationDate) * 8640 +
DATEPART(HOUR, LocationDate) * 360 +
DATEPART(MINUTE, LocationDate) * 60 +
DATEPART(SECOND, LocationDate)
) / 300
)
) AS TimeSlot (Segment)
) AS Segmented
WHERE
rn = 1
______________________________________________________________________
Personal Motto: Why push the envelope when you can just open it?
If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.
Jason L. SelburgViewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply