April 15, 2012 at 5:59 pm
No mike, it's the same issue, I'm not doing a good job of explaining what I need, I'm going to have another go at it, so drop exapmlelog if you haven’t already.
CREATE TABLE [dbo].[ExampleLog](
[ID] [uniqueidentifier] NOT NULL,
[EVENTNUMBER] [int] NULL,
[DISPLAYSERIAL] [int] NULL,
[DISPLAYID] [varchar](50) NULL,
[OBJECTSERIAL] [int] NULL,
[OBJECTRSSILEVEL] [int] NULL,
[OBJECTID] [varchar](50) NULL,
[TIMESTAMP] [datetime] NULL)
INSERT INTO [dbo].[ExampleLog]
VALUES
(NewID(), 206, 6897913, 'HV 644', 6885819, 66, 'WAP 8', '2012-01-01 12:12:00'),
(NewID(), 206, 6897913, 'HV 644', 6885819, 66, 'WAP 8', '2012-01-01 12:13:00'),
(NewID(), 206, 6897913, 'HV 644', 6877807, 54, 'MV 20', '2012-01-01 12:16:00'),
(NewID(), 103, 6897913, 'HV 644', 6898052, 118, 'HV 62', '2012-01-01 12:20:00'),
(NewID(), 206, 6897913, 'HV 644', 6886130, 122, 'HV 62', '2012-01-01 12:21:00'),
(NewID(), 206, 6897913, 'HV 644', 6886130, 106, 'HV 62', '2012-01-01 12:25:00'),
(NewID(), 204, 6897913, 'HV 644', 6886130, 106, 'HV 62', '2012-01-01 12:31:00'),
(NewID(), 204, 6897913, 'HV 644', 6886130, 106, 'HV 62', '2012-01-01 12:33:00'),
(NewID(), 104, 6897913, 'HV 644', 6886130, 106, 'HV 62', '2012-01-01 12:39:00')
I need to find a way of grouping rows that occur within 5 minutes of the latest timestamp in the group and the displayid and object id are the same, so using a common table expression on our sample data the expected results would look like
RowIDEVENTNUMBERDISPLAYSERIALDISPLAYIDOBJECTSERIALOBJECTRSSILEVELOBJECTIDTIMESTAMPtmp_rank
1A320ECEB-42A0-4A10-B97D-D5751895C0BE2066897913HV 644688581966WAP 82012-01-01 12:12:00.0001
2CD89928B-80C5-4CAA-BEA5-D4135BE68D652066897913HV 644688581966WAP 82012-01-01 12:13:00.0002
348873522-918E-46A8-B804-78CDE6487A582066897913HV 644687780754MV 202012-01-01 12:16:00.0001
433EBDD97-51BB-40A5-9F78-265AF618835A1036897913HV 6446898052118HV 622012-01-01 12:20:00.0001
516AFF3FA-39F7-456B-8A4C-275A08094C2B2066897913HV 6446886130122HV 622012-01-01 12:21:00.0002
6504AF293-DE97-4010-A217-3F4225E1D1362066897913HV 6446886130106HV 622012-01-01 12:25:00.0003
749545FA8-4100-47BF-8437-5D221F9E4A1D2046897913HV 6446886130106HV 622012-01-01 12:31:00.0001
8955F4BEB-ADAD-489F-8913-4CF6824DBE6D2046897913HV 6446886130106HV 622012-01-01 12:33:00.0002
991174B80-08E0-49A5-8B39-635B38859BD91046897913HV 6446886130106HV 622012-01-01 12:39:00.0001
Rows 1 and 2 are grouped because the objectid and displayid are the same and row 2 occurs within 5 minutes of row 1, there are no other rows in the dataset that contain the same displayid and objectid and a timestamp within 5 minutes of the group’s latest timestamp (row 2) so..
Row 3 is grouped by itself because there are no rows in the dataset where the displayid and objectid match within a 5 minute of that groups latest timestamp (row 3).
Row 4, 5 and 6 are grouped because the displayid's match, the objectid's match and the rows timestamps are within 5 minutes of the last row, row 5's timestamp is within 5 minutes of row 4's, and row 6 timestamp is within 5 minutes of row 5's.
Row 7 and 8 are grouped because the objectid's and displayid's are the same and row 8's timestamp is within 5 minutes of that groups latest timestamp (row 7).
Row 9 is grouped by itself because there are no rows in the dataset where the displayid and objectid match within a 5 minute of that groups latest timestamp
April 16, 2012 at 1:05 pm
WITH
cte_remove AS (
SELECT b.*
FROM dbo.ExampleLog a
INNER JOIN dbo.ExampleLog b ON
a.ID != b.ID AND
a.DISPLAYID = b.DISPLAYID AND
a.OBJECTID = b.OBJECTID AND
b.TIMESTAMP BETWEEN a.TIMESTAMP AND DATEADD(SECOND, 300, a.TIMESTAMP)
),
cte_results AS (
SELECT * FROM dbo.ExampleLog EXCEPT
SELECT * FROM cte_remove
)
SELECT * FROM cte_results ORDER BY [timestamp]
GO
??
Substitute BETWEEN for >= AND < depending on your boundary semantics.
And order the WHERE predicates to suit your indexing (I expect the a.ID != b.ID clause might explode the data across a large table like a log, unless you've already limited by DISPLAYID, OBJECTID and TIMESTAMP).
April 16, 2012 at 9:49 pm
Thanks Jimbob, I ran this over a few sample datasets in dev this morning, and this query is returning exactly the results I am after, much appreciated
April 17, 2012 at 8:33 am
You're welcome. Make sure you test this on a sample dataset with a large number of rows, though, just to make sure we haven't introduced a performance time-bomb!
Using a random test set based on the below, I was getting non-trivial query times from as little as 100,000 rows, even with a covering index on DISPLAYID, OBJECTID and TIMESTAMP and a primary key on ID:
WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), -- 10^1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 10^2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 10^4 or 10,000 rows
E8(N) AS (SELECT 1 FROM E4 a, E4 b), -- 10^8 or 100,000,000 rows
cte_rand AS (
SELECT
n
,a = NEWID()
,b = NEWID()
,c = NEWID()
,d = NEWID()
FROM E8
)
INSERT INTO [dbo].[ExampleLog]
SELECT TOP 100000-- vary this to alter the sample size
id = a
,eventnumber =200 + (CONVERT(INT, CONVERT(VARBINARY(4), a)) % 10)
,displayserial =6897900 + ABS(CONVERT(INT, CONVERT(VARBINARY(4), b)) % 3)
,displayid =CASE ABS(CONVERT(INT, CONVERT(VARBINARY(4), b)) % 3)
WHEN 0 THEN 'AB 123'
WHEN 1 THEN 'CD 456'
WHEN 2 THEN 'EF 789'
END
,objectserial =6886800 + (8 * ABS(CONVERT(INT, CONVERT(VARBINARY(4), c)) % 3))
,objectrssilevel =100 + (8 * ABS(CONVERT(INT, CONVERT(VARBINARY(4), c)) % 3))
,objectid =CASE ABS(CONVERT(INT, CONVERT(VARBINARY(4), c)) % 3)
WHEN 0 THEN 'ZZZ 1'
WHEN 1 THEN 'YY 99'
WHEN 2 THEN 'XX 88'
END
,timestamp =DATEADD(SECOND, ABS(CONVERT(BIGINT, CONVERT(VARBINARY(4), d)) % (60 * 60 * 24 * 7)), CONVERT(DATETIME, '2012-01-01T00:00:00'))
FROM cte_rand
GO
Maybe now, with my query defining the expected results set, one of the more performance-minded contributors can derive a more performant approach...
Viewing 4 posts - 16 through 18 (of 18 total)
You must be logged in to reply to this topic. Login to reply