Find rows with very close time stamps

  • Given the following table and query, this will return any records(based on message_id) that have timestamps within 2 seconds.  This works and gives me the correct output.  So 2 questions.

    1. how does this solution stack up compared to other solutions,
    2. if the total possible clicks for a particular messageid is very large, is this still a good solution.

    I wish I didn't need to do this, but the email provider we are using doesn't have good bot filtering.  So the assumption is if there are 2 or more clickable links and both are clicked at the same time or very close, it's a bot and we need to filter it out.

    CREATE TABLE #clicks ( [message_id] bigint, [clicked_url] varchar(1000), [clicked_at] datetime2(0) )

    INSERT INTO #clicks ([message_id], [clicked_url], [clicked_at])
    VALUES
    ( 55732046797489704, 'clicked url6EF62048-318A-4C79-A8FC-E65622D2A8F4', N'2024-10-23T14:40:16' ),
    ( 55732046797489704, 'clicked urlD60CBDCA-9EAF-499B-80EE-AD3233F34EE0', N'2024-10-23T14:40:16' ),
    ( 55732046797489801, 'clicked urlB89C329F-4316-43CF-809A-9B11E88DB9FF', N'2024-10-23T14:40:22' ),
    ( 55732046797489801, 'clicked url5C6B6CE6-AEFE-4DA3-BE90-AB99A505D70A', N'2024-10-23T14:40:22' ),
    ( 55732046797490060, 'clicked url1BDC5F9C-F7E2-47CD-8D4E-11929C3FA7E9', N'2024-10-23T14:41:29' ),
    ( 55732046797490060, 'clicked url97C0C9EA-70F1-4CFD-887A-5B65872ABA62', N'2024-10-23T14:41:29' ),
    ( 55732046797490089, 'clicked url056673A6-CD95-4DAB-9BD9-3BB49DF2D2AC', N'2024-10-23T14:40:46' ),
    ( 55732046797490089, 'clicked url23E8D35E-B0B9-4DC4-9B69-7838A7202DD7', N'2024-10-23T14:40:46' ),
    ( 55732046797490200, 'clicked url4CE9BBFA-12C7-4D90-89C4-F5203A069967', N'2024-10-23T14:40:22' ),
    ( 55732046797490200, 'clicked url2D26431E-982F-4E09-ADA4-18854A21ECC0', N'2024-10-23T14:40:22' ),
    ( 55732046797490480, 'clicked urlE1F6B24F-D360-44C9-B650-125E257952E4', N'2024-10-23T14:41:47' ),
    ( 55732046797490480, 'clicked urlB87BA94E-1F03-43AB-AC3C-9C126AB032C1', N'2024-10-23T14:41:47' ),
    ( 55732046797490518, 'clicked url3A8157CD-85AD-4A7C-8A61-27DFEB92FF29', N'2024-10-23T14:41:24' ),
    ( 55732046797490518, 'clicked urlE5690795-1805-4F91-A808-F0C51A20CC19', N'2024-10-23T14:41:24' ),
    ( 55732046797490812, 'clicked url51F92B4A-004D-4F1B-BF38-BA89D5EC065E', N'2024-10-23T14:40:42' ),
    ( 55732046797491095, 'clicked urlBF74CE08-1AE3-45ED-9E79-B3F6AC2F26E8', N'2024-10-23T14:41:46' ),
    ( 55732046797491095, 'clicked url6AF11300-38B3-4B12-A002-E3806A5315EB', N'2024-10-23T14:41:46' )


    WITH cte
    AS (SELECT mc.message_id
    ,DATEDIFF(SECOND, mc.clicked_at, LAG(mc.clicked_at, 1, NULL) OVER (PARTITION BY mc.message_id ORDER BY mc.clicked_at)) AS time_diff
    FROM #clicks AS mc)
    SELECT DISTINCT
    cte.message_id
    FROM cte
    WHERE cte.time_diff < 2;
  • daytonbrown2 wrote:

    Given the following table and query, this will return any records(based on message_id) that have timestamps within 2 seconds.  This works and gives me the correct output.

    The query compares each row to its lag and not "any record within 2 seconds".  Afaik to accomplish what you're suggesting you'd need to partial join the table to itself

    select distinct c1.message_id
    from #clicks c1
    join #clicks c2 on c1.message_id = c2.message_id
    and abs(datediff(second, c1.clicked_at, c2.clicked_at))<=2;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • If any consecutive click within 2 seconds is sufficient then the query posted looks ok 🙂

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply