Rowset Sum Collection

  • Hello, I'm trying to figure out how to run a rowset operation to do the following:

    I have the following table:

    create table #temp

    (

    user_id_num int,

    insert_date datetime,

    )

    I want to create a query to see if there is a 24 hour period in this table where any user has more than 50 records inserted. Note: 24 hours doesn't necessarily mean an individual day - it could be 11/3/2008 12:37 - 11/4/2008 12:37.

    I want to avoid creating a subquery for every row in the table, and with all the new additions in T-SQL for 2005, I have to think there's a better way - I just can't figure it out. And of course, no cursors...

  • Interesting question. I am not sure of the best way to do this either but I will give it some more thought. But my initial thought is just to do a self join on the table with a dateadd function and group the results. But the fact that the 24hr period is always moving means you would get results twice for example if someone logged in every minute for 51 minutes. You would get that person for the 24 hour period, and then the 24 hour period that started on minute after the first.

    Sample code/illustration.

    SET NOCOUNT ON

    DECLARE @T AS TABLE (

    User_Id_Num INT,

    Insert_Date DATETIME

    )

    DECLARE @ctr AS INT

    -- 50 Logins every minute so should show

    SET @Ctr = 0

    WHILE @Ctr < 50

    BEGIN

    INSERT INTO @T

    VALUES (1, DATEADD(minute, @ctr, GETDATE()))

    SET @Ctr = @Ctr + 1

    END

    -- 50 Logins in hours, so should not be there

    SET @Ctr = 0

    WHILE @Ctr < 50

    BEGIN

    INSERT INTO @T

    VALUES (2, DATEADD(hour, @ctr, GETDATE()))

    SET @Ctr = @Ctr + 1

    END

    -- 40 in 40 minutes so should not show up

    SET @Ctr = 0

    WHILE @Ctr < 40

    BEGIN

    INSERT INTO @T

    VALUES (3, DATEADD(minute, @ctr, GETDATE()))

    SET @Ctr = @Ctr + 1

    END

    -- If 51 logins, then three lists of 50 logins in 24 hrs

    -- because the period is always moving.

    SET @Ctr = 0

    WHILE @Ctr < 51

    BEGIN

    INSERT INTO @T

    VALUES (4, DATEADD(minute, @ctr, GETDATE()))

    SET @Ctr = @Ctr + 1

    END

    SELECT T1.User_Id_Num,

    T1.Insert_Date

    FROM @T T1 INNER JOIN @T T2 ON

    (T1.User_Id_Num = T2.User_Id_Num)

    WHERE T2.Insert_Date BETWEEN

    T1.Insert_Date AND DATEADD(HOUR, 24, T1.Insert_Date)

    GROUP BY T1.User_Id_Num, T1.Insert_Date

    HAVING COUNT(*) >= 50

    So as I said, this works but probably a better way. If I think of something else I'll let you know.

  • I thought of a different way to do this, but I'd like to test it first.

    Can you please post some sample data?

  • I wonder, would a SELF JOIN and use DATEDIFF or <= operator work?

    it would be more SET BASED

    SELECT T1.user_id_num

    FROM #temp T1

    INNER JOIN #temp T2 ON DATEDIFF(hh, T1.insert_date DATETIME, T2.insert_date DATETIME) <= 24

    --DATEADD(hh,24,T1.insert_date datetime) <= T2.insert_date datetime

    GROUP BY user_id_num

    HAVING COUNT(1) > 50

    SQLServerNewbieMCITP: Database Administrator SQL Server 2005
  • Well I am allowed some margin of error with the time range, so this turns out to be the best solution I came up with:

    First aggregate the data:

    declare @time_divisor float

    select @time_divisor = 96 --Set to group by every 15 minutes (1 day/96)

    select @time_divisor = 1/@time_divisor

    select user_id_num,

    Cast(Floor(Cast(insert_date as float) / @time_divisor) * @time_divisor + .00000001 as smalldatetime) time_interval,

    Count(*) total_requests

    into #agg_data

    from #temp

    group by user_id_num,

    Cast(Floor(Cast(insert_date as float) / @time_divisor) * @time_divisor + .00000001 as smalldatetime)

    Then remove data that I know isn't eligible (which is half of the records)

    create clustered index ix_tmp on #agg_data (user_id_num, time_interval)

    delete from #agg_data

    where user_id_num IN

    (select user_id_num

    from #temp

    group by user_id_num having Sum(total_requests) < 50)

    Finally get the results

    select distinct t1.*,

    Sum(t2.total_requests)

    OVER (PARTITION BY t1.user_id_num, t1.time_interval) requests_in_last_24_hours

    into #final_table

    from #agg_data t1, #agg_data t2

    where t1.user_id_num = t2.user_id_num

    and t2.time_interval > t1.time_interval - 1

    and t2.time_interval <= t1.time_interval

    OPTION (MAXDOP 1)

    select * from #final_table

    where requests_in_last_24_hours >= 50

    order by user_id_num, time_interval desc

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply