October 28, 2016 at 7:57 pm
Hi all,
I require some assistance and i'm not sure on the best way to proceed.
I have a table which records customers and their events (how its missing their "sesson"). Example below, customer 000001 had an event named ER1/16 and a start/end datetime and another event AC1/16 (start/end datetime). These events need to be placed in the same session within +/- 2 hours (there is no identifier in the table for session). However their next event ER2/16 is a few days after and should be excluded from the session.
How do i go about creating this session key and relating different events within +/- 2 hours for the same customer?
Sample data below
CREATE TABLE HELP(
[rownumber] [INT],
[customer] [INT],
[session] [INT],
[event] [VARCHAR](10),
[startdatetime] [datetime],
[enddatetime] [datetime]
)
INSERT INTO HELP
(rownumber,customer,session,event,startdatetime,enddatetime)
VALUES
(1,000001,NULL,'E1/16','2016-09-21 11:59:00.000','2016-09-21 18:00:00.000'),
(2,000001,NULL,'A1/16','2016-09-21 17:22:00.000','2016-09-26 16:40:00.000'),
(3,000001,NULL,'E2/16','2016-09-27 17:40:00.000','2016-09-27 20:58:00.000')
select * from help
Thank you for any and all help in advance!
October 29, 2016 at 12:20 am
Take a look at New Solution to the Packing Intervals Problem.
Hint: You may want to add an index on (customer, startdatetime, enddatetime, rownumber) (prefereably clustered)
October 29, 2016 at 7:23 am
Hello!
Thank you for the reply I will certainly take a look.
My dataset is not as clean but I think I can make it work. Example: a user can have accounts in different regions. So aside from me just knowing the user account there is no other method for me to tell if it's the same user. All the other values are unique.
Current I do something similar to what is in the example but not as clean and I determine the "session" but finding users and their activity in through ranking and I have it broken down into 30 minute intervals with a confidence flag and relate from that.
I'll give this a shot though, thank you again!
October 29, 2016 at 9:59 am
I just realised that you are using SQL 2008, so you will need to look at the old solution that only uses ROW_NUMBER()
This blog post[/url] explains the solution.
Something like this
--CREATE UNIQUE INDEX idx_start ON HELP(customer, startdatetime, rownumber);
--CREATE UNIQUE INDEX idx_end ON HELP(customer, enddatetime, rownumber);
WITH C1 AS (
SELECT rownumber, customer, [event], startdatetime, enddatetime,
startdatetime AS ts, +1 AS type,
ROW_NUMBER() OVER(PARTITION BY customer ORDER BY startdatetime, rownumber) AS s,
NULL AS e
FROM HELP
UNION ALL
SELECT rownumber, customer, [event], startdatetime, enddatetime,
enddatetime AS ts, -1 AS type,
NULL AS s,
ROW_NUMBER() OVER(PARTITION BY customer ORDER BY enddatetime, rownumber) AS e
FROM HELP
)
, C2 AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY customer ORDER BY ts, type DESC, rownumber) AS se
FROM C1
)
, C3 AS (
SELECT rownumber, customer, [event], startdatetime, enddatetime,
FLOOR((ROW_NUMBER() OVER(PARTITION BY customer ORDER BY ts) + 1) / 2) AS [session]
FROM C2
CROSS APPLY ( VALUES(s - (se - s) - 1, (se - e) - e) ) AS A(cs, ce)
WHERE cs = 0 OR ce = 0
)
SELECT rownumber, customer, [session]=MIN([session]), [event], startdatetime, enddatetime
FROM C3
GROUP BY rownumber, customer, [event], startdatetime, enddatetime;
October 29, 2016 at 12:32 pm
Thank you very much for the redirect, i'm going to try to apply this tomorrow but i will post the result, this certainly looks like where i can to go though!
Thank you again!
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply