February 6, 2017 at 8:08 am
Hello,
I am having issues with this query. I have a table which has a UserID and a datetime stamp. The UserId is the IP of the each user, and the timestamp is an actual hit on the reports. So I need a query that shows all users who have been active for 4 hours straight in a single day. We are trying to find BOTs in the system. Please see the example below. I could send what I already have, but it's not giving me the correct result set. Datetime is actually just 1 column. So we are focused on just 2 columns, UserID and Datestamp. Any help would be greatly appreciated.
USERID | DATE | TIME |
123456 | 1/1/2016 | 11:01:00 AM |
11:05:00 AM | ||
11:09:00 AM | ||
11:10:00 AM | ||
11:15:00 AM | ||
11:21:00 AM | ||
11:30:00 AM | ||
12:01:00 AM | ||
12:05:00 AM | ||
12:09:00 AM | ||
12:10:00 AM | ||
12:15:00 AM | ||
1:21:00 AM | ||
1:30:00 AM |
s://www��UTp
The are no problems, only solutions. --John Lennon
February 6, 2017 at 8:15 am
You've been here long enough to know this ...
Please provide DDL, sample data (in the form of INSERT statements) and expected results (based on the sample data you have provided.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
February 6, 2017 at 8:39 am
Phil Parkin - Monday, February 6, 2017 8:15 AMYou've been here long enough to know this ...Please provide DDL, sample data (in the form of INSERT statements) and expected results (based on the sample data you have provided.
The DDL is really important here. You say you have a datetime, but you're showing separate date and time columns, where the time column has less precision than a datetime data type.
In addition, what lapse in time do you consider continuous? If the time from one hit to another is 4 minutes, is that continuous? How about 5 minutes? Where do you draw the line?
February 6, 2017 at 8:53 am
Sorry guys, didn't paste the query. The table simply has 2 columns... UID and timestamp (apologies for the crude diagram).
This is the general idea. I am hard coding the hours here but the idea is to show how many users were online for a full day and active for more than 4 hours. I know there is a more eloquent way of producing this, but I am having trouble.
Thank you all for your time and efforts.
Select top 100 UID, timestamp from TABLE
where timestamp >= '02-01-2016' and timestamp < '02-02-2016'
and datepart(hour,timestamp ) >=1 and datepart(hour,timestamp ) < 5
The are no problems, only solutions. --John Lennon
February 6, 2017 at 8:59 am
Lord Slaagh - Monday, February 6, 2017 8:53 AMSorry guys, didn't paste the query. The table simply has 2 columns... UID and timestamp (apologies for the crude diagram).This is the general idea. I am hard coding the hours here but the idea is to show how many users were online for a full day and active for more than 4 hours. I know there is a more eloquent way of producing this, but I am having trouble.
Thank you all for your time and efforts.
Select top 100 UID, timestamp from TABLE
where timestamp >= '02-01-2016' and timestamp < '02-02-2016'
and datepart(hour,timestamp ) >=1 and datepart(hour,timestamp ) < 5
Sorry, but this is not adequate.
Please see the link in my signature which describes how to post your sample DDL and data.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
February 6, 2017 at 10:05 am
I don't understand Phil. I posted the query that I have been working on. I am not sure what else I can say. I was able to get the query between >1:00 and < 5:00 on the same day. Basically the query should be every user that is on for 24 hours and has been active for longer that 4 hours.
If you guys can't help then its fine. I will do some more research on Google. I am really struggling with this query, and as a consequence my work is building up. Sorry this isn't clear enough for you guys.
Thank for your responses nonetheless.
The are no problems, only solutions. --John Lennon
February 6, 2017 at 10:35 am
Lord Slaagh - Monday, February 6, 2017 10:05 AMI don't understand Phil. I posted the query that I have been working on. I am not sure what else I can say. I was able to get the query between >1:00 and < 5:00 on the same day. Basically the query should be every user that is on for 24 hours and has been active for longer that 4 hours.If you guys can't help then its fine. I will do some more research on Google. I am really struggling with this query, and as a consequence my work is building up. Sorry this isn't clear enough for you guys.
Thank for your responses nonetheless.
See this post, for example.
It includes T-SQL which can be pasted into SSMS and executed. And it includes desired results, based on the test data provided.
If you are prepared to make the effort to do something similar with your data and your problem, you will almost certainly get a working, tested solution in return.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
February 6, 2017 at 10:36 am
Lord Slaagh - Monday, February 6, 2017 10:05 AMI don't understand Phil. I posted the query that I have been working on. I am not sure what else I can say. I was able to get the query between >1:00 and < 5:00 on the same day. Basically the query should be every user that is on for 24 hours and has been active for longer that 4 hours.If you guys can't help then its fine. I will do some more research on Google. I am really struggling with this query, and as a consequence my work is building up. Sorry this isn't clear enough for you guys.
Thank for your responses nonetheless.
What Phil is asking for is DDL to create the scenario you're facing right now. We can't see what you see and we don't have your table structure or data.
The data type questions I had earlier would be answered with DDL. There's a link in my signature (as well as Phil's) on what we're referring to. We basically need to be able to recreate the problem in order to solve it.
February 6, 2017 at 1:13 pm
I am not sure how I can replicate data that might not even exist. That is the challenge with this. I am trying to find users that meet this criteria, so if I am struggling finding the data, I am not sure how I can reproduce it.
Does anyone know how I can search for intervals in a time-stamp that are <= 4 hours? The logic would be fine. I can probably figure it out from there.
I understand why you guys ask for data, but again, I am not sure how to reproduce this,when I am looking for these users.
The are no problems, only solutions. --John Lennon
February 6, 2017 at 1:27 pm
I might be able to do something for you when I get to the office tomorrow with our own logs table again, like in the other post, but without DDL then I have no idea what to write for you (so I'll just be giving you the query I ran my end and you'd have to amend it).
What is your definition of "inactive"? The user doesn't use the system for 10 minutes, 20 minutes, 60?
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
February 6, 2017 at 1:30 pm
Lord Slaagh - Monday, February 6, 2017 1:13 PMI am not sure how I can replicate data that might not even exist. That is the challenge with this. I am trying to find users that meet this criteria, so if I am struggling finding the data, I am not sure how I can reproduce it.Does anyone know how I can search for intervals in a time-stamp that are <= 4 hours? The logic would be fine. I can probably figure it out from there.
I understand why you guys ask for data, but again, I am not sure how to reproduce this,when I am looking for these users.
Your data does not need to be replicated, feel free to make it up.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
February 6, 2017 at 2:53 pm
Lord Slaagh - Monday, February 6, 2017 1:13 PMI am not sure how I can replicate data that might not even exist. That is the challenge with this. I am trying to find users that meet this criteria, so if I am struggling finding the data, I am not sure how I can reproduce it.Does anyone know how I can search for intervals in a time-stamp that are <= 4 hours? The logic would be fine. I can probably figure it out from there.
I understand why you guys ask for data, but again, I am not sure how to reproduce this,when I am looking for these users.
You're looking for LAG/LEAD.
;
WITH CTE AS
(
SELECT TOP 100 UID, timestamp, LAG(timestamp) OVER(PARTITION BY UID ORDER BY timestamp) AS prev_timestamp
FROM TABLE
)
SELECT *
FROM CTE
WHERE timestamp <= DATEADD(HOUR, 4, prev_timestamp)
/* I prefer to use DATEADD over DATEDIFF here, because I don't want to count
11:59:59 as being an hour away from 12:00:00
*/
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
February 7, 2017 at 7:23 am
Ok, so just in case anyone has the same issue, I came up with a query that shows the number of clicks per minute.
What my associate was asking for, was a bit convoluted. I understand why this question made you guys say... "What!"
Thanks for all you help guys! Thanks again for your help Phil.
select count(*) as hits, left(convert(varchar, datetime,121), 16) as minte, uid
from TABLE
where datetime> '2016-02-01'
and datetime<= '2016-02-02'
and uid <> '-'
group by uid, left(convert(varchar, datetime,121), 16) --minute
order by 1 desc
The are no problems, only solutions. --John Lennon
February 7, 2017 at 11:39 am
For anyone following this conversation, here's the next installment of the question: https://www.sqlservercentral.com/Forums/FindPost1856856.aspx
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply