sql query

Question

sql query

TAman

SSChasing Mays

Points: 636
More actions
July 17, 2009 at 2:50 pm

#223699

I need to create a report to show a list of employees that have made an entry on a file more than once within any 60 minute window. My table contains following data.
Emp_ID Start_Entry End_Entry
12345 2009-05-04 02:31:00 2009-05-04 02:32:00
12345 2009-05-04 02:37:00 2009-05-04 02:38:00
45321 2009-05-04 02:30:00 2009-05-04 02:33:00
78989 2009-05-04 02:37:00 2009-05-04 02:39:00
78989 2009-05-04 02:41:00 2009-05-04 02:44:00
78989 2009-05-04 02:49:00 2009-05-04 02:50:00
Appreciate the help!

Viewing 15 posts - 1 through 15 (of 30 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 1

TAman (7/17/2009)
I need to create a report to show a list of employees that have made an entry on a file more than once within any 60 minute window. My table contains following data.
Emp_ID Start_Entry End_Entry
12345 2009-05-04 02:31:00 2009-05-04 02:32:00
12345 2009-05-04 02:37:00 2009-05-04 02:38:00
45321 2009-05-04 02:30:00 2009-05-04 02:33:00
78989 2009-05-04 02:37:00 2009-05-04 02:39:00
78989 2009-05-04 02:41:00 2009-05-04 02:44:00
78989 2009-05-04 02:49:00 2009-05-04 02:50:00
Appreciate the help!

Based on what? The start_Entry or the End_Entry?

Also, take the time to read the article at the first link in my signature below... if you follow that, you'll be amazed at how much quicker you get an answer especially on a Friday evening 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

TAman SSChasing Mays Points: 636 More actions · Answer 2

TAman

SSChasing Mays

Points: 636

July 20, 2009 at 6:17 am

#1026877

It will be based on Start_Entry. thx

ss-457805 SSCertifiable Points: 5872 More actions · Answer 3

something on the lines of this should work:

select dateadd(hh, datediff(hh, 0, start_entry), 0) as starthour, dateadd(hh, datediff(hh, 0, start_entry) + 1, 0) as endhour,emp_id

from #tempsql

group by dateadd(hh, datediff(hh, 0, start_entry), 0), dateadd(hh, datediff(hh, 0, start_entry) + 1, 0), emp_id

having count(emp_id)>1

blog: http://sarveshsingh.com Twitter: @sarveshsing

TAman SSChasing Mays Points: 636 More actions · Answer 4

This is only working if the HH(hour) starts exactly at the top of the hour and ends at the top of the next hour. (i.e. if it starts at 12:00:00 and ends at 01:00:00) that was within one hour exactly.

I need to take into consideration that the time may be based on mm(minutes) within the hour. (i.e. If it starts at 12:09:00 and ends at 1:10:00) that is not within one hour exactly.

Any help is appreciated! Thanks!

Lynn Pettis SSC Guru Points: 442470 More actions · Answer 5

Two things. First, what have you written so far to solve this problem. Second, based on the data in your original post, what whould be the expected output.

A little hard to write any code without knowing what is expected, and your word based description leaves quite a lot to one's imagination.

TAman SSChasing Mays Points: 636 More actions · Answer 6

This is what I've created so far to resolve the issue. It's not giving me exactly what I want, but it is what I've created so far.

SELECT Emp_ID, Start_Entry as Date, COUNT(*) AS Num_Occurances_in_60min

FROM DBO.TBL_LOG AS C

WHERE Start_Entry Between DateAdd(Day,-30,'7/22/09') AND '07/22/09'

AND EXISTS(SELECT Start_Entry, Emp_ID

FROM DBO.TBL_LOG AS T

WHERE T.File_Name= C.File_Name

AND ABS(DATEDIFF(mi,T.Start_Entry,C.Start_Entry)) > '60')

There has been a column that I needed from the original data that I didn't add in my orignal post. Below is the data in my table. I need to get (Employee who logged in and made an entry, more than one time, within 60 minutes ,on the same file name. So, I have the below data in my table.

Emp_ID Start_Entry End_Entry File_Name

12345 2009-05-04 02:31:00 2009-05-04 02:32:00 1

12345 2009-05-04 02:37:00 2009-05-04 02:38:00 1

45321 2009-05-04 02:30:00 2009-05-04 02:33:00 33

78989 2009-05-04 02:37:00 2009-05-04 02:39:00 2

78989 2009-05-04 02:41:00 2009-05-04 02:44:00 2

78989 2009-05-04 02:49:00 2009-05-04 02:50:00 2

My expected output based on the data in my table would be:

Emp_ID Date Num_Occurances_in_60min

12345 2009-05-04 2

78989 2009-05-04 3

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 7

One final question... would this be 2 or 3 occurances? If 2, why and which 2 would they be?

Emp_ID Start_Entry End_Entry File_Name

12345 2009-05-04 02:31:00 2009-05-04 02:32:00 1

12345 2009-05-04 02:37:00 2009-05-04 02:38:00 1

12345 2009-05-04 03:36:00 2009-05-04 03:37:00 1

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

TAman SSChasing Mays Points: 636 More actions · Answer 8

The result should be 2.

Which ones?: See below

Emp_ID Start_Entry End_Entry File_Name

12345 2009-05-04 02:31:00 2009-05-04 02:32:00 1

12345 2009-05-04 02:37:00 2009-05-04 02:38:00 1

Why? Because the 3rd occurance started at 3:36 which is 1 hour and 5 min from the start entry 2:31. The 3rd occurance starts 65 minutes from the start entry which is out of range. The result needed is within 60 minute range.

Emp_ID Start_Entry End_Entry File_Name

12345 2009-05-04 02:31:00 2009-05-04 02:32:00 1

12345 2009-05-04 02:37:00 2009-05-04 02:38:00 1

12345 2009-05-04 03:36:00 2009-05-04 03:37:00 1

Lynn Pettis SSC Guru Points: 442470 More actions · Answer 9

TAman (7/23/2009)
The result should be 2.
Which ones?: See below

Emp_ID Start_Entry End_Entry File_Name
12345 2009-05-04 02:31:00 2009-05-04 02:32:00 1
12345 2009-05-04 02:37:00 2009-05-04 02:38:00 1
Why? Because the 3rd occurance started at 3:36 which is 1 hour and 5 min from the start entry 2:31. The 3rd occurance starts 65 minutes from the start entry which is out of range. The result needed is within 60 minute range.
Emp_ID Start_Entry End_Entry File_Name
12345 2009-05-04 02:31:00 2009-05-04 02:32:00 1
12345 2009-05-04 02:37:00 2009-05-04 02:38:00 1
12345 2009-05-04 03:36:00 2009-05-04 03:37:00 1

So, to take this further, in the instance above with 3 entries for one emplyee, would we return 1 row for the day reporting 2 occurances, or 2 rows reporting 2 occurances in one row and 1 occurance in the second?

TAman SSChasing Mays Points: 636 More actions · Answer 10

1 row for the day showing 2 occurances(something like the example below). I do not need to see any results with 1 occurance in 60min range.

Emp_ID Date Num_Occurances_in_60min

12345 2009-05-04 2

Additionally, If we could include the file_name on the row, then it would show me what file the employee made the entry on. Something like this...

Emp_ID Date Num_Occurances_in_60min File_name

12345 2009-05-04 2 1

Thanks!

sgmunson SSC Guru Points: 110651 More actions · Answer 11

There's a logic problem here... For the data shown here:

Emp_ID Start_Entry End_Entry File_Name

12345 2009-05-04 02:31:00 2009-05-04 02:32:00 1

12345 2009-05-04 02:37:00 2009-05-04 02:38:00 1

12345 2009-05-04 03:36:00 2009-05-04 03:37:00 1

You seem to be saying that's only 2 occurrences, but clearly, that 3rd entry is within 60 minutes of the 2nd entry, so if I took your original words at face value, I have two occurrences of 2 entries on the same file, with each pair of occurrences being within 60 minutes of it's "pair-mate". Please realize that you need to be particularly anal about specifications for something like this, and cover EVERY possible scenario, or you run the risk of bogus results with the significant potential for being unable to detect that you have a problem.

Please provide a detailed explanation of EXACTLY what constitutes 2 entries against the same filename by the same employee within any given 60 minute window.

Steve

(aka smunson)

:-):-):-)

One of the problems

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 12

smunson (7/23/2009)
There's a logic problem here...
...but clearly, that 3rd entry is within 60 minutes of the 2nd entry

Heh... Glad I'm not the only one... That's exactly the point I was trying to make. It's not as straight forward as it's described. To me, it should return two rows but not for the reasons stated. It should return the first row because the second row is within 60 minutes. Then, it should return the second row because the 3rd rows is within 60 minutes. Still others might say this should be a set of 3.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

sgmunson SSC Guru Points: 110651 More actions · Answer 13

Thanks Jeff... I was wondering if I had just missed something that everyone else was assuming. And to add oil and water to the fire, if you will, what I mentioned is only the beginning of some potential issues. If I recall the original words, I thought I heard something about "in the same day", which would introduce boundary issues if the data were grouped by date. After all, any given 60-minute window could be one that starts at 11:30pm and goes until 12:30am the next morning, so grouping by date would be a problem. Also, how do you decide, within a large batch of entries for a given user, which of the records to compare to other records? For performance reasons, it would be desirable to avoid comparing any record pair that was > 24 hours apart on the start date, but you have to get SQL Server to figure out how to avoid that, and with any sizable number of records, I'm guessing that this kind of thing is going to get ugly in a hurry.

I guess I'd start by eliminating any records for filenames that don't appear in the overall recordset more than once for the same user, and perhaps with that filter in place, it keeps the numbers down to a reasonable minimum?

Steve

(aka smunson)

:-):-):-)

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

TAman SSChasing Mays Points: 636 More actions · Answer 14

I see exactly what you are saying. You are correct, and it should read '3' entries. I didn't take into consideration of the 3rd entry within 60 minutes of the 2nd entry. So with that said, the number_of_occurances should be a result of 3.

The result on my previous post was not accurate. The result I need should be like this:

Emp_ID Date Num_Occurances_in_60min File_name

12345 2009-05-04 3 1

Detail:::: If the file name is the same as 'any' other file name, AND it's match has the same Emp ID, AND 'any' start entries (date/time) of those matching file name(s) are within 60 minutes of 'any' of those start entries then show result.

Can anyone send me code that would get me my expected result above based on the data below? Thx

Emp_ID Start_Entry End_Entry File_Name

12345 2009-05-04 02:31:00 2009-05-04 02:32:00 1

12345 2009-05-04 02:37:00 2009-05-04 02:38:00 1

12345 2009-05-04 03:36:00 2009-05-04 03:37:00 1

45321 2009-05-04 02:30:00 2009-05-04 02:33:00 33

78989 2009-05-04 02:37:00 2009-05-04 02:39:00 2