May 30, 2013 at 9:02 am
Reran my code and in looking at the results from MM code I have to ask why the last row is being counted when it overlaps time wise with the previous record that is being excluded?
MM's results:
pmd_patient_idendcallcdeventidcreatedat
100128137100199999952013-04-16 12:02:14.000
100128137100199999962013-04-16 17:02:14.000
100128138100150936342013-04-16 17:17:14.000
100128138100150992682013-04-26 13:48:53.000
100128138100251211752013-05-16 16:43:51.000
100128138100251332812013-05-23 17:53:46.000
100128138100251332812013-05-23 19:03:46.000 << This row
Sample data:
pmd_patient_idendcallcdeventidcreatedat
100128137100199999952013-04-16 12:02:14.000
100128137100199999962013-04-16 17:02:14.000
100128137100199999972013-04-16 17:12:14.000
100128138100150936342013-04-16 17:17:14.000
100128138100150992682013-04-26 13:48:53.000
100128138100251211752013-05-16 16:43:51.000
100128138100251332812013-05-23 17:53:46.000
100128138100251332812013-05-23 18:43:46.000
100128138100251332812013-05-23 19:03:46.000 << Overlaps with the previous row
May 30, 2013 at 9:08 am
Okay, rereading everything. The first record that is counted starts the 60 minute window, subsequent records that are overlapping do not extend the window.
This correct?
May 30, 2013 at 9:52 am
Lynn Pettis (5/30/2013)
Okay, rereading everything. The first record that is counted starts the 60 minute window, subsequent records that are overlapping do not extend the window.This correct?
100128138100251211752013-05-16 16:43:51.000
100128138100251332812013-05-23 17:53:46.000
100128138100251332812013-05-23 18:43:46.000
100128138100251332812013-05-23 19:03:46.000
For the above four rows the count should be 3.
Even so row #4 is less than 60 min apart of row #3, it should be counted as it's more than 60 min apart since previous "counted" row #2.
I guess, it's possible to do it with triangular joins, but I have no doubt that it will perform worse than "quirky update" provided. It may be slightly better than using cursor/loop.
I'm thinking towards using data-island method, but cannot get my head around it as yet.
May 30, 2013 at 10:20 am
thank you is great artitial
May 30, 2013 at 6:20 pm
Eugene, thanks for posting that QU solution.
I had always thought of QU as being a "Rolling Total" solution, and this has opened my mind a bit, so thanks again.
Of course, the QU version will outperform anything else in here, without a doubt.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
May 30, 2013 at 6:57 pm
I'm overwhelmed with gratitude for the multiple solutions posed! I thought I'd give a little more background on the business requirements to try to illustrate what we're doing. I'm a SQL back-end developer and our company has a web-based UI written in Ruby on Rails by overseas developers. We have no control over the UI or ROR pieces. There is a bug in the system that is generating multiple events in our SQL table that may or may not be "real." Sometimes, depending on the internet browser, it could be caused by double-clicking or clicking on some completely unrelated area of the screen. Then, I've been told there are "caching" issues. They haven't explained what "cache" they're referring to...local, browser, server, etc... In other words, we have no idea. It was decided to only count events that occurred more than 60 minutes apart. That's all I was told, so I'm trying to come up with the logic on my own. I think that both the QU and the loop code will work. Performance-wise, we'll be running the code at night on non-peak time, so as long as it finishes (and it doesn't appear to be an issue on our small datasets), that meets the requirement at this time. My hope is that the web GUI developers will straighten out their side soon!
Thank you again so much! I'm going to run all three sets of code and compare them all.
June 3, 2013 at 7:31 pm
An alternative solution, that uses a recursive CTE instead of a WHILE loop:
--DECLARE @pmd_patient_id bigint = 100128137 --for single patient filtering
;WITH
lteWithChronSeq AS ( --add a per patient chronological sequence number to raw data
SELECT *
, ROW_NUMBER() OVER(PARTITION BY pmd_patient_id ORDER BY createdat) AS ChronSeq
FROM la_test_endcall
)
--SELECT * FROM lteWithChronSeq
, ltewcsCompare AS ( --match every event with every other event
SELECT
ltewcs1.pmd_patient_id
, ltewcs1.ChronSeq AS ChronSeq1
, ltewcs2.ChronSeq AS ChronSeq2
, CASE WHEN DATEDIFF(s,ltewcs2.createdat,ltewcs1.createdat) > 3600 THEN 1 ELSE 0 END AS IsGood --when time difference > 3600 secs (60 mins) then event is good
FROM lteWithChronSeq ltewcs1
CROSS JOIN lteWithChronSeq ltewcs2
WHERE ltewcs1.pmd_patient_id = ltewcs2.pmd_patient_id --return only events for same patients
AND ltewcs1.ChronSeq > ltewcs2.ChronSeq --only compare events with ones that occured earlier
--AND ltewcs1.pmd_patient_id = @pmd_patient_id --single patient filter
)
--SELECT * FROM ltewcsCompare
, AllGoodEventPairs AS ( --first subsequent good event for any given event
SELECT
pmd_patient_id
, ChronSeq2 AS ChronSeq
, MIN(ChronSeq1) AS NextGoodChronSeq
FROM ltewcsCompare
WHERE IsGood = 1
GROUP BY pmd_patient_id, ChronSeq2
)
--SELECT * FROM AllGoodEventPairs
, GoodEvents AS ( --create a 'chain' of good events
--anchor
SELECT pmd_patient_id, ChronSeq
FROM AllGoodEventPairs
WHERE ChronSeq = 1 --first event is always good
UNION ALL
--recursive
SELECT ge.pmd_patient_id, NextGoodChronSeq
FROM GoodEvents ge
INNER JOIN AllGoodEventPairs agep ON ge.ChronSeq = agep.ChronSeq AND ge.pmd_patient_id = agep.pmd_patient_id
)
--finally, join the chain of good events back to the original data:
SELECT
ltewcs.pmd_patient_id
, ltewcs.endcallcd
, ltewcs.eventid
, ltewcs.createdat
FROM GoodEvents ge
INNER JOIN lteWithChronSeq ltewcs ON ltewcs.ChronSeq = ge.ChronSeq AND ltewcs.pmd_patient_id = ge.pmd_patient_id
ORDER BY ge.pmd_patient_id, ge.ChronSeq
Viewing 7 posts - 16 through 21 (of 21 total)
You must be logged in to reply to this topic. Login to reply