I am attempting to create an episode marker when duration between dates is >= 90 days, filtered by EMID. Below is the query I ran, with actual vs expected output in the attached file. There are 2 issues: 1- the DATEDIFF function is supposed to restart with each new EMID (which is the point of using partition)- currently this is not the case. 2- The query is not recognizing 90 day periods within the same EMID, only the transition from one EMID to another. Hopefully the actual vs expected table will clarify these issues.
SELECT TOP (1000) [EMID],[visit_date_01],
CASE
WHEN DATEDIFF(DAY, visit_date_01, LAG(visit_date_01,1,getdate()) OVER (partition by EMID order by EMID)) <= 90 THEN '1'
WHEN DATEDIFF(DAY, visit_date_01, LAG(visit_date_01,1,getdate()) OVER (PARTITION BY EMID ORDER BY EMID)) BETWEEN 90 AND 179 THEN '2'
WHEN DATEDIFF(DAY, visit_date_01, LAG(visit_date_01,1,getdate()) OVER (PARTITION BY EMID order by EMID)) > 180 THEN '3'
END AS EPISODE
FROM [res_treatment_escalation].[dbo].['c-spine_full_dataset_4#29#20_wi$']
May 4, 2020 at 3:12 pm
based on your expected I would suggest you change the default value from getdate() to visit_date_01
and add visit_date_01 to the order by (instead of EMID as that is already on the partition)
and your expected results for " BA10031 2018-08-01" - I think it should be 3, not 2 as you mention
May 4, 2020 at 3:26 pm
Thank you for your reply. I made the recommended changes, and below is the new query- the result is that every row in the Episode column is 1
SELECT TOP (1000) [EMID],[visit_date_01],
CASE
WHEN DATEDIFF(DAY, visit_date_01, LAG(visit_date_01,1,visit_date_01) OVER (partition by EMID order by visit_date_01)) <= 90 THEN '1'
WHEN DATEDIFF(DAY, visit_date_01, LAG(visit_date_01,1,visit_date_01) OVER (PARTITION BY EMID ORDER BY visit_date_01)) BETWEEN 90 AND 179 THEN '2'
WHEN DATEDIFF(DAY, visit_date_01, LAG(visit_date_01,1,visit_date_01) OVER (PARTITION BY EMID order by visit_date_01)) > 180 THEN '3'
END AS EPISODE
FROM [res_treatment_escalation].[dbo].['c-spine_full_dataset_4#29#20_wi$']
May 4, 2020 at 3:49 pm
add these 2 lines to your code so you can see what you are doing wrong - and then you should know how to fix it
,LAG(visit_date_01, 1, visit_date_01) OVER (PARTITION BY EMID order by visit_date_01)
,DATEDIFF(DAY, visit_date_01, LAG(visit_date_01, 1, visit_date_01) OVER (PARTITION BY EMID order by visit_date_01))
May 4, 2020 at 4:00 pm
You have your dates switched in the DATEDIFF function. It should be DATEDIFF(DAY, LAG(...), visit_date_01)
. Or you should be using LEAD()
instead of LAG()
.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
May 4, 2020 at 5:42 pm
your expected output does not match the code you supplied - so please be very very clear on exactly what are the rules for what you intend in getting as we can't read minds (at least that far from us!!)
in any case I think that you will need to do this slightly differently as you have multiple records for same EMID and date which messes up any lag/lead window (at least according to what you posted as desired for the second example)
so can you pick up the following example and state what you need as output for each line with a clear explanation of why the value should be what you expect
;with base as
(select *
from (values
('BA10010', convert(date, '2014-04-01'))
,('BA10010', convert(date, '2015-01-01'))
,('BA10010', convert(date, '2015-05-01'))
,('BA10010', convert(date, '2016-11-01'))
,('BA10010', convert(date, '2016-11-01'))
,('BA10010', convert(date, '2016-11-01'))
,('BA10010', convert(date, '2017-06-01'))
,('BA10010', convert(date, '2017-06-01'))
,('BA10010', convert(date, '2017-06-01'))
,('BA10010', convert(date, '2017-06-01'))
,('BA10010', convert(date, '2017-06-01'))
,('BA10010', convert(date, '2017-06-01'))
,('BA10010', convert(date, '2017-07-01'))
,('BA10010', convert(date, '2017-07-01'))
,('BA10010', convert(date, '2017-07-01'))
,('BA10010', convert(date, '2017-07-01'))
,('BA10010', convert(date, '2017-07-01'))
) t(EMID, visit_date_01)
)
select b1.EMID
, b1.visit_date_01
, days_last_visit
, previous_visit
, CASE
WHEN days_last_visit <= 90 THEN '1'
WHEN days_last_visit BETWEEN 90 AND 179 THEN '2'
WHEN days_last_visit > 180 THEN '3'
END AS EPISODE
from base b1
inner join (select EMID
, visit_date_01
, datediff(day, LAG(visit_date_01, 1, visit_date_01) OVER (PARTITION BY EMID order by visit_date_01), visit_date_01) as days_last_visit
-- next line only to show what date was found
, LAG(visit_date_01, 1, visit_date_01) OVER (PARTITION BY EMID order by visit_date_01) as previous_visit
from (select distinct EMID, visit_date_01
from base
) t
) dts
on dts.emid = b1.emid
and dts.visit_date_01 = b1.visit_date_01
order by b1.emid, b1.visit_date_01
May 5, 2020 at 3:59 pm
Thank you Frederico- this is probably a very basic question, but how to I include the table name in the query you supplied so it knows where to pull data from? When I try to add this at the end I get an error "Incorrect syntax near the keyword 'FROM' "
The table name is
FROM [res_treatment_escalation].[dbo].['c-spine_full_dataset_4#29#20_wi$']
May 5, 2020 at 4:05 pm
you could answer my question without using your data as I supplied sample data on the CTE (you need to learn about these)
to use your real table you use the following query replacing the "from base" with "from your tablename"
select b1.EMID
, b1.visit_date_01
, days_last_visit
, previous_visit
, CASE
WHEN days_last_visit <= 90 THEN '1'
WHEN days_last_visit BETWEEN 90 AND 179 THEN '2'
WHEN days_last_visit > 180 THEN '3'
END AS EPISODE
from base b1
inner join (select EMID
, visit_date_01
, datediff(day, LAG(visit_date_01, 1, visit_date_01) OVER (PARTITION BY EMID order by visit_date_01), visit_date_01) as days_last_visit
-- next line only to show what date was found
, LAG(visit_date_01, 1, visit_date_01) OVER (PARTITION BY EMID order by visit_date_01) as previous_visit
from (select distinct EMID, visit_date_01
from base
) t
) dts
on dts.emid = b1.emid
and dts.visit_date_01 = b1.visit_date_01
order by b1.emid, b1.visit_date_01
May 5, 2020 at 4:42 pm
Thank you so much! I was able to run the query- however, the output is still not exactly what I am looking for. I have attached actual vs expected- what I am looking for exactly is for the episode number to begin at 1 for each unique EMID, continue as 1 until a 90 day period between dates is found (or a new EMID occurs), and increment by 1. Hopefully this is clear, and I really appreciate your help. I am a total beginner and trouble shooting this on my own takes days.
May 5, 2020 at 6:40 pm
that is clearer but still could you explain why
01/07/2017 had an increment when difference between previous date is only 30 days
01/08/2017 did not have an increment when difference between previous date is 31 days
and if we were to add the following dates what would you expect and why - sorry if I insist but it sill needs clarification
01/09/2017 (diff 31
01/10/2017 (diff 30
01/11/2017 (diff 31
01/12/2017 (diff 30
EMIDvisit_date_01days_last_visitprevious_visitEPISODE
BA1001001/04/2014001/04/20141
BA1001001/01/201527501/04/20142
BA1001001/05/201512001/01/20153
BA1001001/11/201655001/05/20154
BA1001001/06/201721201/11/20165
BA1001001/07/20173001/06/20176
BA1001001/08/20173101/07/20176
May 6, 2020 at 3:34 am
Hi Frederico- sorry, this was my mistake. There should be no increment on 1/07/2017 or 1/08/2017 (fixed below). The same with the other dates you asked about. An increment only occurs when >=90 days occur between dates within the same EMID
EMID visit_date_01 days_last_visit previous_visit EPISODE
BA10010 01/04/2014 0 01/04/2014 1
BA10010 01/01/2015 275 01/04/2014 2
BA10010 01/05/2015 120 01/01/2015 3
BA10010 01/11/2016 550 01/05/2015 4
BA10010 01/06/2017 212 01/11/2016 5
BA10010 01/07/2017 30 01/06/2017 5
BA10010 01/08/2017 31 01/07/2017 5
thanks for the clarification.
this should then do the trick - as before replace the "base" table with your own table
select b1.EMID
, b1.visit_date_01
, dts.episode
, dts.previous_visit
, dts.daysdiff
, dts.rankx
from base b1
inner join (select t2.*
, sum(rankx) over (PARTITION BY EMID order by visit_date_01)
as episode
from (select EMID
, visit_date_01
, case
when LAG(visit_date_01, 1, visit_date_01) OVER (PARTITION BY EMID order by visit_date_01) = visit_date_01
then 1
when datediff(day, LAG(visit_date_01, 1, visit_date_01) OVER (PARTITION BY EMID order by visit_date_01), visit_date_01) >= 90
then 1
else 0
end as rankx
-- next lines only to show what values were considered was found can be removed
, datediff(day, LAG(visit_date_01, 1, visit_date_01) OVER (PARTITION BY EMID order by visit_date_01), visit_date_01) as daysdiff
, LAG(visit_date_01, 1, visit_date_01) OVER (PARTITION BY EMID order by visit_date_01) as previous_visit
from (select distinct EMID, visit_date_01
from base
) t
) t2
) dts
on dts.emid = b1.emid
and dts.visit_date_01 = b1.visit_date_01
order by b1.emid, b1.visit_date_01
May 6, 2020 at 4:05 pm
Thats it! Amazing work, and thank you so much. I am forever in your debt.
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply