November 20, 2008 at 2:01 am
Hi All
I have a set of blood pressure measurements like:
MMID MESDATE SYS DIA
A006283 2005-11-14 148 80
A006283 2006-01-16 130 88
A006283 2006-10-18 130 80
A006283 2006-12-28 144 96
A006283 2007-01-03 120 80
A006283 2006-05-17 130 80
A006283 2007-02-28 140 80
A006283 2007-05-03 130 80
A006283 2008-01-18 150 80
A006283 2006-06-29 130 70
...
M009781 2006-10-24 110 70
M015182 2008-07-22 130 90
M020100 2006-04-20 130 70
I want to identify sequences where SYS>=130 and DIA>=80 for all the different patients. So for patient A006283 for example, I should get:
1) A006283 2005-11-14 148 80 2006-01-16 130 88 2006-05-17 130 80
2) A006283 2007-02-28 140 80 2007-05-03 130 80 2008-01-18 120 80
cos these are the only consecutive measurements that meet the criteria (note that the raw data was not sorted!).
I'm thinking using a CROSS APPLY will get part of the required results, but I think you need to PIVOT it as well to display the multiple rows (ie. the 3 consecutives) in a single row like above.
Ideally I'm after a generic method (UDF?) where I can run the query for 'n' consecutives.
Thanks
November 20, 2008 at 9:11 am
My first thoughts are either ROW_NUMBER, or Tally table
I'd say provide a sample data set (in SQL code) so that we can assist you better
To clarify though, shouldn't results be more than just 2? for SYS>=130 and DIA>=80
Unless you only want the first and last?
MMID MESDATE SYS DIA
A006283 2005-11-14 148 80
A006283 2006-01-16 130 88
A006283 2006-10-18 130 80
A006283 2006-12-28 144 96
A006283 2007-01-03 120 80
A006283 2006-05-17 130 80
A006283 2007-02-28 140 80
A006283 2007-05-03 130 80
A006283 2008-01-18 150 80
A006283 2006-06-29 130 70
November 20, 2008 at 1:19 pm
Thanks for the response Jerry. I'll try to clarify the problem further with a slightly different data set:
Here's some raw data:
DECLARE @BPTable TABLE (
MMID varchar(10),
MESDATE date,
SYSTOLIC int,
DIA int
)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2005-11-14', 148, 80)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-01-16', 130, 88)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-10-18', 120, 80)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-12-28', 144, 96)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2007-01-03', 120, 80)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-05-17', 130, 80)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2007-02-28', 140, 80)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2007-05-03', 130, 80)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2008-01-18', 150, 80)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-06-29', 130, 90)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('M009781', '2006-10-24', 110, 70)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('M015182', '2008-07-22', 130, 90)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('M020100', '2006-04-20', 130, 70)
Note how the raw data is not ordered in anyway, so if you order by id and date, using a simple query:
select *
from @BPTable
order by MMID, MESDATE
then you'll have the following ordered dataset:
A0062832005-11-14 148 80
A0062832006-01-16 130 88
A0062832006-05-17 130 80
A0062832006-06-29 130 90
A0062832006-10-18 120 80
A0062832006-12-28 144 96
A0062832007-01-03 120 80
A0062832007-02-28 140 80
A0062832007-05-03 130 80
A0062832008-01-18 150 80
M0097812006-10-24 110 70
M0151822008-07-22 130 90
M0201002006-04-20 130 70
Now if you look at 3 consecutive occurances, it's the following:
1.
A0062832005-11-1414880
A0062832006-01-1613088
A0062832006-05-1713080
2.
A0062832006-01-1613088
A0062832006-05-1713080
A0062832006-06-2913090
3.
A0062832007-02-2814080
A0062832007-05-0313080
A0062832008-01-1815080
Hope my problem is clearer now 🙂
Thanks
November 20, 2008 at 3:33 pm
Since there's a missing row in the middle you didn't show, so you are looking for any X-consecutive rows right?
so in this case, only rows that fit the criteria would count?
A006283 2006-12-28 144 96
November 20, 2008 at 3:39 pm
Oh..sorry for the confusion. It wasn't really a missing row as such..but just a blank line I added to show the end of that particular patient who had a bunch of BP measurements.
Yes, I'm looking for 'x' consecutive rows (after being ordered by date though) that meet the required criteria. So if you run the magic-query on the test data I've posted, you should get 1,2,3 outputs I've shown above as the results of that query.
November 21, 2008 at 7:58 am
I am sure there's a better way, and less #temp tables involved
but here's my first try. Idea is to give Identity to each row, and find the streak, and select based on the streak
DECLARE @BPTable TABLE (
MMID VARCHAR(10),
MESDATE DATE,
SYSTOLIC INT,
DIA INT
)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2005-11-14', 148, 80)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-01-16', 130, 88)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-10-18', 120, 80)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-12-28', 144, 96)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2007-01-03', 120, 80)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-05-17', 130, 80)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2007-02-28', 140, 80)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2007-05-03', 130, 80)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2008-01-18', 150, 80)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-06-29', 130, 90)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('M009781', '2006-10-24', 110, 70)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('M015182', '2008-07-22', 130, 90)
INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('M020100', '2006-04-20', 130, 70)
SELECT *
,Qualify=CASE WHEN SYSTOLIC>=130 AND DIA>=80 THEN 1 ELSE 0 END
,id=ROW_NUMBER() OVER (ORDER BY MMID, MESDATE)
INTO #test
FROM @BPTable
ORDER BY MMID, MESDATE ASC
SELECT *,
RowsBeforeQualify=(
SELECT COUNT(1) FROM #test B1
WHERE B1.MMID = B2.MMID
AND B1.Qualify = 1
AND B1.id BETWEEN B2.id-2 AND B2.id
)
INTO #test1
FROM #test B2
ORDER BY 1, 2
SELECT T2.*
FROM
(SELECT id FROM #test1 WHERE RowsbeforeQualify = 3) T1-- 3rd record in a row
INNER JOIN #test1 T2
ON T2.id BETWEEN T1.id-2 AND T1.id-- only get 2 records before and itself
DROP TABLE #test
DROP TABLE #test1
Results are like this (matches your desired output, 3-row streaks at id 123, 234, 890)
MMIDMESDATESYSTOLICDIAQualifyidRowsBeforeQualify
A0062832005-11-1414880111
A0062832006-01-1613088122
A0062832006-05-1713080133
A0062832006-01-1613088122
A0062832006-05-1713080133
A0062832006-06-2913090143
A0062832007-02-2814080182
A0062832007-05-0313080192
A0062832008-01-18150801103
November 21, 2008 at 9:22 am
Here's one that I tested using SQL 2005, so I couldn't use DATE as a field type, but I'm quite sure this would work in 2008, and there are no temp tables, just a couple of CTEs:
DECLARE @BP_READINGS TABLE (
MMID varchar(10),
MESDATE DateTime,
SYSTOLIC int,
DIA int
)
INSERT INTO @BP_READINGS (MMID, MESDATE, SYSTOLIC, DIA)
SELECT 'A006283', '2005-11-14', 148, 80 UNION ALL
SELECT 'A006283', '2006-01-16', 130, 88 UNION ALL
SELECT 'A006283', '2006-10-18', 120, 80 UNION ALL
SELECT 'A006283', '2006-12-28', 144, 96 UNION ALL
SELECT 'A006283', '2007-01-03', 120, 80 UNION ALL
SELECT 'A006283', '2006-05-17', 130, 80 UNION ALL
SELECT 'A006283', '2007-02-28', 140, 80 UNION ALL
SELECT 'A006283', '2007-05-03', 130, 80 UNION ALL
SELECT 'A006283', '2008-01-18', 150, 80 UNION ALL
SELECT 'A006283', '2006-06-29', 130, 90 UNION ALL
SELECT 'M009781', '2006-10-24', 110, 70 UNION ALL
SELECT 'M015182', '2008-07-22', 130, 90 UNION ALL
SELECT 'M020100', '2006-04-20', 130, 70
;WITH QUALIFIED_DATA AS (
SELECT *, CASE WHEN SYSTOLIC >= 130 AND DIA >= 80 THEN 1 ELSE 0 END AS QUALIFY,
ROW_NUMBER() OVER (PARTITION BY MMID ORDER BY MESDATE) AS RN
FROM @BP_READINGS
),
QUALIFIED_ONLY AS (
SELECT *
FROM QUALIFIED_DATA
WHERE QUALIFY = 1
)
SELECT Q1.*
FROM QUALIFIED_ONLY AS Q1 INNER JOIN QUALIFIED_ONLY AS Q2
ON Q1.MMID = Q2.MMID AND
Q1.RN = Q2.RN - 1
Jerry - I was able to eliminate the extra steps by just partitioning the ROW_NUMBER function by MMID, which provides a means of just comparing for consecutive row numbers within a given MMID.
Steve
(aka smunson)
:):):)
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
November 21, 2008 at 10:01 am
Thanks, I hate using temp tables too
and I was sure I wasn't thinking straight with #temp and ROW_NUMBER
Say yes to CTE and Subqueries/Derived Tables!
November 21, 2008 at 10:57 am
My Bad !!! I've goofed :w00t::w00t::w00t:. I was just looking at Jerry's response when I realized I could make my code better by eliminating yet another step, and then in testing the revised version, discovered that the original wasn't exactly right, as it was missing out on the last item of any set of consecutive readings that qualify. Here's the fixed code:
DECLARE @BP_READINGS TABLE (
MMID varchar(10),
MESDATE DateTime,
SYSTOLIC int,
DIA int
)
INSERT INTO @BP_READINGS (MMID, MESDATE, SYSTOLIC, DIA)
SELECT 'A006283', '2005-11-14', 148, 80 UNION ALL
SELECT 'A006283', '2006-01-16', 130, 88 UNION ALL
SELECT 'A006283', '2006-10-18', 120, 80 UNION ALL
SELECT 'A006283', '2006-12-28', 144, 96 UNION ALL
SELECT 'A006283', '2007-01-03', 120, 80 UNION ALL
SELECT 'A006283', '2006-05-17', 130, 80 UNION ALL
SELECT 'A006283', '2007-02-28', 140, 80 UNION ALL
SELECT 'A006283', '2007-05-03', 130, 80 UNION ALL
SELECT 'A006283', '2008-01-18', 150, 80 UNION ALL
SELECT 'A006283', '2006-06-29', 130, 90 UNION ALL
SELECT 'M009781', '2006-10-24', 110, 70 UNION ALL
SELECT 'M015182', '2008-07-22', 130, 90 UNION ALL
SELECT 'M020100', '2006-04-20', 130, 70
SELECT *
FROM @BP_READINGS
ORDER BY MMID, MESDATE
;WITH QUALIFIED_ONLY AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY MMID ORDER BY MESDATE) AS RN
FROM @BP_READINGS
WHERE SYSTOLIC >= 130 AND
DIA >= 80
)
SELECT Q1.MMID, Q1.MESDATE, Q1.SYSTOLIC, Q1.DIA
FROM QUALIFIED_ONLY AS Q1
WHERE EXISTS (
SELECT MMID
FROM QUALIFIED_ONLY AS Q2
WHERE Q2.MMID = Q1.MMID AND
ABS(Q2.RN - Q1.RN) = 1
)
Please test for performance, as we have a correlated subquery here, and that can be cause for grief.
Steve
(aka smunson)
:):):)
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
November 21, 2008 at 1:54 pm
Thanks guys. Appreciate your help.
hmm..I just tried both solutions and seems like both are close, yet there're a few issues 🙁
With Jerry's solution with temp tables, it kinda seems to give the output I want, but how do you determine the BPs that belong to a particular sequence? Do you simply read out in rows in blocks of 3s in this case? I don't think you can use RowsbeforeQualify to determine it's the end of a 3-row sequence, can you? Anyway..something I really like about this solution (despite using #tables) is you can easily specify 'n' (2, 3, 4 or whatever #consecutives you need), by making a few minor changes!
With Steve's solution, the problem I see is there's no way to determine which BPs belong to which sequence. Running the query as posted with test data, you get 8-rows, which are indeed the results I need, but I don't see how you can say this is the 1st 3-row sequence, this is the 2nd 3-row sequence etc.
Thanks for all the help again!
November 21, 2008 at 2:02 pm
How would you want the outputs?
so for 3-row sets, do you want all sets shown together (like my final SELECT)
or SELECT * FROM Set1, SELECT * FROM Set2, etc.... (like your example outputs in 3 boxes)
This may involve a loop and select top 3 on each last-row (so in my sample, id 3,4, 10)
The thing is with temp tables, it's easy to convert them to CTE or derived tables, or table variables
temp tables are easier for debugging and select from
when RowsBeforeQualify=3 (or X), you are at the end of a sequence
In short, you try to find ALL SETS of 3's, find the last id, then select id-2, id-1, id
November 21, 2008 at 2:31 pm
Initially I was thinking of an output like:
ID, MESDATE1, SYS1, DIA1, MESDATE2, SYS2, DIA2, MESDATE3, SYS3, DIA3
but I'm thinking that's a bit harder to do in SQL itself. Will be a lot easier to process the results in rows of X at the front end for any display purposes. Yes?
Oh..and no, I never wanted outputs like SELECT * FROM Set1, SELECT * FROM Set2, etc. Only posted my results in different boxes to make the result sets clear.
Yes, RowsBeforeQualify=3 (or X), you are at the end of a sequence is true, but you can't use this as the criteria to say:
if RowsBeforeQualify=3 (or X) then the 3-row sequence is this row and 2-rows above it..right?
Cos in the results you get a row like:
A0062832006-05-1713080133
A0062832006-01-1613088122
A0062832006-05-1713080133
but this is not really a valid 3-row sequence, although the last row is still the end of a different 3-row sequence.
Anyway..thanks for all the help Jerry. Think I've got it sorted now 🙂
November 21, 2008 at 2:38 pm
Once you get a correct rows dataset, it's easy to PIVOT into what you want (columns)
RowsBeforeQualify is only meaningful if you also consider the id (which puts the dataset in correct order)
so if you order by id, then hit a RowsBeforeQualify=3, it's guaranteed 2 rows above are part of this sequence set (since that's how I calculated RowsBeforeQualify, using ID)
It's a start, keep hacking away, it's part of the fun
Have a great weekend
November 21, 2008 at 3:09 pm
Ummm... I'll have something for you on Monday, as I don't have time today. No biggy, but if you needed the sequences to be grouped, you should have asked about that in your first post, or did I miss something?
Steve
(aka smunson)
:):):)
thusi (11/21/2008)
Initially I was thinking of an output like:ID, MESDATE1, SYS1, DIA1, MESDATE2, SYS2, DIA2, MESDATE3, SYS3, DIA3
but I'm thinking that's a bit harder to do in SQL itself. Will be a lot easier to process the results in rows of X at the front end for any display purposes. Yes?
Oh..and no, I never wanted outputs like SELECT * FROM Set1, SELECT * FROM Set2, etc. Only posted my results in different boxes to make the result sets clear.
Yes, RowsBeforeQualify=3 (or X), you are at the end of a sequence is true, but you can't use this as the criteria to say:
if RowsBeforeQualify=3 (or X) then the 3-row sequence is this row and 2-rows above it..right?
Cos in the results you get a row like:
A0062832006-05-1713080133
A0062832006-01-1613088122
A0062832006-05-1713080133
but this is not really a valid 3-row sequence, although the last row is still the end of a different 3-row sequence.
Anyway..thanks for all the help Jerry. Think I've got it sorted now 🙂
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
November 21, 2008 at 4:02 pm
Sorry for not explicitly stating my grouping requirements Steve. In the initial post I did mention:
I want to identify sequences where SYS>=130 and DIA>=80 for all the different patients. So for patient A006283 for example, I should get:
1) A006283 2005-11-14 148 80 2006-01-16 130 88 2006-05-17 130 80
2) A006283 2007-02-28 140 80 2007-05-03 130 80 2008-01-18 120 80
and then in a later post I said
Now if you look at 3 consecutive occurances, it's the following:..
and listed the 3 outputs I wanted..so I was sort of implying it, but yes, you're right, I never really said grouping was a requirement as such. My apologies.
Anyway, if you're going to spend a bit of time on this problem, can I pls get your super brain working on a slightly more challenging problem, which is essentially the next step of the current issue. Sorry..these are some queries some of the docs here in NZ got..and after spending hours n hours trying to figure out how to do it, I thought I'd post my probz in a forum.
So let's start with a new dataset (I'm using an ordered set this time just to make it easier to visualise):
DECLARE @BP_READINGS TABLE (
MMID varchar(10),
MESDATE DateTime,
SYSTOLIC int,
DIA int
)
INSERT INTO @BP_READINGS (MMID, MESDATE, SYSTOLIC, DIA)
SELECT 'A006283', '2006-01-14', 148, 80 UNION ALL
SELECT 'A006283', '2006-01-16', 130, 88 UNION ALL
SELECT 'A006283', '2006-02-25', 150, 80 UNION ALL
SELECT 'A006283', '2006-06-02', 120, 75 UNION ALL
SELECT 'A006283', '2006-08-02', 144, 96 UNION ALL
SELECT 'A006283', '2006-10-20', 140, 80 UNION ALL
SELECT 'A006283', '2006-10-30', 130, 80 UNION ALL
SELECT 'A006283', '2007-02-28', 140, 80 UNION ALL
SELECT 'A006283', '2007-06-02', 120, 78 UNION ALL
SELECT 'A006283', '2007-08-12', 144, 96 UNION ALL
SELECT 'A006283', '2007-10-21', 140, 80 UNION ALL
SELECT 'A006283', '2007-11-30', 130, 80 UNION ALL
SELECT 'A006283', '2008-02-28', 140, 80 UNION ALL
SELECT 'M020100', '2008-04-20', 130, 90
Now what I really want is not a fixed number of consecutives, but any number of 3 or more consecutives (again with SYS>=130 and DIA>=80) that occur with the difference between the 1st and the last consecutive greater than 120 days. Again, I've just got a few blank lines to indicate where the different 'transitions' occur.
So running the magic query on the above data set, I should get:
1. (where difference btwn 2006-08-02 and 2007-02-28 is >120 days)
A0062832006-08-0214496
A0062832006-10-2014080
A0062832006-10-3013080
A0062832007-02-2814080
2. (where difference btwn 2007-08-12 and 2008-04-20 is >120 days)
A0062832007-08-1214496
A0062832007-10-2114080
A0062832007-11-3013080
A0062832008-02-2814080
M0201002008-04-2013090
but NOT:
A0062832006-01-1414880
A0062832006-01-1613088
A0062832006-02-2515080
where the difference between 2006-01-14 and 2006-02-25 is <120 days, although we had 3 consecutively high BPs.
Oh..I should be able to see the different groups by the way, so that for patient A006283 for example, I know he had 2 episodes of consecutively high BPs with the >120 day requirement. Having some sort of a group id (a bit like what Jerry used) within a single big result set should do.
Quite a challenging query I think..so not sure if anyone would want to spend time on this 🙁
Viewing 15 posts - 1 through 15 (of 24 total)
You must be logged in to reply to this topic. Login to reply