Identifying consecutive values with constraints

  • Hi All

    I have a set of blood pressure measurements like:

    MMID MESDATE SYS DIA

    A006283 2005-11-14 148 80

    A006283 2006-01-16 130 88

    A006283 2006-10-18 130 80

    A006283 2006-12-28 144 96

    A006283 2007-01-03 120 80

    A006283 2006-05-17 130 80

    A006283 2007-02-28 140 80

    A006283 2007-05-03 130 80

    A006283 2008-01-18 150 80

    A006283 2006-06-29 130 70

    ...

    M009781 2006-10-24 110 70

    M015182 2008-07-22 130 90

    M020100 2006-04-20 130 70

    I want to identify sequences where SYS>=130 and DIA>=80 for all the different patients. So for patient A006283 for example, I should get:

    1) A006283 2005-11-14 148 80 2006-01-16 130 88 2006-05-17 130 80

    2) A006283 2007-02-28 140 80 2007-05-03 130 80 2008-01-18 120 80

    cos these are the only consecutive measurements that meet the criteria (note that the raw data was not sorted!).

    I'm thinking using a CROSS APPLY will get part of the required results, but I think you need to PIVOT it as well to display the multiple rows (ie. the 3 consecutives) in a single row like above.

    Ideally I'm after a generic method (UDF?) where I can run the query for 'n' consecutives.

    Thanks

  • My first thoughts are either ROW_NUMBER, or Tally table

    I'd say provide a sample data set (in SQL code) so that we can assist you better

    To clarify though, shouldn't results be more than just 2? for SYS>=130 and DIA>=80

    Unless you only want the first and last?

    MMID MESDATE SYS DIA

    A006283 2005-11-14 148 80

    A006283 2006-01-16 130 88

    A006283 2006-10-18 130 80

    A006283 2006-12-28 144 96

    A006283 2007-01-03 120 80

    A006283 2006-05-17 130 80

    A006283 2007-02-28 140 80

    A006283 2007-05-03 130 80

    A006283 2008-01-18 150 80

    A006283 2006-06-29 130 70

    SQLServerNewbieMCITP: Database Administrator SQL Server 2005
  • Thanks for the response Jerry. I'll try to clarify the problem further with a slightly different data set:

    Here's some raw data:

    DECLARE @BPTable TABLE (

    MMID varchar(10),

    MESDATE date,

    SYSTOLIC int,

    DIA int

    )

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2005-11-14', 148, 80)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-01-16', 130, 88)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-10-18', 120, 80)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-12-28', 144, 96)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2007-01-03', 120, 80)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-05-17', 130, 80)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2007-02-28', 140, 80)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2007-05-03', 130, 80)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2008-01-18', 150, 80)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-06-29', 130, 90)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('M009781', '2006-10-24', 110, 70)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('M015182', '2008-07-22', 130, 90)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('M020100', '2006-04-20', 130, 70)

    Note how the raw data is not ordered in anyway, so if you order by id and date, using a simple query:

    select *

    from @BPTable

    order by MMID, MESDATE

    then you'll have the following ordered dataset:

    A0062832005-11-14 148 80

    A0062832006-01-16 130 88

    A0062832006-05-17 130 80

    A0062832006-06-29 130 90

    A0062832006-10-18 120 80

    A0062832006-12-28 144 96

    A0062832007-01-03 120 80

    A0062832007-02-28 140 80

    A0062832007-05-03 130 80

    A0062832008-01-18 150 80

    M0097812006-10-24 110 70

    M0151822008-07-22 130 90

    M0201002006-04-20 130 70

    Now if you look at 3 consecutive occurances, it's the following:

    1.

    A0062832005-11-1414880

    A0062832006-01-1613088

    A0062832006-05-1713080

    2.

    A0062832006-01-1613088

    A0062832006-05-1713080

    A0062832006-06-2913090

    3.

    A0062832007-02-2814080

    A0062832007-05-0313080

    A0062832008-01-1815080

    Hope my problem is clearer now 🙂

    Thanks

  • Since there's a missing row in the middle you didn't show, so you are looking for any X-consecutive rows right?

    so in this case, only rows that fit the criteria would count?

    A006283 2006-12-28 144 96

    SQLServerNewbieMCITP: Database Administrator SQL Server 2005
  • Oh..sorry for the confusion. It wasn't really a missing row as such..but just a blank line I added to show the end of that particular patient who had a bunch of BP measurements.

    Yes, I'm looking for 'x' consecutive rows (after being ordered by date though) that meet the required criteria. So if you run the magic-query on the test data I've posted, you should get 1,2,3 outputs I've shown above as the results of that query.

  • I am sure there's a better way, and less #temp tables involved

    but here's my first try. Idea is to give Identity to each row, and find the streak, and select based on the streak

    DECLARE @BPTable TABLE (

    MMID VARCHAR(10),

    MESDATE DATE,

    SYSTOLIC INT,

    DIA INT

    )

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2005-11-14', 148, 80)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-01-16', 130, 88)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-10-18', 120, 80)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-12-28', 144, 96)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2007-01-03', 120, 80)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-05-17', 130, 80)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2007-02-28', 140, 80)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2007-05-03', 130, 80)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2008-01-18', 150, 80)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('A006283', '2006-06-29', 130, 90)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('M009781', '2006-10-24', 110, 70)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('M015182', '2008-07-22', 130, 90)

    INSERT @BPTable (MMID, MESDATE, SYSTOLIC, DIA) VALUES ('M020100', '2006-04-20', 130, 70)

    SELECT *

    ,Qualify=CASE WHEN SYSTOLIC>=130 AND DIA>=80 THEN 1 ELSE 0 END

    ,id=ROW_NUMBER() OVER (ORDER BY MMID, MESDATE)

    INTO #test

    FROM @BPTable

    ORDER BY MMID, MESDATE ASC

    SELECT *,

    RowsBeforeQualify=(

    SELECT COUNT(1) FROM #test B1

    WHERE B1.MMID = B2.MMID

    AND B1.Qualify = 1

    AND B1.id BETWEEN B2.id-2 AND B2.id

    )

    INTO #test1

    FROM #test B2

    ORDER BY 1, 2

    SELECT T2.*

    FROM

    (SELECT id FROM #test1 WHERE RowsbeforeQualify = 3) T1-- 3rd record in a row

    INNER JOIN #test1 T2

    ON T2.id BETWEEN T1.id-2 AND T1.id-- only get 2 records before and itself

    DROP TABLE #test

    DROP TABLE #test1

    Results are like this (matches your desired output, 3-row streaks at id 123, 234, 890)

    MMIDMESDATESYSTOLICDIAQualifyidRowsBeforeQualify

    A0062832005-11-1414880111

    A0062832006-01-1613088122

    A0062832006-05-1713080133

    A0062832006-01-1613088122

    A0062832006-05-1713080133

    A0062832006-06-2913090143

    A0062832007-02-2814080182

    A0062832007-05-0313080192

    A0062832008-01-18150801103

    SQLServerNewbieMCITP: Database Administrator SQL Server 2005
  • Here's one that I tested using SQL 2005, so I couldn't use DATE as a field type, but I'm quite sure this would work in 2008, and there are no temp tables, just a couple of CTEs:

    DECLARE @BP_READINGS TABLE (

    MMID varchar(10),

    MESDATE DateTime,

    SYSTOLIC int,

    DIA int

    )

    INSERT INTO @BP_READINGS (MMID, MESDATE, SYSTOLIC, DIA)

    SELECT 'A006283', '2005-11-14', 148, 80 UNION ALL

    SELECT 'A006283', '2006-01-16', 130, 88 UNION ALL

    SELECT 'A006283', '2006-10-18', 120, 80 UNION ALL

    SELECT 'A006283', '2006-12-28', 144, 96 UNION ALL

    SELECT 'A006283', '2007-01-03', 120, 80 UNION ALL

    SELECT 'A006283', '2006-05-17', 130, 80 UNION ALL

    SELECT 'A006283', '2007-02-28', 140, 80 UNION ALL

    SELECT 'A006283', '2007-05-03', 130, 80 UNION ALL

    SELECT 'A006283', '2008-01-18', 150, 80 UNION ALL

    SELECT 'A006283', '2006-06-29', 130, 90 UNION ALL

    SELECT 'M009781', '2006-10-24', 110, 70 UNION ALL

    SELECT 'M015182', '2008-07-22', 130, 90 UNION ALL

    SELECT 'M020100', '2006-04-20', 130, 70

    ;WITH QUALIFIED_DATA AS (

    SELECT *, CASE WHEN SYSTOLIC >= 130 AND DIA >= 80 THEN 1 ELSE 0 END AS QUALIFY,

    ROW_NUMBER() OVER (PARTITION BY MMID ORDER BY MESDATE) AS RN

    FROM @BP_READINGS

    ),

    QUALIFIED_ONLY AS (

    SELECT *

    FROM QUALIFIED_DATA

    WHERE QUALIFY = 1

    )

    SELECT Q1.*

    FROM QUALIFIED_ONLY AS Q1 INNER JOIN QUALIFIED_ONLY AS Q2

    ON Q1.MMID = Q2.MMID AND

    Q1.RN = Q2.RN - 1

    Jerry - I was able to eliminate the extra steps by just partitioning the ROW_NUMBER function by MMID, which provides a means of just comparing for consecutive row numbers within a given MMID.

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Thanks, I hate using temp tables too

    and I was sure I wasn't thinking straight with #temp and ROW_NUMBER

    Say yes to CTE and Subqueries/Derived Tables!

    SQLServerNewbieMCITP: Database Administrator SQL Server 2005
  • My Bad !!! I've goofed :w00t::w00t::w00t:. I was just looking at Jerry's response when I realized I could make my code better by eliminating yet another step, and then in testing the revised version, discovered that the original wasn't exactly right, as it was missing out on the last item of any set of consecutive readings that qualify. Here's the fixed code:

    DECLARE @BP_READINGS TABLE (

    MMID varchar(10),

    MESDATE DateTime,

    SYSTOLIC int,

    DIA int

    )

    INSERT INTO @BP_READINGS (MMID, MESDATE, SYSTOLIC, DIA)

    SELECT 'A006283', '2005-11-14', 148, 80 UNION ALL

    SELECT 'A006283', '2006-01-16', 130, 88 UNION ALL

    SELECT 'A006283', '2006-10-18', 120, 80 UNION ALL

    SELECT 'A006283', '2006-12-28', 144, 96 UNION ALL

    SELECT 'A006283', '2007-01-03', 120, 80 UNION ALL

    SELECT 'A006283', '2006-05-17', 130, 80 UNION ALL

    SELECT 'A006283', '2007-02-28', 140, 80 UNION ALL

    SELECT 'A006283', '2007-05-03', 130, 80 UNION ALL

    SELECT 'A006283', '2008-01-18', 150, 80 UNION ALL

    SELECT 'A006283', '2006-06-29', 130, 90 UNION ALL

    SELECT 'M009781', '2006-10-24', 110, 70 UNION ALL

    SELECT 'M015182', '2008-07-22', 130, 90 UNION ALL

    SELECT 'M020100', '2006-04-20', 130, 70

    SELECT *

    FROM @BP_READINGS

    ORDER BY MMID, MESDATE

    ;WITH QUALIFIED_ONLY AS (

    SELECT *, ROW_NUMBER() OVER (PARTITION BY MMID ORDER BY MESDATE) AS RN

    FROM @BP_READINGS

    WHERE SYSTOLIC >= 130 AND

    DIA >= 80

    )

    SELECT Q1.MMID, Q1.MESDATE, Q1.SYSTOLIC, Q1.DIA

    FROM QUALIFIED_ONLY AS Q1

    WHERE EXISTS (

    SELECT MMID

    FROM QUALIFIED_ONLY AS Q2

    WHERE Q2.MMID = Q1.MMID AND

    ABS(Q2.RN - Q1.RN) = 1

    )

    Please test for performance, as we have a correlated subquery here, and that can be cause for grief.

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Thanks guys. Appreciate your help.

    hmm..I just tried both solutions and seems like both are close, yet there're a few issues 🙁

    With Jerry's solution with temp tables, it kinda seems to give the output I want, but how do you determine the BPs that belong to a particular sequence? Do you simply read out in rows in blocks of 3s in this case? I don't think you can use RowsbeforeQualify to determine it's the end of a 3-row sequence, can you? Anyway..something I really like about this solution (despite using #tables) is you can easily specify 'n' (2, 3, 4 or whatever #consecutives you need), by making a few minor changes!

    With Steve's solution, the problem I see is there's no way to determine which BPs belong to which sequence. Running the query as posted with test data, you get 8-rows, which are indeed the results I need, but I don't see how you can say this is the 1st 3-row sequence, this is the 2nd 3-row sequence etc.

    Thanks for all the help again!

  • How would you want the outputs?

    so for 3-row sets, do you want all sets shown together (like my final SELECT)

    or SELECT * FROM Set1, SELECT * FROM Set2, etc.... (like your example outputs in 3 boxes)

    This may involve a loop and select top 3 on each last-row (so in my sample, id 3,4, 10)

    The thing is with temp tables, it's easy to convert them to CTE or derived tables, or table variables

    temp tables are easier for debugging and select from

    when RowsBeforeQualify=3 (or X), you are at the end of a sequence

    In short, you try to find ALL SETS of 3's, find the last id, then select id-2, id-1, id

    SQLServerNewbieMCITP: Database Administrator SQL Server 2005
  • Initially I was thinking of an output like:

    ID, MESDATE1, SYS1, DIA1, MESDATE2, SYS2, DIA2, MESDATE3, SYS3, DIA3

    but I'm thinking that's a bit harder to do in SQL itself. Will be a lot easier to process the results in rows of X at the front end for any display purposes. Yes?

    Oh..and no, I never wanted outputs like SELECT * FROM Set1, SELECT * FROM Set2, etc. Only posted my results in different boxes to make the result sets clear.

    Yes, RowsBeforeQualify=3 (or X), you are at the end of a sequence is true, but you can't use this as the criteria to say:

    if RowsBeforeQualify=3 (or X) then the 3-row sequence is this row and 2-rows above it..right?

    Cos in the results you get a row like:

    A0062832006-05-1713080133

    A0062832006-01-1613088122

    A0062832006-05-1713080133

    but this is not really a valid 3-row sequence, although the last row is still the end of a different 3-row sequence.

    Anyway..thanks for all the help Jerry. Think I've got it sorted now 🙂

  • Once you get a correct rows dataset, it's easy to PIVOT into what you want (columns)

    RowsBeforeQualify is only meaningful if you also consider the id (which puts the dataset in correct order)

    so if you order by id, then hit a RowsBeforeQualify=3, it's guaranteed 2 rows above are part of this sequence set (since that's how I calculated RowsBeforeQualify, using ID)

    It's a start, keep hacking away, it's part of the fun

    Have a great weekend

    SQLServerNewbieMCITP: Database Administrator SQL Server 2005
  • Ummm... I'll have something for you on Monday, as I don't have time today. No biggy, but if you needed the sequences to be grouped, you should have asked about that in your first post, or did I miss something?

    Steve

    (aka smunson)

    :):):)

    thusi (11/21/2008)


    Initially I was thinking of an output like:

    ID, MESDATE1, SYS1, DIA1, MESDATE2, SYS2, DIA2, MESDATE3, SYS3, DIA3

    but I'm thinking that's a bit harder to do in SQL itself. Will be a lot easier to process the results in rows of X at the front end for any display purposes. Yes?

    Oh..and no, I never wanted outputs like SELECT * FROM Set1, SELECT * FROM Set2, etc. Only posted my results in different boxes to make the result sets clear.

    Yes, RowsBeforeQualify=3 (or X), you are at the end of a sequence is true, but you can't use this as the criteria to say:

    if RowsBeforeQualify=3 (or X) then the 3-row sequence is this row and 2-rows above it..right?

    Cos in the results you get a row like:

    A0062832006-05-1713080133

    A0062832006-01-1613088122

    A0062832006-05-1713080133

    but this is not really a valid 3-row sequence, although the last row is still the end of a different 3-row sequence.

    Anyway..thanks for all the help Jerry. Think I've got it sorted now 🙂

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Sorry for not explicitly stating my grouping requirements Steve. In the initial post I did mention:

    I want to identify sequences where SYS>=130 and DIA>=80 for all the different patients. So for patient A006283 for example, I should get:

    1) A006283 2005-11-14 148 80 2006-01-16 130 88 2006-05-17 130 80

    2) A006283 2007-02-28 140 80 2007-05-03 130 80 2008-01-18 120 80

    and then in a later post I said

    Now if you look at 3 consecutive occurances, it's the following:..

    and listed the 3 outputs I wanted..so I was sort of implying it, but yes, you're right, I never really said grouping was a requirement as such. My apologies.

    Anyway, if you're going to spend a bit of time on this problem, can I pls get your super brain working on a slightly more challenging problem, which is essentially the next step of the current issue. Sorry..these are some queries some of the docs here in NZ got..and after spending hours n hours trying to figure out how to do it, I thought I'd post my probz in a forum.

    So let's start with a new dataset (I'm using an ordered set this time just to make it easier to visualise):

    DECLARE @BP_READINGS TABLE (

    MMID varchar(10),

    MESDATE DateTime,

    SYSTOLIC int,

    DIA int

    )

    INSERT INTO @BP_READINGS (MMID, MESDATE, SYSTOLIC, DIA)

    SELECT 'A006283', '2006-01-14', 148, 80 UNION ALL

    SELECT 'A006283', '2006-01-16', 130, 88 UNION ALL

    SELECT 'A006283', '2006-02-25', 150, 80 UNION ALL

    SELECT 'A006283', '2006-06-02', 120, 75 UNION ALL

    SELECT 'A006283', '2006-08-02', 144, 96 UNION ALL

    SELECT 'A006283', '2006-10-20', 140, 80 UNION ALL

    SELECT 'A006283', '2006-10-30', 130, 80 UNION ALL

    SELECT 'A006283', '2007-02-28', 140, 80 UNION ALL

    SELECT 'A006283', '2007-06-02', 120, 78 UNION ALL

    SELECT 'A006283', '2007-08-12', 144, 96 UNION ALL

    SELECT 'A006283', '2007-10-21', 140, 80 UNION ALL

    SELECT 'A006283', '2007-11-30', 130, 80 UNION ALL

    SELECT 'A006283', '2008-02-28', 140, 80 UNION ALL

    SELECT 'M020100', '2008-04-20', 130, 90

    Now what I really want is not a fixed number of consecutives, but any number of 3 or more consecutives (again with SYS>=130 and DIA>=80) that occur with the difference between the 1st and the last consecutive greater than 120 days. Again, I've just got a few blank lines to indicate where the different 'transitions' occur.

    So running the magic query on the above data set, I should get:

    1. (where difference btwn 2006-08-02 and 2007-02-28 is >120 days)

    A0062832006-08-0214496

    A0062832006-10-2014080

    A0062832006-10-3013080

    A0062832007-02-2814080

    2. (where difference btwn 2007-08-12 and 2008-04-20 is >120 days)

    A0062832007-08-1214496

    A0062832007-10-2114080

    A0062832007-11-3013080

    A0062832008-02-2814080

    M0201002008-04-2013090

    but NOT:

    A0062832006-01-1414880

    A0062832006-01-1613088

    A0062832006-02-2515080

    where the difference between 2006-01-14 and 2006-02-25 is <120 days, although we had 3 consecutively high BPs.

    Oh..I should be able to see the different groups by the way, so that for patient A006283 for example, I know he had 2 episodes of consecutively high BPs with the >120 day requirement. Having some sort of a group id (a bit like what Jerry used) within a single big result set should do.

    Quite a challenging query I think..so not sure if anyone would want to spend time on this 🙁

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply