How do I count the occurrences of a sequence of records

  • I'm dealing with a table that stores records for states of items as they travel through a workflow. I need to be able to count the number of times a certain loose sequence of states occurs. For example, if a "re-open" occurs followed at some point by a "ready for build", that counts as one. That sequence could occur multiple times for that item and would need to be counted each time. There could be several occurrences of "ready for build" but they should only be counted if they follow a "re-open". (Following does not mean immediately follow because there could be other states between.) Either a date column or the auto incrementing id column could be used to determine order.

    There are various other columns in the table but those that pertain to this problem are ID, StateChgDate, State.

    Some examples of how the states can occur:

    Ready for build

    Ready for build

    Re-open

    Ready for build

    Ready for build

    The count should be 1.

    Ready for build

    Re-open

    Ready for build

    Ready for build

    Re-open

    The count should be 1.

    Ready for build

    Re-open

    Ready for build

    Ready for build

    Ready for build

    Re-open

    Ready for build

    Ready for build

    The count should be 2.

    Any ideas on how to accomplish this?

  • Do you have any DDL and sample data for us to use? Also what have you tried so far. See the first link in my signature on how to post samples to this site. Setup code will help us get an answer for you faster

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Thanks for the helpful directions. I'll start again.

    I'm dealing with a table that stores records for states of items as they travel through a workflow. I need to be able to count the number of times a certain loose sequence of states occurs. For example, if a "re-open" occurs followed at some point by a "ready for build", that counts as one. That sequence could occur multiple times for that item and would need to be counted each time. There could be several occurrences of "ready for build" but they should only be counted if they follow a "re-open". (Following does not mean immediately follow because there could be other states between.) Either the datetime column or the identity id column could be used to determine order.

    The following will create a table populated with test data:

    --===== If the test table already exists, drop it

    IF OBJECT_ID('TempDB..#mytable','U') IS NOT NULL

    DROP TABLE #mytable

    --===== Create the test table with

    CREATE TABLE #mytable

    (

    ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, --Is an IDENTITY column on real table

    ITEMID INT, --Foreign key to table with items that are changing state

    StateChgDt DATETIME,

    State CHAR(20)

    )

    --===== All Inserts into the IDENTITY column

    SET IDENTITY_INSERT #mytable ON

    --===== Insert the test data into the test table

    INSERT INTO #mytable

    (ID, ITEMID, StateChgDt, State)

    SELECT 1,100,'2010-03-10 14:55:00.090','Ready for build' UNION ALL

    SELECT 2,100,'2010-03-11 14:55:00.090','Update' UNION ALL

    SELECT 3,100,'2010-03-11 15:55:00.090','Ready for build' UNION ALL

    SELECT 4,100,'2010-03-12 14:55:00.090','Re-open' UNION ALL

    SELECT 5,100,'2010-03-13 14:55:00.090','Ready for build' UNION ALL

    SELECT 6,100,'2010-03-14 14:55:00.090',NULL UNION ALL

    SELECT 7,100,'2010-03-18 14:55:00.090','Ready for build' UNION ALL

    -- The records above, with itemid 100, should give a count of 1

    SELECT 8,101,'2010-03-20 14:55:00.090','Update' UNION ALL

    SELECT 9,101,'2010-03-21 14:55:00.090','Update' UNION ALL

    SELECT 10,101,'2010-03-22 14:55:00.090','Ready for build' UNION ALL

    SELECT 11,101,'2010-03-23 14:55:00.090','Pick Up' UNION ALL

    SELECT 12,101,'2010-03-24 14:55:00.090','Re-open' UNION ALL

    SELECT 13,101,'2010-03-25 14:55:00.090','Ready for build' UNION ALL

    SELECT 14,101,'2010-03-26 14:55:00.090','Ready for build' UNION ALL

    SELECT 15,101,'2010-03-27 14:55:00.090','Pick Up' UNION ALL

    SELECT 16,101,'2010-03-27 16:55:00.090','Re-open' UNION ALL

    -- The records above, with itemid 101, should give a count of 1

    SELECT 17,102,'2010-03-28 14:55:00.090','Ready for build' UNION ALL

    SELECT 18,102,'2010-04-01 14:55:00.090','Re-open' UNION ALL

    SELECT 19,102,'2010-04-02 14:55:00.090','Pick up' UNION ALL

    SELECT 20,102,'2010-04-03 14:55:00.090','Update' UNION ALL

    SELECT 21,102,'2010-04-04 14:55:00.090',NULL UNION ALL

    SELECT 22,102,'2010-04-05 14:55:00.090','Ready for build' UNION ALL

    SELECT 23,102,'2010-04-18 14:55:00.090',NULL UNION ALL

    SELECT 24,102,'2010-04-28 14:55:00.090',NULL UNION ALL

    SELECT 25,102,'2010-05-01 14:55:00.090','Update' UNION ALL

    SELECT 26,102,'2010-05-04 14:55:00.090','Ready for build' UNION ALL

    SELECT 27,102,'2010-05-05 14:55:00.090','Ready for build' UNION ALL

    SELECT 28,102,'2010-05-07 14:55:00.090','Re-open' UNION ALL

    SELECT 29,102,'2010-05-08 14:55:00.090','Update' UNION ALL

    SELECT 30,102,'2010-05-09 14:55:00.090','Ready for build' UNION ALL

    SELECT 31,102,'2010-05-11 14:55:00.090','Update' UNION ALL

    SELECT 32,102,'2010-05-12 14:55:00.090','Ready for build'

    SELECT 33,102,'2010-05-14 14:55:00.090','Update'

    -- The records above, with itemid 102, should give a count of 2

    -- The total count for all records above should be 4

    --===== Set the identity insert back to normal

    SET IDENTITY_INSERT #mytable OFF

    The records in the test data were created, grouped by the ITEMID for clarity. The records could be mixed together in terms of ITEMID but the ID and StateChgDt will still indicate the proper sequence for state changes for any item. I don't need a separate count for each ITEMID. I just need to get a cumulative count for all Re-opens that were then Ready for build.

    I will also need to run this query on a SQL 2000 server. So it would be good if I didn't have to use SQL features that require 2005.

    Any help with this would be much appreciated. Thanks.

  • It seems like a job for the "quirky update" method as described in Jeffs article[/url].

    I recommend you read the article and try to modify it to meet your requirements.

    If you have problems, post back where you get stuck.

    Usually, I provide a coded answer if there are sample data available the way you did (good job, btw... 😉 ). But since it is essential to understand the rules that need to be followed and to really understand what the code does I usually post just the link to Jeffs article as a first reply. I'd be glad to further assist you but it's really mandatory to understand the concept instead of copy and paste a coded solution...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Good article, but I think that may add confusion and unnecessary complexity to a fairly simple query.

    I did simplify the table somewhat ... if you already have an IDENTITY set up, then by all means use that, but you don't have to.

    DECLARE @t_temp TABLE

    (

    ITEMIDINT,--Foreign key to table with items that are changing state

    StateChgDtDATETIME,

    [State]CHAR(20)

    )

    INSERT INTO @t_temp (ITEMID, StateChgDt, [State]) SELECT 100,'2010-03-10 14:55:00.090','Ready for build'

    UNION SELECT 100,'2010-03-11 14:55:00.090','Update'

    UNION SELECT 100,'2010-03-11 15:55:00.090','Ready for build'

    UNION SELECT 100,'2010-03-12 14:55:00.090','Re-open'

    UNION SELECT 100,'2010-03-13 14:55:00.090','Ready for build'

    UNION SELECT 100,'2010-03-14 14:55:00.090',NULL

    UNION SELECT 100,'2010-03-18 14:55:00.090','Ready for build'

    UNION SELECT 101,'2010-03-20 14:55:00.090','Update'

    UNION SELECT 101,'2010-03-21 14:55:00.090','Update'

    UNION SELECT 101,'2010-03-22 14:55:00.090','Ready for build'

    UNION SELECT 101,'2010-03-23 14:55:00.090','Pick Up'

    UNION SELECT 101,'2010-03-24 14:55:00.090','Re-open'

    UNION SELECT 101,'2010-03-25 14:55:00.090','Ready for build'

    UNION SELECT 101,'2010-03-26 14:55:00.090','Ready for build'

    UNION SELECT 101,'2010-03-27 14:55:00.090','Pick Up'

    UNION SELECT 101,'2010-03-27 16:55:00.090','Re-open'

    UNION SELECT 102,'2010-03-28 14:55:00.090','Ready for build'

    UNION SELECT 102,'2010-04-01 14:55:00.090','Re-open'

    UNION SELECT 102,'2010-04-02 14:55:00.090','Pick up'

    UNION SELECT 102,'2010-04-03 14:55:00.090','Update'

    UNION SELECT 102,'2010-04-04 14:55:00.090',NULL

    UNION SELECT 102,'2010-04-05 14:55:00.090','Ready for build'

    UNION SELECT 102,'2010-04-18 14:55:00.090',NULL

    UNION SELECT 102,'2010-04-28 14:55:00.090',NULL

    UNION SELECT 102,'2010-05-01 14:55:00.090','Update'

    UNION SELECT 102,'2010-05-04 14:55:00.090','Ready for build'

    UNION SELECT 102,'2010-05-05 14:55:00.090','Ready for build'

    UNION SELECT 102,'2010-05-07 14:55:00.090','Re-open'

    UNION SELECT 102,'2010-05-08 14:55:00.090','Update'

    UNION SELECT 102,'2010-05-09 14:55:00.090','Ready for build'

    UNION SELECT 102,'2010-05-11 14:55:00.090','Update'

    UNION SELECT 102,'2010-05-12 14:55:00.090','Ready for build'

    UNION SELECT 102,'2010-05-14 14:55:00.090','Update'

    This will return the data that you've described:

    SELECT t.ITEMID,

    COUNT(t.StateChgDt)

    FROM @t_temp t

    WHERE t.[State] = 'Re-open'

    AND EXISTS

    (

    SELECT sq.ITEMID

    FROM @t_temp sq

    WHERE sq.ITEMID = t.ITEMID

    AND sq.[State] = 'Ready for build'

    AND sq.StateChgDt > t.StateChgDt

    )

    GROUP BY t.ITEMID

    └> bt



    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • Nice and clean solution.

    But if you have the following scenario of two 'Re-open' without a 'Ready for build' in between, you might get wrong results:

    UNION SELECT 101,'2010-03-24 14:55:00.090','Re-open'

    UNION SELECT 101,'2010-03-25 14:55:00.090','Re-open'

    UNION SELECT 101,'2010-03-26 14:55:00.090','Ready for build'

    You'd count it as 2. The question is, whether this is correct or not...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Hmmmm... I'm thinking you could get a false count if the data looked like...

    Re-Open

    whatever

    whatever

    Re-Open

    whatever

    whatever

    Ready For Build

    I don't know if that could ever happen but I see nothing to prevent such a sequence.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Heh... great minds think alike, Lutz... your post wasn't there when I started typing my post. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • That wouldn't be hard to check for if that's part of the requirements. What he said was

    if a "re-open" occurs followed at some point by a "ready for build", that counts as one

    so that's how it is coded.

    └> bt



    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • bteraberry (5/18/2010)


    That wouldn't be hard to check for if that's part of the requirements. What he said was

    if a "re-open" occurs followed at some point by a "ready for build", that counts as one

    so that's how it is coded.

    like I said in my post before, the OP needs to decide whether this condition will ever happen and if so what the expected result needs to be. Based on that, your code is nice AND clean AND (provides the expected OR not).

    On the other side it's still some sort of triangular join that might cause performance issues. Again, the OP needs to verify.

    The biggest difference so far is that you actually took the time to provide a coded version whereas the "quirky update" is "just a link" 😉

    Let's see how te story continue...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Well, if you really don't want to count 'Re-open's that don't have a 'Ready for build' in between:

    WITH cteTemp (RowID, ITEMID, StateChgDt, [State])

    AS

    (

    SELECT ROW_NUMBER() OVER (ORDER BY ITEMID, StateChgDt),

    ITEMID,

    StateChgDt,

    [State]

    FROM @t_temp

    WHERE [State] = 'Re-open'

    OR [State] = 'Ready for build'

    )

    SELECT t.ITEMID,

    COUNT(t.StateChgDt)

    FROM cteTemp t

    LEFT JOIN cteTemp t_2

    ON t_2.RowID = t.RowID - 1

    AND t_2.ITEMID = t.ITEMID

    WHERE t.[State] = 'Re-open'

    AND t_2.[State] <> 'Re-open'

    AND EXISTS

    (

    SELECT sq.ITEMID

    FROM cteTemp sq

    WHERE sq.ITEMID = t.ITEMID

    AND sq.[State] = 'Ready for build'

    AND sq.RowID > t.RowID

    )

    GROUP BY t.ITEMID

    The ROW_NUMBER statement would need to be adjusted depending on how you would want to handle potential ties in separate records put in for the same ITEMID. But you get the idea.

    └> bt



    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • This was actually my first thought before I stumbled over the OP's requirement at the end of his first post:

    I will also need to run this query on a SQL 2000 server. So it would be good if I didn't have to use SQL features that require 2005.

    Therefore I've chosen to recommend the quirky update.

    In terms of coded solutions you're leading 2:0. The effort you put in needs to be honored. I guess it's time for the OP to clarify.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Lutz,

    You already know what you're doing. I'm hoping to throw stuff out that will hopefully help people and potentially provide me with feedback at the same time.

    Until very recently I was only able to spend about 20-30% of my time on SQL Server and my knowledge of the product greatly suffered as a result. I'm just trying to get ramped up.

    └> bt



    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • RobD1 (5/18/2010)


    I will also need to run this query on a SQL 2000 server. So it would be good if I didn't have to use SQL features that require 2005.

    Didn't even notice this before so scratch the CTE. You could do the same thing with table variable or temp table (and replace ROW_NUMBER() with an IDENTITY and ORDER BY on the select into.) Performance will depend on the number of records involved.

    └> bt



    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • Thanks for all the input on this. In answer to the question about having 2 re-opens before a ready for build, yes, you could have 2 or even more. It would be a bit odd but it is possible.

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply