Count patterns of behaviour in a certain order

  • Thanks to all & apologies for the delay.

    I have checked the responses as I received email alerts & fully intended to post a DDL script & my thanks to you all.

    Unfortunately, I've been too busy to post the script yet.

    I'll try the solution offered & come back to you.

    Many thanks to everyone & your persistence in trying to help me!

    I'll respond soon after some testing.

    Peter 🙂

  • To barrow from SSCrazy's data, I had this idea for grouping the data:

    DECLARE @PersonAction TABLE (

    PersonID VARCHAR(1) NOT NULL

    ,actiondate DATETIME NOT NULL

    ,actiontype INTEGER NOT NULL

    )

    INSERT INTO @PersonAction(PersonID,actiondate,actiontype) VALUES

    ('A','01/01/2016',1)

    ,('B','01/01/2016',1)

    ,('C','01/01/2016',1)

    ,('D','01/01/2016',3)

    ,('A','02/01/2016',2)

    ,('B','02/01/2016',2)

    ,('C','02/01/2016',2)

    ,('D','02/01/2016',4)

    ,('A','03/01/2016',3)

    ,('B','03/01/2016',4)

    ,('C','03/01/2016',3)

    ,('D','03/01/2016',2)

    ,('A','04/01/2016',4)

    ,('B','04/01/2016',3)

    ,('C','04/01/2016',4)

    ,('D','04/01/2016',1)

    ,('E','04/01/2016',3)

    ,('E','05/01/2016',4)

    ,('E','06/01/2016',1)

    ,('E','01/01/2016',2)

    -- The raw data

    SELECT

    PersonID

    , ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY actiondate) AS DateOrder

    , actiondate

    , actiontype

    FROM @PersonAction

    -- CTE to prioritize the dates the action occured on

    ;WITH Data AS (

    SELECT

    PersonID

    , ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY actiondate) AS DateOrder

    , actiontype

    FROM @PersonAction

    )

    -- Our pivoted data

    SELECT

    [1] AS Day1

    , [2] AS Day2

    , [3] AS Day3

    , [4] AS Day4

    , COUNT(*) AS People

    FROM (

    SELECT [1],[2],[3],[4]

    FROM Data

    PIVOT (

    MAX(actiontype)

    FOR DateOrder IN ([1],[2],[3],[4])

    ) AS PivotTable

    ) AS S

    GROUP BY [1],[2],[3],[4]

  • Hi Again,

    Here are my scripts to create the data;

    CREATE TABLE PatternTest

    (

    [PersonID] int,

    [Action] varchar(50),

    [RankID] int

    );

    -------------------------------------------------------------------------

    INSERT INTO [Test_DB].[dbo].[PatternTest]

    ([PersonID]

    ,[Action]

    ,[RankID])

    VALUES

    (1,'Action1',1),

    (1,'Action2',2),

    (1,'Action3',3),

    (1,'Action4',4),

    (2,'Action1',1),

    (2,'Action3',2),

    (2,'Action2',3),

    (2,'Action4',4),

    (3,'Action1',1),

    (3,'Action4',2),

    (3,'Action3',3),

    (3,'Action2',4),

    (4,'Action4',1),

    (4,'Action3',2),

    (4,'Action2',3),

    (4,'Action1',4),

    (5,'Action2',1),

    (5,'Action1',2),

    (5,'Action4',3),

    (5,'Action3',4),

    (6, 'Action1',1),

    (6,'Action2',2),

    (6,'Action3',3),

    (6,'Action4',4),

    (7,'Action1',1),

    (7,'Action3',2),

    (7,'Action2',3),

    (7,'Action4',4),

    (8,'Action1',1),

    (8,'Action4',2),

    (8,'Action3',3),

    (8,'Action2',4),

    (9,'Action4',1),

    (9,'Action3',2),

    (9,'Action2',3),

    (9,'Action1',4),

    (10,'Action2',1),

    (10,'Action1',2),

    (10,'Action4',3),

    (10,'Action3',4)

    GO

    Each Person ID has 4 Actions in a certain Order shown by the RankID field.

    I want to count how many PersonIDs carried out their Actions in all possible Order combinations of Action1, Action2, Action3 & Action4.

    So how many PersonIDs completed their Actions in the order of Action 1,2,3,4, and then Actions 4,3,2,1 etc etc.

    Thanks

  • Thanks for posting the repro script!

    Here is one way to get the results you need:

    WITH Denormalized

    AS (SELECT PersonID,

    MAX(CASE WHEN RankID = 1 THEN [Action] END) AS A1,

    MAX(CASE WHEN RankID = 2 THEN [Action] END) AS A2,

    MAX(CASE WHEN RankID = 3 THEN [Action] END) AS A3,

    MAX(CASE WHEN RankID = 4 THEN [Action] END) AS A4

    FROM dbo.PatternTest

    GROUP BY PersonID)

    SELECT A1, A2, A3, A4, COUNT(*)

    FROM Denormalized

    GROUP BY A1, A2, A3, A4;


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hello Hugo,

    Thank you very much for your answer!

    I entered a large dataset of my info & got a very good summary of the results.

    I'm not quite across your logic about how this works.

    Would you be able to explain the steps?

    Much appreciated.

    Peter

  • pbo71465 (4/12/2016)


    Would you be able to explain the steps?

    Sure!

    One of the things about CTEs is that they make it easy to disect a query. You can highlight just the text of the CTE, or replace the outer query with a simple SELECT * FROM CteName to see what the CTE does by itself. If you do that on the query I wrote, with the test data you posted, you will see that it transforms the data you posted into a form where every pattern is on a single row, with four columns for the four actions in order. The trick used here is that I know that for each PersonID in the input data, there will be four rows, with RankID 1 through 4. For each of those four rows, one of the CASE expressions returns the Action from that row, The other three return NULL for that row, but will return an Action value for one of the other rows of that person. I then aggregate this on PersonID, taking the MAX - which results in the NULLs being discarded (as they always are in aggregates), and the actual value remaining. (I could also have used MIN in this case, for the same result).

    Once I have this result set with one row per person and four columns describing the actions in order, I use another GROUP BY (in the outer query) to create a group for each distinct list of four actions. And I then add a COUNT(*) aggregate to calculate the number of persons that have this same combination.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Thanks you ALL very much for your assistance & Hugo for the answer.

    Much appreciated !!

    Peter

  • Why is my suggestion ignored? It may not be as fast, but it gives a more reliable result than Hugo's plus it is suitable also if the number of distinct actions differs from 4. Hugo's solution is indeed correct for this test set. However if your data is less clean than this test data, you may be ignoring certain patterns with it. For example, add one row to your test data like this for a person that redid action2 after first completing all actions:

    VALUES

    (7, 'Action2', 5)

    and then re-run Hugo's code.

    Hugo's results:

    A1A2A3A4(No column name)

    Action1Action2Action3Action42

    Action1Action3Action2Action42

    Action1Action4Action3Action22

    Action2Action1Action4Action32

    Action4Action3Action2Action12

    My results:

    actionscount

    Action1,Action2,Action3,Action42

    Action1,Action3,Action2,Action41

    Action1,Action3,Action2,Action4,Action21

    Action1,Action4,Action3,Action22

    Action2,Action1,Action4,Action32

    Action4,Action3,Action2,Action12

    Here's the complete code if you want to do the test yourself.

    use tempdb

    go

    if object_id('tempdb..#PatternTest') is not null

    drop table #PatternTest;

    CREATE TABLE #PatternTest

    (

    [PersonID] int,

    [Action] varchar(50),

    [RankID] int

    );

    -------------------------------------------------------------------------

    INSERT INTO #PatternTest

    ([PersonID]

    ,[Action]

    ,[RankID])

    VALUES

    (1,'Action1',1),

    (1,'Action2',2),

    (1,'Action3',3),

    (1,'Action4',4),

    (2,'Action1',1),

    (2,'Action3',2),

    (2,'Action2',3),

    (2,'Action4',4),

    (3,'Action1',1),

    (3,'Action4',2),

    (3,'Action3',3),

    (3,'Action2',4),

    (4,'Action4',1),

    (4,'Action3',2),

    (4,'Action2',3),

    (4,'Action1',4),

    (5,'Action2',1),

    (5,'Action1',2),

    (5,'Action4',3),

    (5,'Action3',4),

    (6, 'Action1',1),

    (6,'Action2',2),

    (6,'Action3',3),

    (6,'Action4',4),

    (7,'Action1',1),

    (7,'Action3',2),

    (7,'Action2',3),

    (7,'Action4',4),

    (8,'Action1',1),

    (8,'Action4',2),

    (8,'Action3',3),

    (8,'Action2',4),

    (9,'Action4',1),

    (9,'Action3',2),

    (9,'Action2',3),

    (9,'Action1',4),

    (10,'Action2',1),

    (10,'Action1',2),

    (10,'Action4',3),

    (10,'Action3',4)

    -- Add the extra line, person 7 took Action2 another time after

    -- doing them all.

    INSERT INTO #PatternTest

    ([PersonID]

    ,[Action]

    ,[RankID])

    VALUES

    (7,'Action2',5)

    raiserror( 'rrozema', 0,0) with nowait;

    with ctePersons as (

    select distinct PersonID

    from #PatternTest

    )

    select a.actions, count(*) as [count]

    from (

    select

    p.PersonID,

    stuff((

    select ',' + pa.action as [text()]

    from #PatternTest pa

    where pa.PersonID = p.PersonID

    order by pa.RankID

    for xml path(''), type

    ).value('.','varchar(2048)'), 1, 1, '') as actions

    from ctePersons p

    ) a

    group by a.actions;

    raiserror( 'Hugo', 0,0) with nowait;

    WITH Denormalized

    AS (SELECT PersonID,

    MAX(CASE WHEN RankID = 1 THEN [Action] END) AS A1,

    MAX(CASE WHEN RankID = 2 THEN [Action] END) AS A2,

    MAX(CASE WHEN RankID = 3 THEN [Action] END) AS A3,

    MAX(CASE WHEN RankID = 4 THEN [Action] END) AS A4

    FROM #PatternTest

    GROUP BY PersonID)

    SELECT A1, A2, A3, A4, COUNT(*)

    FROM Denormalized

    GROUP BY A1, A2, A3, A4;

    if object_id('tempdb..#PatternTest') is not null

    drop table #PatternTest;



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • Hi R P Rozema,

    Thanks for your input, it too is a very tidy solution & I appreciate the time you took to create it.

    Cheers

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply