Count patterns of behaviour in a certain order

Question

Post reply

Count patterns of behaviour in a certain order

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply

OscarBoots Ten Centuries Points: 1258 More actions · Answer 1

Thanks to all & apologies for the delay.

I have checked the responses as I received email alerts & fully intended to post a DDL script & my thanks to you all.

Unfortunately, I've been too busy to post the script yet.

I'll try the solution offered & come back to you.

Many thanks to everyone & your persistence in trying to help me!

I'll respond soon after some testing.

Peter 🙂

ttudor SSC Veteran Points: 257 More actions · Answer 2

To barrow from SSCrazy's data, I had this idea for grouping the data:

DECLARE @PersonAction TABLE (

PersonID VARCHAR(1) NOT NULL

,actiondate DATETIME NOT NULL

,actiontype INTEGER NOT NULL

)

INSERT INTO @PersonAction(PersonID,actiondate,actiontype) VALUES

('A','01/01/2016',1)

,('B','01/01/2016',1)

,('C','01/01/2016',1)

,('D','01/01/2016',3)

,('A','02/01/2016',2)

,('B','02/01/2016',2)

,('C','02/01/2016',2)

,('D','02/01/2016',4)

,('A','03/01/2016',3)

,('B','03/01/2016',4)

,('C','03/01/2016',3)

,('D','03/01/2016',2)

,('A','04/01/2016',4)

,('B','04/01/2016',3)

,('C','04/01/2016',4)

,('D','04/01/2016',1)

,('E','04/01/2016',3)

,('E','05/01/2016',4)

,('E','06/01/2016',1)

,('E','01/01/2016',2)

-- The raw data

SELECT

PersonID

, ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY actiondate) AS DateOrder

, actiondate

, actiontype

FROM @PersonAction

-- CTE to prioritize the dates the action occured on

;WITH Data AS (

SELECT

PersonID

, ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY actiondate) AS DateOrder

, actiontype

FROM @PersonAction

)

-- Our pivoted data

SELECT

[1] AS Day1

, [2] AS Day2

, [3] AS Day3

, [4] AS Day4

, COUNT(*) AS People

FROM (

SELECT [1],[2],[3],[4]

FROM Data

PIVOT (

MAX(actiontype)

FOR DateOrder IN ([1],[2],[3],[4])

) AS PivotTable

) AS S

GROUP BY [1],[2],[3],[4]

OscarBoots Ten Centuries Points: 1258 More actions · Answer 3

Hi Again,

Here are my scripts to create the data;

CREATE TABLE PatternTest
(
[PersonID] int,
[Action] varchar(50),
[RankID] int
);
-------------------------------------------------------------------------
INSERT INTO [Test_DB].[dbo].[PatternTest]
([PersonID]
,[Action]
,[RankID])
VALUES
(1,'Action1',1),
(1,'Action2',2),
(1,'Action3',3),
(1,'Action4',4),
(2,'Action1',1),
(2,'Action3',2),
(2,'Action2',3),
(2,'Action4',4),
(3,'Action1',1),
(3,'Action4',2),
(3,'Action3',3),
(3,'Action2',4),
(4,'Action4',1),
(4,'Action3',2),
(4,'Action2',3),
(4,'Action1',4),
(5,'Action2',1),
(5,'Action1',2),
(5,'Action4',3),
(5,'Action3',4),
(6, 'Action1',1),
(6,'Action2',2),
(6,'Action3',3),
(6,'Action4',4),
(7,'Action1',1),
(7,'Action3',2),
(7,'Action2',3),
(7,'Action4',4),
(8,'Action1',1),
(8,'Action4',2),
(8,'Action3',3),
(8,'Action2',4),
(9,'Action4',1),
(9,'Action3',2),
(9,'Action2',3),
(9,'Action1',4),
(10,'Action2',1),
(10,'Action1',2),
(10,'Action4',3),
(10,'Action3',4)
GO

Each Person ID has 4 Actions in a certain Order shown by the RankID field.

I want to count how many PersonIDs carried out their Actions in all possible Order combinations of Action1, Action2, Action3 & Action4.

So how many PersonIDs completed their Actions in the order of Action 1,2,3,4, and then Actions 4,3,2,1 etc etc.

Thanks

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 4

Thanks for posting the repro script!

Here is one way to get the results you need:

WITH Denormalized

AS (SELECT PersonID,

MAX(CASE WHEN RankID = 1 THEN [Action] END) AS A1,

MAX(CASE WHEN RankID = 2 THEN [Action] END) AS A2,

MAX(CASE WHEN RankID = 3 THEN [Action] END) AS A3,

MAX(CASE WHEN RankID = 4 THEN [Action] END) AS A4

FROM dbo.PatternTest

GROUP BY PersonID)

SELECT A1, A2, A3, A4, COUNT(*)

FROM Denormalized

GROUP BY A1, A2, A3, A4;

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

OscarBoots Ten Centuries Points: 1258 More actions · Answer 5

Hello Hugo,

Thank you very much for your answer!

I entered a large dataset of my info & got a very good summary of the results.

I'm not quite across your logic about how this works.

Would you be able to explain the steps?

Much appreciated.

Peter

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 6

pbo71465 (4/12/2016)
Would you be able to explain the steps?

Sure!

One of the things about CTEs is that they make it easy to disect a query. You can highlight just the text of the CTE, or replace the outer query with a simple SELECT * FROM CteName to see what the CTE does by itself. If you do that on the query I wrote, with the test data you posted, you will see that it transforms the data you posted into a form where every pattern is on a single row, with four columns for the four actions in order. The trick used here is that I know that for each PersonID in the input data, there will be four rows, with RankID 1 through 4. For each of those four rows, one of the CASE expressions returns the Action from that row, The other three return NULL for that row, but will return an Action value for one of the other rows of that person. I then aggregate this on PersonID, taking the MAX - which results in the NULLs being discarded (as they always are in aggregates), and the actual value remaining. (I could also have used MIN in this case, for the same result).

Once I have this result set with one row per person and four columns describing the actions in order, I use another GROUP BY (in the outer query) to create a group for each distinct list of four actions. And I then add a COUNT(*) aggregate to calculate the number of persons that have this same combination.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

OscarBoots Ten Centuries Points: 1258 More actions · Answer 7

Thanks you ALL very much for your assistance & Hugo for the answer.

Much appreciated !!

Peter

R.P.Rozema SSChampion Points: 12317 More actions · Answer 8

Why is my suggestion ignored? It may not be as fast, but it gives a more reliable result than Hugo's plus it is suitable also if the number of distinct actions differs from 4. Hugo's solution is indeed correct for this test set. However if your data is less clean than this test data, you may be ignoring certain patterns with it. For example, add one row to your test data like this for a person that redid action2 after first completing all actions:

VALUES

(7, 'Action2', 5)

and then re-run Hugo's code.

Hugo's results:

A1A2A3A4(No column name)

Action1Action2Action3Action42

Action1Action3Action2Action42

Action1Action4Action3Action22

Action2Action1Action4Action32

Action4Action3Action2Action12

My results:

actionscount

Action1,Action2,Action3,Action42

Action1,Action3,Action2,Action41

Action1,Action3,Action2,Action4,Action21

Action1,Action4,Action3,Action22

Action2,Action1,Action4,Action32

Action4,Action3,Action2,Action12

Here's the complete code if you want to do the test yourself.

use tempdb

go

if object_id('tempdb..#PatternTest') is not null

drop table #PatternTest;

CREATE TABLE #PatternTest

(

[PersonID] int,

[Action] varchar(50),

[RankID] int

);

-------------------------------------------------------------------------

INSERT INTO #PatternTest

([PersonID]

,[Action]

,[RankID])

VALUES

(1,'Action1',1),

(1,'Action2',2),

(1,'Action3',3),

(1,'Action4',4),

(2,'Action1',1),

(2,'Action3',2),

(2,'Action2',3),

(2,'Action4',4),

(3,'Action1',1),

(3,'Action4',2),

(3,'Action3',3),

(3,'Action2',4),

(4,'Action4',1),

(4,'Action3',2),

(4,'Action2',3),

(4,'Action1',4),

(5,'Action2',1),

(5,'Action1',2),

(5,'Action4',3),

(5,'Action3',4),

(6, 'Action1',1),

(6,'Action2',2),

(6,'Action3',3),

(6,'Action4',4),

(7,'Action1',1),

(7,'Action3',2),

(7,'Action2',3),

(7,'Action4',4),

(8,'Action1',1),

(8,'Action4',2),

(8,'Action3',3),

(8,'Action2',4),

(9,'Action4',1),

(9,'Action3',2),

(9,'Action2',3),

(9,'Action1',4),

(10,'Action2',1),

(10,'Action1',2),

(10,'Action4',3),

(10,'Action3',4)

-- Add the extra line, person 7 took Action2 another time after

-- doing them all.

INSERT INTO #PatternTest

([PersonID]

,[Action]

,[RankID])

VALUES

(7,'Action2',5)

raiserror( 'rrozema', 0,0) with nowait;

with ctePersons as (

select distinct PersonID

from #PatternTest

)

select a.actions, count(*) as [count]

from (

select

p.PersonID,

stuff((

select ',' + pa.action as [text()]

from #PatternTest pa

where pa.PersonID = p.PersonID

order by pa.RankID

for xml path(''), type

).value('.','varchar(2048)'), 1, 1, '') as actions

from ctePersons p

) a

group by a.actions;

raiserror( 'Hugo', 0,0) with nowait;

WITH Denormalized

AS (SELECT PersonID,

MAX(CASE WHEN RankID = 1 THEN [Action] END) AS A1,

MAX(CASE WHEN RankID = 2 THEN [Action] END) AS A2,

MAX(CASE WHEN RankID = 3 THEN [Action] END) AS A3,

MAX(CASE WHEN RankID = 4 THEN [Action] END) AS A4

FROM #PatternTest

GROUP BY PersonID)

SELECT A1, A2, A3, A4, COUNT(*)

FROM Denormalized

GROUP BY A1, A2, A3, A4;

if object_id('tempdb..#PatternTest') is not null

drop table #PatternTest;

Posting Data Etiquette - Jeff Moden[/url]
Posting Performance Based Questions - Gail Shaw[/url]
Hidden RBAR - Jeff Moden[/url]
Cross Tabs and Pivots - Jeff Moden[/url]
Catch-all queries - Gail Shaw[/url]

If you don't have time to do it right, when will you have time to do it over?

OscarBoots Ten Centuries Points: 1258 More actions · Answer 9

Hi R P Rozema,

Thanks for your input, it too is a very tidy solution & I appreciate the time you took to create it.

Cheers