April 6, 2016 at 1:51 pm
Thanks to all & apologies for the delay.
I have checked the responses as I received email alerts & fully intended to post a DDL script & my thanks to you all.
Unfortunately, I've been too busy to post the script yet.
I'll try the solution offered & come back to you.
Many thanks to everyone & your persistence in trying to help me!
I'll respond soon after some testing.
Peter 🙂
April 7, 2016 at 6:16 am
To barrow from SSCrazy's data, I had this idea for grouping the data:
DECLARE @PersonAction TABLE (
PersonID VARCHAR(1) NOT NULL
,actiondate DATETIME NOT NULL
,actiontype INTEGER NOT NULL
)
INSERT INTO @PersonAction(PersonID,actiondate,actiontype) VALUES
('A','01/01/2016',1)
,('B','01/01/2016',1)
,('C','01/01/2016',1)
,('D','01/01/2016',3)
,('A','02/01/2016',2)
,('B','02/01/2016',2)
,('C','02/01/2016',2)
,('D','02/01/2016',4)
,('A','03/01/2016',3)
,('B','03/01/2016',4)
,('C','03/01/2016',3)
,('D','03/01/2016',2)
,('A','04/01/2016',4)
,('B','04/01/2016',3)
,('C','04/01/2016',4)
,('D','04/01/2016',1)
,('E','04/01/2016',3)
,('E','05/01/2016',4)
,('E','06/01/2016',1)
,('E','01/01/2016',2)
-- The raw data
SELECT
PersonID
, ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY actiondate) AS DateOrder
, actiondate
, actiontype
FROM @PersonAction
-- CTE to prioritize the dates the action occured on
;WITH Data AS (
SELECT
PersonID
, ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY actiondate) AS DateOrder
, actiontype
FROM @PersonAction
)
-- Our pivoted data
SELECT
[1] AS Day1
, [2] AS Day2
, [3] AS Day3
, [4] AS Day4
, COUNT(*) AS People
FROM (
SELECT [1],[2],[3],[4]
FROM Data
PIVOT (
MAX(actiontype)
FOR DateOrder IN ([1],[2],[3],[4])
) AS PivotTable
) AS S
GROUP BY [1],[2],[3],[4]
April 9, 2016 at 10:17 pm
Hi Again,
Here are my scripts to create the data;
CREATE TABLE PatternTest
(
[PersonID] int,
[Action] varchar(50),
[RankID] int
);
-------------------------------------------------------------------------
INSERT INTO [Test_DB].[dbo].[PatternTest]
([PersonID]
,[Action]
,[RankID])
VALUES
(1,'Action1',1),
(1,'Action2',2),
(1,'Action3',3),
(1,'Action4',4),
(2,'Action1',1),
(2,'Action3',2),
(2,'Action2',3),
(2,'Action4',4),
(3,'Action1',1),
(3,'Action4',2),
(3,'Action3',3),
(3,'Action2',4),
(4,'Action4',1),
(4,'Action3',2),
(4,'Action2',3),
(4,'Action1',4),
(5,'Action2',1),
(5,'Action1',2),
(5,'Action4',3),
(5,'Action3',4),
(6, 'Action1',1),
(6,'Action2',2),
(6,'Action3',3),
(6,'Action4',4),
(7,'Action1',1),
(7,'Action3',2),
(7,'Action2',3),
(7,'Action4',4),
(8,'Action1',1),
(8,'Action4',2),
(8,'Action3',3),
(8,'Action2',4),
(9,'Action4',1),
(9,'Action3',2),
(9,'Action2',3),
(9,'Action1',4),
(10,'Action2',1),
(10,'Action1',2),
(10,'Action4',3),
(10,'Action3',4)
GO
Each Person ID has 4 Actions in a certain Order shown by the RankID field.
I want to count how many PersonIDs carried out their Actions in all possible Order combinations of Action1, Action2, Action3 & Action4.
So how many PersonIDs completed their Actions in the order of Action 1,2,3,4, and then Actions 4,3,2,1 etc etc.
Thanks
April 10, 2016 at 5:36 am
Thanks for posting the repro script!
Here is one way to get the results you need:
WITH Denormalized
AS (SELECT PersonID,
MAX(CASE WHEN RankID = 1 THEN [Action] END) AS A1,
MAX(CASE WHEN RankID = 2 THEN [Action] END) AS A2,
MAX(CASE WHEN RankID = 3 THEN [Action] END) AS A3,
MAX(CASE WHEN RankID = 4 THEN [Action] END) AS A4
FROM dbo.PatternTest
GROUP BY PersonID)
SELECT A1, A2, A3, A4, COUNT(*)
FROM Denormalized
GROUP BY A1, A2, A3, A4;
April 12, 2016 at 2:42 am
Hello Hugo,
Thank you very much for your answer!
I entered a large dataset of my info & got a very good summary of the results.
I'm not quite across your logic about how this works.
Would you be able to explain the steps?
Much appreciated.
Peter
April 12, 2016 at 3:18 pm
pbo71465 (4/12/2016)
Would you be able to explain the steps?
Sure!
One of the things about CTEs is that they make it easy to disect a query. You can highlight just the text of the CTE, or replace the outer query with a simple SELECT * FROM CteName to see what the CTE does by itself. If you do that on the query I wrote, with the test data you posted, you will see that it transforms the data you posted into a form where every pattern is on a single row, with four columns for the four actions in order. The trick used here is that I know that for each PersonID in the input data, there will be four rows, with RankID 1 through 4. For each of those four rows, one of the CASE expressions returns the Action from that row, The other three return NULL for that row, but will return an Action value for one of the other rows of that person. I then aggregate this on PersonID, taking the MAX - which results in the NULLs being discarded (as they always are in aggregates), and the actual value remaining. (I could also have used MIN in this case, for the same result).
Once I have this result set with one row per person and four columns describing the actions in order, I use another GROUP BY (in the outer query) to create a group for each distinct list of four actions. And I then add a COUNT(*) aggregate to calculate the number of persons that have this same combination.
April 12, 2016 at 7:31 pm
Thanks you ALL very much for your assistance & Hugo for the answer.
Much appreciated !!
Peter
April 20, 2016 at 10:21 am
Why is my suggestion ignored? It may not be as fast, but it gives a more reliable result than Hugo's plus it is suitable also if the number of distinct actions differs from 4. Hugo's solution is indeed correct for this test set. However if your data is less clean than this test data, you may be ignoring certain patterns with it. For example, add one row to your test data like this for a person that redid action2 after first completing all actions:
VALUES
(7, 'Action2', 5)
and then re-run Hugo's code.
Hugo's results:
A1A2A3A4(No column name)
Action1Action2Action3Action42
Action1Action3Action2Action42
Action1Action4Action3Action22
Action2Action1Action4Action32
Action4Action3Action2Action12
My results:
actionscount
Action1,Action2,Action3,Action42
Action1,Action3,Action2,Action41
Action1,Action3,Action2,Action4,Action21
Action1,Action4,Action3,Action22
Action2,Action1,Action4,Action32
Action4,Action3,Action2,Action12
Here's the complete code if you want to do the test yourself.
use tempdb
go
if object_id('tempdb..#PatternTest') is not null
drop table #PatternTest;
CREATE TABLE #PatternTest
(
[PersonID] int,
[Action] varchar(50),
[RankID] int
);
-------------------------------------------------------------------------
INSERT INTO #PatternTest
([PersonID]
,[Action]
,[RankID])
VALUES
(1,'Action1',1),
(1,'Action2',2),
(1,'Action3',3),
(1,'Action4',4),
(2,'Action1',1),
(2,'Action3',2),
(2,'Action2',3),
(2,'Action4',4),
(3,'Action1',1),
(3,'Action4',2),
(3,'Action3',3),
(3,'Action2',4),
(4,'Action4',1),
(4,'Action3',2),
(4,'Action2',3),
(4,'Action1',4),
(5,'Action2',1),
(5,'Action1',2),
(5,'Action4',3),
(5,'Action3',4),
(6, 'Action1',1),
(6,'Action2',2),
(6,'Action3',3),
(6,'Action4',4),
(7,'Action1',1),
(7,'Action3',2),
(7,'Action2',3),
(7,'Action4',4),
(8,'Action1',1),
(8,'Action4',2),
(8,'Action3',3),
(8,'Action2',4),
(9,'Action4',1),
(9,'Action3',2),
(9,'Action2',3),
(9,'Action1',4),
(10,'Action2',1),
(10,'Action1',2),
(10,'Action4',3),
(10,'Action3',4)
-- Add the extra line, person 7 took Action2 another time after
-- doing them all.
INSERT INTO #PatternTest
([PersonID]
,[Action]
,[RankID])
VALUES
(7,'Action2',5)
raiserror( 'rrozema', 0,0) with nowait;
with ctePersons as (
select distinct PersonID
from #PatternTest
)
select a.actions, count(*) as [count]
from (
select
p.PersonID,
stuff((
select ',' + pa.action as [text()]
from #PatternTest pa
where pa.PersonID = p.PersonID
order by pa.RankID
for xml path(''), type
).value('.','varchar(2048)'), 1, 1, '') as actions
from ctePersons p
) a
group by a.actions;
raiserror( 'Hugo', 0,0) with nowait;
WITH Denormalized
AS (SELECT PersonID,
MAX(CASE WHEN RankID = 1 THEN [Action] END) AS A1,
MAX(CASE WHEN RankID = 2 THEN [Action] END) AS A2,
MAX(CASE WHEN RankID = 3 THEN [Action] END) AS A3,
MAX(CASE WHEN RankID = 4 THEN [Action] END) AS A4
FROM #PatternTest
GROUP BY PersonID)
SELECT A1, A2, A3, A4, COUNT(*)
FROM Denormalized
GROUP BY A1, A2, A3, A4;
if object_id('tempdb..#PatternTest') is not null
drop table #PatternTest;
April 22, 2016 at 12:41 am
Hi R P Rozema,
Thanks for your input, it too is a very tidy solution & I appreciate the time you took to create it.
Cheers
Viewing 9 posts - 16 through 23 (of 23 total)
You must be logged in to reply to this topic. Login to reply