Query to get rows where a value must occur X number of times in a row

Question

Post reply

Query to get rows where a value must occur X number of times in a row

niclas-1082356

SSC Veteran

Points: 298
More actions
November 8, 2013 at 2:57 am

#284860

Hello all!
I am having some trouble building a query that selects how many times a a value has occurred in a row without any other value in between.
I have a table PersonActivity(PersonId int,ActivityId int). This table logs activities on persons in the database with the id of the person and the id of the activity.
I want to run a query that selects all personids having activityid=X, Y number of times in a row without any other activtyids breaking the chain of equal activityids.
Below is some test data given that the limit of activities in a row = 3 and activityid should be = 2
The query should return only personid = 1 which has 3 activityid=2 in a row.
Thankful for any help to solve this.
CREATE TABLE [dbo].[PersonActivity](
[PersonActivityId] [int] IDENTITY(1,1) NOT NULL,
[PersonId] [int] NOT NULL,
[ActivityId] [int] NOT NULL,
CONSTRAINT [PK_PersonActivity] PRIMARY KEY CLUSTERED
(
[PersonActivityId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Test data
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (1, 1)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (2, 1)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (3, 1)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (1, 2)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (2, 1)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (3, 1)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (1, 1)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (1, 2)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (2, 2)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (3, 2)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (1, 2)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (2, 1)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (1, 2)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (3, 1)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (2, 2)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (2, 1)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (2, 2)
GO
Edit: Change table to include key which orders the data
Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

Bhuvnesh SSC Guru Points: 59351 More actions · Answer 1

So Are you saying that the result should contain (7th, 8th) and (15th, 16th and 17th) rows ?

-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

niclas-1082356 SSC Veteran Points: 298 More actions · Answer 2

The result only need to contain the personId. I am only interested to get those personid which has 3 equal acitivyid in a row per person.

Given a person has the following activityids it should not be returned in the result (1,2,2,1,2) Total of three activityid=2, but only 2 in a row.

(1,2,2,1,2,2,2,1) should return the person becuase there are three activityid=2 in a row

Bhuvnesh SSC Guru Points: 59351 More actions · Answer 3

ok so based on sample data above which rows should get selected ?

-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

niclas-1082356 SSC Veteran Points: 298 More actions · Answer 4

I guess the result will group by personId so i only need the personId selected and in the test data only personId=1 meets the requirements of three activityid=2 in-a-row.

(I need to find out which personIds has three activityid=2 in-a-row)

mister.magoo SSC-Forever Points: 47174 More actions · Answer 5

First you will need to include some column which ORDERS the data.

As it is your sample data cannot be ordered by anything to know whether activities are consecutive.

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Hunchback SSCrazy Points: 2649 More actions · Answer 6

Just remember that a table has no predefined order for the rows. Can you provide a column or group of columns that help us to identify the chronological order of the events?

niclas-1082356 SSC Veteran Points: 298 More actions · Answer 7

Ahh sorry didnt think about that.

The "real" table has an identity primary key. I've updated the example code to include a identity key column to use for ordering.

Hunchback SSCrazy Points: 2649 More actions · Answer 8

The problem you are trying to solve is known as "Finding Islands" and you can read more about it in the last book from Itzik Ben-Gan about T-SQL Querying.

The idea is to find a group identifier for those consecutive rows. One way to accomplish this is using ranking functions.

SET NOCOUNT ON;

USE tempdb;

GO

CREATE TABLE dbo.PersonActivity(

sk int NOT NULL IDENTITY(1, 1) UNIQUE CLUSTERED,

PersonId int NOT NULL,

ActivityId int NOT NULL

);

GO

INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (1, 1)

GO

INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (2, 1)

GO

INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (3, 1)

GO

INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (1, 2)

GO

INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (2, 1)

GO

INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (3, 1)

GO

INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (1, 1)

GO

INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (1, 2)

GO

INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (2, 2)

GO

INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (3, 2)

GO

INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (1, 2)

GO

INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (2, 1)

GO

INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (1, 2)

GO

INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (3, 1)

GO

INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (2, 2)

GO

INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (2, 1)

GO

INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (2, 2)

GO

DECLARE

@activity_id int = 2,

@number_of_times int = 3;

WITH C1 AS (

SELECT

sk,

PersonId,

ActivityId,

ROW_NUMBER() OVER(PARTITION BY PersonId ORDER BY sk) -

ROW_NUMBER() OVER(PARTITION BY PersonId ORDER BY ActivityId, sk) AS grp

FROM

dbo.PersonActivity

)

SELECT

PersonId,

ActivityId,

COUNT(*) AS cnt

FROM

C1

GROUP BY

PersonId,

ActivityId,

grp

HAVING

ActivityId = @activity_id

AND COUNT(*) >= @number_of_times;

GO

DROP TABLE dbo.PersonActivity;

GO

Do not take my suggestion literally. It is an idea that should work on it to adapt to your environment.

niclas-1082356 SSC Veteran Points: 298 More actions · Answer 9

Thank you!

Always nice to know that a problem actually has a name =)

Your example worked great.

What if I only would like to select those personIds which has the last three activity=2. As of now the activity=2 can occur before another activity for a person. The three activities must be the last three for every person.

mister.magoo SSC-Forever Points: 47174 More actions · Answer 10

niclas-1082356 (11/8/2013)
Thank you!
Always nice to know that a problem actually has a name =)
Your example worked great.
What if I only would like to select those personIds which has the last three activity=2. As of now the activity=2 can occur before another activity for a person. The three activities must be the last three for every person.

Something like this, maybe...

;with data as

(

select

PersonActivityId

,PersonId

,ActivityId

,row_number() over(partition by PersonId order by PersonActivityId)

-

row_number() over(partition by PersonId,ActivityId order by PersonActivityId) as GroupNo

from PersonActivity

), aggregates as

(

select

PersonActivityId

,PersonId

,ActivityId

,GroupNo

,max(GroupNo) over(partition by PersonId) as MaxGroupNo

,count(*) over(partition by PersonId,GroupNo) as GroupCount

from data

)

select

PersonActivityId

,PersonId

,ActivityId

from aggregates

where GroupNo=MaxGroupNo

and GroupCount>=3

and ActivityId=2

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);