November 8, 2013 at 2:57 am
Hello all!
I am having some trouble building a query that selects how many times a a value has occurred in a row without any other value in between.
I have a table PersonActivity(PersonId int,ActivityId int). This table logs activities on persons in the database with the id of the person and the id of the activity.
I want to run a query that selects all personids having activityid=X, Y number of times in a row without any other activtyids breaking the chain of equal activityids.
Below is some test data given that the limit of activities in a row = 3 and activityid should be = 2
The query should return only personid = 1 which has 3 activityid=2 in a row.
Thankful for any help to solve this.
CREATE TABLE [dbo].[PersonActivity](
[PersonActivityId] [int] IDENTITY(1,1) NOT NULL,
[PersonId] [int] NOT NULL,
[ActivityId] [int] NOT NULL,
CONSTRAINT [PK_PersonActivity] PRIMARY KEY CLUSTERED
(
[PersonActivityId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Test data
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (1, 1)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (2, 1)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (3, 1)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (1, 2)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (2, 1)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (3, 1)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (1, 1)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (1, 2)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (2, 2)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (3, 2)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (1, 2)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (2, 1)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (1, 2)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (3, 1)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (2, 2)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (2, 1)
GO
INSERT [dbo].[PersonActivity] ([PersonId], [ActivityId]) VALUES (2, 2)
GO
Edit: Change table to include key which orders the data
November 8, 2013 at 5:00 am
So Are you saying that the result should contain (7th, 8th) and (15th, 16th and 17th) rows ?
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
November 8, 2013 at 5:09 am
The result only need to contain the personId. I am only interested to get those personid which has 3 equal acitivyid in a row per person.
Given a person has the following activityids it should not be returned in the result (1,2,2,1,2) Total of three activityid=2, but only 2 in a row.
(1,2,2,1,2,2,2,1) should return the person becuase there are three activityid=2 in a row
November 8, 2013 at 5:44 am
ok so based on sample data above which rows should get selected ?
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
November 8, 2013 at 5:52 am
I guess the result will group by personId so i only need the personId selected and in the test data only personId=1 meets the requirements of three activityid=2 in-a-row.
(I need to find out which personIds has three activityid=2 in-a-row)
November 8, 2013 at 6:54 am
First you will need to include some column which ORDERS the data.
As it is your sample data cannot be ordered by anything to know whether activities are consecutive.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
November 8, 2013 at 8:31 am
Just remember that a table has no predefined order for the rows. Can you provide a column or group of columns that help us to identify the chronological order of the events?
November 8, 2013 at 8:37 am
Ahh sorry didnt think about that.
The "real" table has an identity primary key. I've updated the example code to include a identity key column to use for ordering.
November 8, 2013 at 9:48 am
The problem you are trying to solve is known as "Finding Islands" and you can read more about it in the last book from Itzik Ben-Gan about T-SQL Querying.
The idea is to find a group identifier for those consecutive rows. One way to accomplish this is using ranking functions.
SET NOCOUNT ON;
USE tempdb;
GO
CREATE TABLE dbo.PersonActivity(
sk int NOT NULL IDENTITY(1, 1) UNIQUE CLUSTERED,
PersonId int NOT NULL,
ActivityId int NOT NULL
);
GO
INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (1, 1)
GO
INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (2, 1)
GO
INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (3, 1)
GO
INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (1, 2)
GO
INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (2, 1)
GO
INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (3, 1)
GO
INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (1, 1)
GO
INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (1, 2)
GO
INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (2, 2)
GO
INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (3, 2)
GO
INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (1, 2)
GO
INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (2, 1)
GO
INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (1, 2)
GO
INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (3, 1)
GO
INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (2, 2)
GO
INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (2, 1)
GO
INSERT dbo.PersonActivity (PersonId, ActivityId) VALUES (2, 2)
GO
DECLARE
@activity_id int = 2,
@number_of_times int = 3;
WITH C1 AS (
SELECT
sk,
PersonId,
ActivityId,
ROW_NUMBER() OVER(PARTITION BY PersonId ORDER BY sk) -
ROW_NUMBER() OVER(PARTITION BY PersonId ORDER BY ActivityId, sk) AS grp
FROM
dbo.PersonActivity
)
SELECT
PersonId,
ActivityId,
COUNT(*) AS cnt
FROM
C1
GROUP BY
PersonId,
ActivityId,
grp
HAVING
ActivityId = @activity_id
AND COUNT(*) >= @number_of_times;
GO
DROP TABLE dbo.PersonActivity;
GO
Do not take my suggestion literally. It is an idea that should work on it to adapt to your environment.
November 8, 2013 at 11:57 pm
Thank you!
Always nice to know that a problem actually has a name =)
Your example worked great.
What if I only would like to select those personIds which has the last three activity=2. As of now the activity=2 can occur before another activity for a person. The three activities must be the last three for every person.
November 9, 2013 at 4:38 pm
niclas-1082356 (11/8/2013)
Thank you!Always nice to know that a problem actually has a name =)
Your example worked great.
What if I only would like to select those personIds which has the last three activity=2. As of now the activity=2 can occur before another activity for a person. The three activities must be the last three for every person.
Something like this, maybe...
;with data as
(
select
PersonActivityId
,PersonId
,ActivityId
,row_number() over(partition by PersonId order by PersonActivityId)
-
row_number() over(partition by PersonId,ActivityId order by PersonActivityId) as GroupNo
from PersonActivity
), aggregates as
(
select
PersonActivityId
,PersonId
,ActivityId
,GroupNo
,max(GroupNo) over(partition by PersonId) as MaxGroupNo
,count(*) over(partition by PersonId,GroupNo) as GroupCount
from data
)
select
PersonActivityId
,PersonId
,ActivityId
from aggregates
where GroupNo=MaxGroupNo
and GroupCount>=3
and ActivityId=2
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply