Looping through rows and applying a count

Question

Post reply

Looping through rows and applying a count

watto84

SSC Enthusiast

Points: 191
More actions
October 2, 2016 at 10:07 pm

#330153

Hi, I was hoping I could get some help for the below that I'm trying to return. I'm a SQL Server newbie!
i have a table that can be recreated with the below:
create table stat
(
gameID int
,playID int
,name nVarchar(10)
,eventType nVarChar(2)
)
insert into stat (gameId, playId, name, eventType) values
(1 , 1, 'Jim', 'a'),
(1 , 2, 'Frank', 'g'),
(1 , 3, 'Bill', 'f'),
(1 , 4, 'Peter', 'g'),
(1 , 5, 'Bob', 'v'),
(1 , 6, 'Tony', 'w'),
(1 , 7, 'Michael', 'g'),
(1 , 8, 'Simon', 's'),
(1 , 9, 'Gary', 'v'),
(2 , 1, 'Steve', 'g'),
(2 , 2, 'Philip', 'v'),
(2 , 3, 'Mark', 'e'),
(2 , 4, 'Bob', 'g'),
(2 , 5, 'Mark', 'x'),
(2 , 6, 'Tony', 'g'),
(2 , 7, 'Michael', 'n'),
(2 , 8, 'Steve', 'u'),
(2 , 9, 'Grant', 'v')
What im trying to extract is......
Whenever a "V" event occurs, I want to return the first occurrence of "Name" in a row before the "V" event where the Name has a "G" event.
For example......game id 1, play id 5 has an event "V".........i want to return Peter in the result as Peter has the first "G" event before "V".
Frank at game id 1, play id 2 shouldn't be returned as Peter had the first "G" event before the "V" event
I would like to see the following returned, the Name and how many times it has occurred for that name
Name Count
Peter 1
Michael 1
Steve 1
Tony 1

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182911 More actions · Answer 1

Quick example, should be enough to get you started.

😎

USE TEEST;

GO

SET NOCOUNT ON;

IF OBJECT_ID(N'dbo.stat') IS NOT NULL DROP TABLE dbo.stat;

create table dbo.stat

(

gameID int

,playID int

,name nVarchar(10)

,eventType nVarChar(2)

);

insert into stat (gameId, playId, name, eventType) values

(1 , 1, 'Jim', 'a'),

(1 , 2, 'Frank', 'g'),

(1 , 3, 'Bill', 'f'),

(1 , 4, 'Peter', 'g'),

(1 , 5, 'Bob', 'v'),

(1 , 6, 'Tony', 'w'),

(1 , 7, 'Michael', 'g'),

(1 , 8, 'Simon', 's'),

(1 , 9, 'Gary', 'v'),

(2 , 1, 'Steve', 'g'),

(2 , 2, 'Philip', 'v'),

(2 , 3, 'Mark', 'e'),

(2 , 4, 'Bob', 'g'),

(2 , 5, 'Mark', 'x'),

(2 , 6, 'Tony', 'g'),

(2 , 7, 'Michael', 'n'),

(2 , 8, 'Steve', 'u'),

(2 , 9, 'Grant', 'v');

;WITH BASE_DATA AS

(

SELECT

ST.gameId

,ST.playId

,ST.name

,ST.eventType

,SUM(CASE

WHEN ST.eventType = 'v' THEN 1

ELSE 0

END) OVER

(

ORDER BY ST.gameId

,ST.playId

ROWS BETWEEN UNBOUNDED PRECEDING

AND CURRENT ROW

) AS ST_GRP

,ROW_NUMBER() OVER

(

ORDER BY ST.gameId

,ST.playId

) AS ST_RID

FROM dbo.stat ST

WHERE ST.eventType IN ('v','g')

)

,SORTED_SET AS

(

SELECT

BD.gameId

,BD.playId

,BD.name

,BD.eventType

,BD.ST_GRP

,BD.ST_RID

,ROW_NUMBER() OVER

(

PARTITION BY BD.ST_GRP

ORDER BY BD.ST_RID DESC

) AS BD_GRP_RID

FROM BASE_DATA BD

)

SELECT

SS.playID

,SS.[name]

,COUNT(SS.playID) AS [Count]

FROM SORTED_SET SS

WHERE SS.BD_GRP_RID = 1

AND SS.eventType = 'g'

GROUP BY SS.playID

,SS.[name];

Output

playID name Count

----------- ---------- -----------

7 Michael 1

4 Peter 1

1 Steve 1

6 Tony 1

watto84 SSC Enthusiast Points: 191 More actions · Answer 2

That is incredible. I really appreciate your help, you have done an outstanding job with it, as a newbie there was no way I could have worked that out.

Eirikur Eiriksson SSC Guru Points: 182911 More actions · Answer 3

watto84 (10/2/2016)
That is incredible. I really appreciate your help, you have done an outstanding job with it, as a newbie there was no way I could have worked that out.

You are very welcome.

😎

Now please study the code and come back if you have any questions.

drew.allen SSC Guru Points: 76985 More actions · Answer 4

I would think that you would want to partition by the game ID. Presumably, the games are treated separately, and partitioning by the game ID would accomplish this.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 5

This sure does seem a LOT simpler, and easier for a newbie (and this old brain too!) to understand:

SELECT t2.name, COUNT(*)

FROM dbo.#stat t1

--get the G events for the V event that are earlier

INNER JOIN dbo.#stat t2 ON t1.gameid = t2.gameid AND t2.playID < t1.playID AND t2.eventtype = 'g'

--now ensure they are the most recent earlier G event

AND t2.playid = (SELECT MAX(playid) FROM #stat t3 WHERE t3.gameid = t1.gameid AND t3.playid < t1.playid AND t3.eventtype = 'g')

--get the V events

WHERE t1.eventtype = 'v'

GROUP BY t2.name

NOTE: if you cannot ALWAYS GUARANTEE at least one G preceeds every V then you need slightly different logic depending on how you want to handle that case

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

watto84 SSC Enthusiast Points: 191 More actions · Answer 6

That is very similiar to the code I tried putting together initially, so it makes me feel better that there is 1 solution that was close to what I was thinking.

I wasn't aware (and couldn't find when i searched) the SELECT MAX() syntax to get the most recent row for "G". Thank you for providing this.

I ran the code over the simple table that I provided and it worked great, however when applied to a larger table 500,000 rows it wasn't able to handle such a large data set.

Is there anyway to improve performance with the simple code provided, partition by perhaps?

And yes i can answer that there is always a G before a V.

watto84 SSC Enthusiast Points: 191 More actions · Answer 7

Problem solved on the large data set, i worked out I just needed to apply a clustered index to my table 🙂 Sorry.....Newbie 🙂

Thanks for your help though, easy to understand code that i can now apply to other similiar queries.

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 8

watto84 (10/3/2016)
Problem solved on the large data set, i worked out I just needed to apply a clustered index to my table 🙂 Sorry.....Newbie 🙂
Thanks for your help though, easy to understand code that i can now apply to other similiar queries.

Some form of indexing would be necessary for good performance in any scenario. Table scans on larger numbers of rows can hurt - as you found out. 😀

I presume you picked the gameid, playid as clustered index, likely the primary key?? Then it serves several useful functions at once.

Kudos for being close to the solution yourself! The way to get there is to stop thinking about "looping" or "what do I want to do with this row, then the next one, etc" and start thinking "what to I want to do with the DATA" as a whole. Then just break down the wants/needs into logic you can code to. Easy-peasy, lemon-squeezy! 😎

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Jeff Moden SSC Guru Points: 1004340 More actions · Answer 9

TheSQLGuru (10/4/2016)
watto84 (10/3/2016)
Problem solved on the large data set, i worked out I just needed to apply a clustered index to my table 🙂 Sorry.....Newbie 🙂
Thanks for your help though, easy to understand code that i can now apply to other similiar queries.
Some form of indexing would be necessary for good performance in any scenario. Table scans on larger numbers of rows can hurt - as you found out. 😀
I presume you picked the gameid, playid as clustered index, likely the primary key?? Then it serves several useful functions at once.
Kudos for being close to the solution yourself! The way to get there is to stop thinking about "looping" or "what do I want to do with this row, then the next one, etc" and start thinking "what to I want to do with the DATA" as a whole. Then just break down the wants/needs into logic you can code to. Easy-peasy, lemon-squeezy! 😎

Agreed. In fact, I keep such a notion in my signature line below... the one about stop thinking about what you want to do to a row...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

drew.allen SSC Guru Points: 76985 More actions · Answer 10

I found another version. It's slower on this small dataset, but I think it will run faster on a larger dataset, because it contains one fewer sort.

WITH play_events AS (

SELECT *, MAX(CASE WHEN s.eventType = 'g' THEN CAST(s.playID AS VARBINARY(4)) + CAST(name AS VARBINARY(20)) END) OVER(PARTITION BY s.gameID ORDER BY s.playID ROWS UNBOUNDED PRECEDING) AS last_g_name

FROM #stat s

WHERE s.eventType IN ('g', 'v')

)

SELECT CAST(SUBSTRING(pe.last_g_name, 5, 20) AS NVARCHAR(10)) name, COUNT(*)

FROM play_events pe

WHERE pe.eventType = 'v'

GROUP BY CAST(SUBSTRING(pe.last_g_name, 5, 20) AS NVARCHAR(10))

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

drew.allen SSC Guru Points: 76985 More actions · Answer 11

TheSQLGuru (10/3/2016)
This sure does seem a LOT simpler, and easier for a newbie (and this old brain too!) to understand:
SELECT t2.name, COUNT(*)
FROM dbo.#stat t1
--get the G events for the V event that are earlier
INNER JOIN dbo.#stat t2 ON t1.gameid = t2.gameid AND t2.playID < t1.playID AND t2.eventtype = 'g'
--now ensure they are the most recent earlier G event
AND t2.playid = (SELECT MAX(playid) FROM #stat t3 WHERE t3.gameid = t1.gameid AND t3.playid < t1.playid AND t3.eventtype = 'g')
--get the V events
WHERE t1.eventtype = 'v'
GROUP BY t2.name
NOTE: if you cannot ALWAYS GUARANTEE at least one G preceeds every V then you need slightly different logic depending on how you want to handle that case

Unfortunately, this approach probably doesn't meet the criteria for such an approach to be very efficient. To be efficient requires that the number of records in the outer query be very small and the number in the subquery to be very large. From the data, it appears that the two numbers will be roughly in the same range.

Here are the stats from just this small set of data.

-- Eirikur's solution

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#stat_00000000000F'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

-- Drew's solution

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#stat_00000000000F'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

-- The SQL Guru's solution

Table '#stat_00000000000F'. Scan count 14, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Drew

PS: I did shorten the table name just to get the relevant data to fit better.

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 12

drew.allen (10/4/2016)
TheSQLGuru (10/3/2016)
This sure does seem a LOT simpler, and easier for a newbie (and this old brain too!) to understand:
SELECT t2.name, COUNT(*)
FROM dbo.#stat t1
--get the G events for the V event that are earlier
INNER JOIN dbo.#stat t2 ON t1.gameid = t2.gameid AND t2.playID < t1.playID AND t2.eventtype = 'g'
--now ensure they are the most recent earlier G event
AND t2.playid = (SELECT MAX(playid) FROM #stat t3 WHERE t3.gameid = t1.gameid AND t3.playid < t1.playid AND t3.eventtype = 'g')
--get the V events
WHERE t1.eventtype = 'v'
GROUP BY t2.name
NOTE: if you cannot ALWAYS GUARANTEE at least one G preceeds every V then you need slightly different logic depending on how you want to handle that case
Unfortunately, this approach probably doesn't meet the criteria for such an approach to be very efficient. To be efficient requires that the number of records in the outer query be very small and the number in the subquery to be very large. From the data, it appears that the two numbers will be roughly in the same range.
Here are the stats from just this small set of data.
-- Eirikur's solution
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#stat_00000000000F'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- Drew's solution
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#stat_00000000000F'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- The SQL Guru's solution
Table '#stat_00000000000F'. Scan count 14, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Drew
PS: I did shorten the table name just to get the relevant data to fit better.

I absolutely agree Drew. And this is one of the few times you will hear me say this but in this case performance simply must take a back seat. Most seasoned TSQL talent I know would have difficulty developing either of the other two solutions. Heck, a very large fraction would have difficulty explaining how they work, at least in detail. I presented a very logical example that teaches as well as gives the correct results.

The good news is that OP has some fast alternatives to use while he/she works to improve their data processing chops. 😎

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Jeff Moden SSC Guru Points: 1004340 More actions · Answer 13

The only time that performance takes a back seat for me is if accuracy is not otherwise able to be achieved.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 14

Jeff Moden (10/4/2016)
The only time that performance takes a back seat for me is if accuracy is not otherwise able to be achieved.

And you, sir, would be one of those "seasoned TSQL talents" that are an exception to my statement. 🙂

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service