Looping through rows and applying a count

  • Hi, I was hoping I could get some help for the below that I'm trying to return. I'm a SQL Server newbie!

    i have a table that can be recreated with the below:

    create table stat

    (

    gameID int

    ,playID int

    ,name nVarchar(10)

    ,eventType nVarChar(2)

    )

    insert into stat (gameId, playId, name, eventType) values

    (1 , 1, 'Jim', 'a'),

    (1 , 2, 'Frank', 'g'),

    (1 , 3, 'Bill', 'f'),

    (1 , 4, 'Peter', 'g'),

    (1 , 5, 'Bob', 'v'),

    (1 , 6, 'Tony', 'w'),

    (1 , 7, 'Michael', 'g'),

    (1 , 8, 'Simon', 's'),

    (1 , 9, 'Gary', 'v'),

    (2 , 1, 'Steve', 'g'),

    (2 , 2, 'Philip', 'v'),

    (2 , 3, 'Mark', 'e'),

    (2 , 4, 'Bob', 'g'),

    (2 , 5, 'Mark', 'x'),

    (2 , 6, 'Tony', 'g'),

    (2 , 7, 'Michael', 'n'),

    (2 , 8, 'Steve', 'u'),

    (2 , 9, 'Grant', 'v')

    What im trying to extract is......

    Whenever a "V" event occurs, I want to return the first occurrence of "Name" in a row before the "V" event where the Name has a "G" event.

    For example......game id 1, play id 5 has an event "V".........i want to return Peter in the result as Peter has the first "G" event before "V".

    Frank at game id 1, play id 2 shouldn't be returned as Peter had the first "G" event before the "V" event

    I would like to see the following returned, the Name and how many times it has occurred for that name

    Name Count

    Peter 1

    Michael 1

    Steve 1

    Tony 1

  • Quick example, should be enough to get you started.

    😎

    USE TEEST;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID(N'dbo.stat') IS NOT NULL DROP TABLE dbo.stat;

    create table dbo.stat

    (

    gameID int

    ,playID int

    ,name nVarchar(10)

    ,eventType nVarChar(2)

    );

    insert into stat (gameId, playId, name, eventType) values

    (1 , 1, 'Jim', 'a'),

    (1 , 2, 'Frank', 'g'),

    (1 , 3, 'Bill', 'f'),

    (1 , 4, 'Peter', 'g'),

    (1 , 5, 'Bob', 'v'),

    (1 , 6, 'Tony', 'w'),

    (1 , 7, 'Michael', 'g'),

    (1 , 8, 'Simon', 's'),

    (1 , 9, 'Gary', 'v'),

    (2 , 1, 'Steve', 'g'),

    (2 , 2, 'Philip', 'v'),

    (2 , 3, 'Mark', 'e'),

    (2 , 4, 'Bob', 'g'),

    (2 , 5, 'Mark', 'x'),

    (2 , 6, 'Tony', 'g'),

    (2 , 7, 'Michael', 'n'),

    (2 , 8, 'Steve', 'u'),

    (2 , 9, 'Grant', 'v');

    ;WITH BASE_DATA AS

    (

    SELECT

    ST.gameId

    ,ST.playId

    ,ST.name

    ,ST.eventType

    ,SUM(CASE

    WHEN ST.eventType = 'v' THEN 1

    ELSE 0

    END) OVER

    (

    ORDER BY ST.gameId

    ,ST.playId

    ROWS BETWEEN UNBOUNDED PRECEDING

    AND CURRENT ROW

    ) AS ST_GRP

    ,ROW_NUMBER() OVER

    (

    ORDER BY ST.gameId

    ,ST.playId

    ) AS ST_RID

    FROM dbo.stat ST

    WHERE ST.eventType IN ('v','g')

    )

    ,SORTED_SET AS

    (

    SELECT

    BD.gameId

    ,BD.playId

    ,BD.name

    ,BD.eventType

    ,BD.ST_GRP

    ,BD.ST_RID

    ,ROW_NUMBER() OVER

    (

    PARTITION BY BD.ST_GRP

    ORDER BY BD.ST_RID DESC

    ) AS BD_GRP_RID

    FROM BASE_DATA BD

    )

    SELECT

    SS.playID

    ,SS.[name]

    ,COUNT(SS.playID) AS [Count]

    FROM SORTED_SET SS

    WHERE SS.BD_GRP_RID = 1

    AND SS.eventType = 'g'

    GROUP BY SS.playID

    ,SS.[name];

    Output

    playID name Count

    ----------- ---------- -----------

    7 Michael 1

    4 Peter 1

    1 Steve 1

    6 Tony 1

  • That is incredible. I really appreciate your help, you have done an outstanding job with it, as a newbie there was no way I could have worked that out.

  • watto84 (10/2/2016)


    That is incredible. I really appreciate your help, you have done an outstanding job with it, as a newbie there was no way I could have worked that out.

    You are very welcome.

    😎

    Now please study the code and come back if you have any questions.

  • I would think that you would want to partition by the game ID. Presumably, the games are treated separately, and partitioning by the game ID would accomplish this.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • This sure does seem a LOT simpler, and easier for a newbie (and this old brain too!) to understand:

    SELECT t2.name, COUNT(*)

    FROM dbo.#stat t1

    --get the G events for the V event that are earlier

    INNER JOIN dbo.#stat t2 ON t1.gameid = t2.gameid AND t2.playID < t1.playID AND t2.eventtype = 'g'

    --now ensure they are the most recent earlier G event

    AND t2.playid = (SELECT MAX(playid) FROM #stat t3 WHERE t3.gameid = t1.gameid AND t3.playid < t1.playid AND t3.eventtype = 'g')

    --get the V events

    WHERE t1.eventtype = 'v'

    GROUP BY t2.name

    NOTE: if you cannot ALWAYS GUARANTEE at least one G preceeds every V then you need slightly different logic depending on how you want to handle that case

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • That is very similiar to the code I tried putting together initially, so it makes me feel better that there is 1 solution that was close to what I was thinking.

    I wasn't aware (and couldn't find when i searched) the SELECT MAX() syntax to get the most recent row for "G". Thank you for providing this.

    I ran the code over the simple table that I provided and it worked great, however when applied to a larger table 500,000 rows it wasn't able to handle such a large data set.

    Is there anyway to improve performance with the simple code provided, partition by perhaps?

    And yes i can answer that there is always a G before a V.

  • Problem solved on the large data set, i worked out I just needed to apply a clustered index to my table 🙂 Sorry.....Newbie 🙂

    Thanks for your help though, easy to understand code that i can now apply to other similiar queries.

  • watto84 (10/3/2016)


    Problem solved on the large data set, i worked out I just needed to apply a clustered index to my table 🙂 Sorry.....Newbie 🙂

    Thanks for your help though, easy to understand code that i can now apply to other similiar queries.

    Some form of indexing would be necessary for good performance in any scenario. Table scans on larger numbers of rows can hurt - as you found out. 😀

    I presume you picked the gameid, playid as clustered index, likely the primary key?? Then it serves several useful functions at once.

    Kudos for being close to the solution yourself! The way to get there is to stop thinking about "looping" or "what do I want to do with this row, then the next one, etc" and start thinking "what to I want to do with the DATA" as a whole. Then just break down the wants/needs into logic you can code to. Easy-peasy, lemon-squeezy! 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (10/4/2016)


    watto84 (10/3/2016)


    Problem solved on the large data set, i worked out I just needed to apply a clustered index to my table 🙂 Sorry.....Newbie 🙂

    Thanks for your help though, easy to understand code that i can now apply to other similiar queries.

    Some form of indexing would be necessary for good performance in any scenario. Table scans on larger numbers of rows can hurt - as you found out. 😀

    I presume you picked the gameid, playid as clustered index, likely the primary key?? Then it serves several useful functions at once.

    Kudos for being close to the solution yourself! The way to get there is to stop thinking about "looping" or "what do I want to do with this row, then the next one, etc" and start thinking "what to I want to do with the DATA" as a whole. Then just break down the wants/needs into logic you can code to. Easy-peasy, lemon-squeezy! 😎

    Agreed. In fact, I keep such a notion in my signature line below... the one about stop thinking about what you want to do to a row...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I found another version. It's slower on this small dataset, but I think it will run faster on a larger dataset, because it contains one fewer sort.

    WITH play_events AS (

    SELECT *, MAX(CASE WHEN s.eventType = 'g' THEN CAST(s.playID AS VARBINARY(4)) + CAST(name AS VARBINARY(20)) END) OVER(PARTITION BY s.gameID ORDER BY s.playID ROWS UNBOUNDED PRECEDING) AS last_g_name

    FROM #stat s

    WHERE s.eventType IN ('g', 'v')

    )

    SELECT CAST(SUBSTRING(pe.last_g_name, 5, 20) AS NVARCHAR(10)) name, COUNT(*)

    FROM play_events pe

    WHERE pe.eventType = 'v'

    GROUP BY CAST(SUBSTRING(pe.last_g_name, 5, 20) AS NVARCHAR(10))

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • TheSQLGuru (10/3/2016)


    This sure does seem a LOT simpler, and easier for a newbie (and this old brain too!) to understand:

    SELECT t2.name, COUNT(*)

    FROM dbo.#stat t1

    --get the G events for the V event that are earlier

    INNER JOIN dbo.#stat t2 ON t1.gameid = t2.gameid AND t2.playID < t1.playID AND t2.eventtype = 'g'

    --now ensure they are the most recent earlier G event

    AND t2.playid = (SELECT MAX(playid) FROM #stat t3 WHERE t3.gameid = t1.gameid AND t3.playid < t1.playid AND t3.eventtype = 'g')

    --get the V events

    WHERE t1.eventtype = 'v'

    GROUP BY t2.name

    NOTE: if you cannot ALWAYS GUARANTEE at least one G preceeds every V then you need slightly different logic depending on how you want to handle that case

    Unfortunately, this approach probably doesn't meet the criteria for such an approach to be very efficient. To be efficient requires that the number of records in the outer query be very small and the number in the subquery to be very large. From the data, it appears that the two numbers will be roughly in the same range.

    Here are the stats from just this small set of data.

    -- Eirikur's solution

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#stat_00000000000F'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    -- Drew's solution

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#stat_00000000000F'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    -- The SQL Guru's solution

    Table '#stat_00000000000F'. Scan count 14, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Drew

    PS: I did shorten the table name just to get the relevant data to fit better.

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (10/4/2016)


    TheSQLGuru (10/3/2016)


    This sure does seem a LOT simpler, and easier for a newbie (and this old brain too!) to understand:

    SELECT t2.name, COUNT(*)

    FROM dbo.#stat t1

    --get the G events for the V event that are earlier

    INNER JOIN dbo.#stat t2 ON t1.gameid = t2.gameid AND t2.playID < t1.playID AND t2.eventtype = 'g'

    --now ensure they are the most recent earlier G event

    AND t2.playid = (SELECT MAX(playid) FROM #stat t3 WHERE t3.gameid = t1.gameid AND t3.playid < t1.playid AND t3.eventtype = 'g')

    --get the V events

    WHERE t1.eventtype = 'v'

    GROUP BY t2.name

    NOTE: if you cannot ALWAYS GUARANTEE at least one G preceeds every V then you need slightly different logic depending on how you want to handle that case

    Unfortunately, this approach probably doesn't meet the criteria for such an approach to be very efficient. To be efficient requires that the number of records in the outer query be very small and the number in the subquery to be very large. From the data, it appears that the two numbers will be roughly in the same range.

    Here are the stats from just this small set of data.

    -- Eirikur's solution

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#stat_00000000000F'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    -- Drew's solution

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#stat_00000000000F'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    -- The SQL Guru's solution

    Table '#stat_00000000000F'. Scan count 14, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Drew

    PS: I did shorten the table name just to get the relevant data to fit better.

    I absolutely agree Drew. And this is one of the few times you will hear me say this but in this case performance simply must take a back seat. Most seasoned TSQL talent I know would have difficulty developing either of the other two solutions. Heck, a very large fraction would have difficulty explaining how they work, at least in detail. I presented a very logical example that teaches as well as gives the correct results.

    The good news is that OP has some fast alternatives to use while he/she works to improve their data processing chops. 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • The only time that performance takes a back seat for me is if accuracy is not otherwise able to be achieved.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/4/2016)


    The only time that performance takes a back seat for me is if accuracy is not otherwise able to be achieved.

    And you, sir, would be one of those "seasoned TSQL talents" that are an exception to my statement. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply