Help Solving Running Total Problem

  • This is a sample for a larger problem I'm trying to solve.

    I am looking for a list of users who have three consecutive blank comments.

    I have tried using Row_Number and Jeff's "Quirky Update" but I just can't seem to figure it out.

    Here's some sample DDL & DATA to work with:

    -- sample table

    CREATE TABLE #CheckIn

    (

    CheckInID INT IDENTITY(1,1)

    ,UserIDINT

    ,Comment VARCHAR(50)

    )

    -- insert some sample data

    INSERT INTO #CheckIn (UserID, Comment)

    SELECT 1, 'A'

    UNION ALL SELECT 2, 'B'

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 3, 'A'

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 3, 'B'

    UNION ALL SELECT 2, ''

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 2, 'C'

    SELECT *

    FROM #CheckIn

    -- clean up

    DROP TABLE #CheckIn

    I would expect to see UserID 1 as a result.

    Thanks!

  • I think this may have something to do with the amount of data being manipulated.

    The table we are talking about contains 4 million to 10 million rows.

    The update method seems to work on a subset of the data, but it seems to act weird when there are millions of rows.

    Has anyone experienced this?

  • Hi. What is your current update statement?

    Bevan

  • Here's what I have now. It seems like it's working correctly.

    However, when I try it on a real data table with 5 million rows it returns inconsistent results.

    -- create a temp table for running total because we can't alter existing production table

    CREATE TABLE #OrderedCheckIn

    (

    CheckInIDINTNOT NULL

    ,UserIDINTNOT NULL

    ,CommentVARCHAR(50)

    ,GroupCounterINT

    )

    INSERT INTO #OrderedCheckIn (CheckInID, UserID, Comment)

    SELECT CheckInID

    ,UserID

    ,Comment

    FROM #CheckIn

    --===== MUST have a clustered primary key to GUARANTEE this will work

    ALTER TABLE #OrderedCheckIn

    ADD PRIMARY KEY CLUSTERED (UserID, CheckInID)

    --===== Create the required local variables

    DECLARE @user-id INT

    ,@GroupCounter INT

    ,@CheckInID INT

    SET @GroupCounter = 0

    UPDATE #OrderedCheckIn

    SET @GroupCounter = GroupCounter = CASE WHEN @user-id = UserID AND (Comment = '' OR Comment IS NULL)

    THEN @GroupCounter

    ELSE @GroupCounter + 1 END

    ,@UserID = UserID

    ,@CheckInID = CheckInID

    FROM #OrderedCheckIn WITH(INDEX(0)) --LOOK! Must refer to the clustered index or could bomb!

    SELECT UserID

    FROM #OrderedCheckIn

    GROUP BY GroupCounter

    ,UserID

    HAVING COUNT(*) > 3

    -- clean up

    DROP TABLE #OrderedCheckIn

    DROP TABLE #CheckIn

  • hi Goldie,

    Before you posted that code I was trying to solve this another way - by concatenating all the comments and then searching for three blanks in a row

    with cte(UserID, Comments)

    as

    (select distinct a.userID ,(select ',' + Comment from #CheckIn b where a.userID = b.userID for XML path('')) as comments

    from #CheckIn a

    group by a.UserID)

    select * from cte where Comments like '%,,,%'

    Does that work for you? I am still looking at the other code you posted.

    Bevan

  • Now THAT's one really smart idea!

    With the addition of an ORDER BY and a trailing comma it works perfectly.

    Here's the final code:

    ;WITH OrderedCheckIn AS

    (

    SELECT UserID

    , (SELECT ',' + ISNULL(Comment,'')

    FROM #CheckIn B

    WHERE B.UserID = A.UserID

    ORDER BY B.CheckInID

    FOR XML PATH('')) + ',' AS AllComments

    FROM #CheckIn A

    GROUP BY UserID

    )

    SELECT UserID

    FROM OrderedCheckIn

    WHERE AllComments LIKE '%,,,,%'

  • You can also check this out:

    --- sample table

    CREATE TABLE #CheckIn

    (

    CheckInID INT IDENTITY(1,1)

    ,UserID INT

    ,Comment VARCHAR(50)

    )

    --- insert some sample data

    INSERT INTO #CheckIn (UserID, Comment)

    SELECT 1, 'A'

    UNION ALL SELECT 2, 'B'

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 3, 'A'

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 3, 'B'

    UNION ALL SELECT 2, ''

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 2, 'C';

    WITH BlankComments (

    RowNum,

    CheckInID,

    UserID,

    Comment

    ) as (

    SELECT

    row_number() over (partition by UserID, Comment order by UserID, Comment) as RowNum,

    CheckInID,

    UserID,

    Comment

    FROM #CheckIn

    )

    select

    UserID,

    Comment

    from

    BlankComments

    where

    RowNum = 3 and

    Comment = ''

    --- clean up

    DROP TABLE #CheckIn

  • That won't work, Lynn. It doesn't take into account consecutive blanks.

    For example if I use the following data, I still expect 1 as the answer but your query returns 1 and 2.

    -- insert some sample data

    INSERT INTO #CheckIn (UserID, Comment)

    SELECT 1, 'A'

    UNION ALL SELECT 2, 'B'

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 3, 'A'

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 3, 'B'

    UNION ALL SELECT 2, ''

    UNION ALL SELECT 2, ''

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 2, 'C'

    UNION ALL SELECT 2, ''

  • Goldie Graber (4/8/2009)


    That won't work, Lynn. It doesn't take into account consecutive blanks.

    For example if I use the following data, I still expect 1 as the answer but your query returns 1 and 2.

    -- insert some sample data

    INSERT INTO #CheckIn (UserID, Comment)

    SELECT 1, 'A'

    UNION ALL SELECT 2, 'B'

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 3, 'A'

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 3, 'B'

    UNION ALL SELECT 2, ''

    UNION ALL SELECT 2, ''

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 2, 'C'

    UNION ALL SELECT 2, ''

    I see three consecutive blanks for UserID 2. If you are taking the consecutive blanks AS ENTERED, then where are the three consecutive blanks in your original test data? I don't see any.

  • Lynn Pettis (4/8/2009)


    I see three consecutive blanks for UserID 2. If you are taking the consecutive blanks AS ENTERED, then where are the three consecutive blanks in your original test data? I don't see any.

    There are not 3 consecutive blanks for UserID 2.

    His Comments are 'B','','','C',''.

    UserID 1 has 3 consecutive blanks

    His Comments are 'A','','',''

  • I missed th 2 'C'. Sorry. Still looking at an alternative solution.

  • As a post script, I ended up using Bevan's method.

    It works well on tables with up to a few million records. Takes a minute or two.

    When I had to run it on a table of 22 million it took over an hour!!

    We will need to run this code in our production environment every once in a while, so I would appreciate if someone could help tweak this code or come up with a different approach.

    On a different note, has anyone had trouble using Jeff's running total method before?

    It seems to fail on tables with a few million records.

  • Goldie Graber (4/20/2009)


    As a post script, I ended up using Bevan's method.

    It works well on tables with up to a few million records. Takes a minute or two.

    When I had to run it on a table of 22 million it took over an hour!!

    We will need to run this code in our production environment every once in a while, so I would appreciate if someone could help tweak this code or come up with a different approach.

    On a different note, has anyone had trouble using Jeff's running total method before?

    It seems to fail on tables with a few million records.

    I have not had any trouble with the "quirky update" method and I have tested it on a test table of up to 10 million records with out any problems. The only problem that I am aware of with the "quirky update" method is on partitioned tables. I wrote an article that addressed that issue, and you can find a link to that article below in my signature block.

    I haven't had an opportunity to revisit this yet, but I will try to this evening.

  • Is there more to the real world data that would support the sequence of data entered? is there a date field or other form of identification that signifies the order of entry into the table?

  • Lynn Pettis (4/8/2009)


    Goldie Graber (4/8/2009)


    That won't work, Lynn. It doesn't take into account consecutive blanks.

    For example if I use the following data, I still expect 1 as the answer but your query returns 1 and 2.

    -- insert some sample data

    INSERT INTO #CheckIn (UserID, Comment)

    SELECT 1, 'A'

    UNION ALL SELECT 2, 'B'

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 3, 'A'

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 3, 'B'

    UNION ALL SELECT 2, ''

    UNION ALL SELECT 2, ''

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 2, 'C'

    UNION ALL SELECT 2, ''

    I see three consecutive blanks for UserID 2. If you are taking the consecutive blanks AS ENTERED, then where are the three consecutive blanks in your original test data? I don't see any.

    I see how you have defined this as 3 consecutive rows - but, how is the system supposed to do that? What columns define the order such that the row 2 'B' comes before row 2 'C' and the rows with 2 and a blank somehow fall between?

    I think this solution by Lynn would work if you created the ordering and partitioning.

    On another note, I think the problem with your 'quirky update' method is the final select statement. You are looking for rows where you have a count(*) > 3 - but I don't see where the count is adjusting when you find a blank.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

Viewing 15 posts - 1 through 15 (of 38 total)

You must be logged in to reply to this topic. Login to reply