Help Solving Running Total Problem

  • Lynn Pettis (4/20/2009)


    Is there more to the real world data that would support the sequence of data entered? is there a date field or other form of identification that signifies the order of entry into the table?

    Yes, there is a DateTime field as well as a Primary Key field. (Like CheckInID in this example)

    I have been using the Primary Key field for the sequence because it is indexed and the DateTime field is not.

  • Lynn Pettis (4/20/2009)


    I have not had any trouble with the "quirky update" method and I have tested it on a test table of up to 10 million records with out any problems. The only problem that I am aware of with the "quirky update" method is on partitioned tables. I wrote an article that addressed that issue, and you can find a link to that article below in my signature block.

    Yes, I read your article. 🙂 In fact, it was the first place I looked when I began to have some difficulties with this method. I don't think that table is partitioned. Is there a way to check?

    I haven't had an opportunity to revisit this yet, but I will try to this evening.

  • Sounds like you need to redefine your problem somewhat. How about providing something closer to the actual problem we are working on. Can you give me something closer to what you really have to work with?

  • Jeffrey Williams (4/20/2009)


    I see how you have defined this as 3 consecutive rows - but, how is the system supposed to do that? What columns define the order such that the row 2 'B' comes before row 2 'C' and the rows with 2 and a blank somehow fall between?

    By using the primary key field. That's how the order is determined. There is a DateTime field as well, but I prefer using the primary key because it is indexed.

    I think this solution by Lynn would work if you created the ordering and partitioning.

    I think I might be missing something. Is there a way to know if a table is partitioned?

    On another note, I think the problem with your 'quirky update' method is the final select statement. You are looking for rows where you have a count(*) > 3 - but I don't see where the count is adjusting when you find a blank.

    Here's what I'm trying to do. First I am doing a "quirky" update with a temp table. I am creating "groups" of data. The group counter increments when we start a new group. In this case a new user or a non-blank field. After that I am counting the rows to see if 3 blanks exist.

    Not sure if that makes sense.

  • I believe that this will do the job:

    SELECT Distinct UserID

    From (

    SELECT *

    , ROW_NUMBER() Over(partition by UserId, Comment, CmtTypeGrp order by CheckInID) as GrpCnt

    From (

    SELECT CheckInID

    , UserID

    , comment

    , ROW_NUMBER() Over (partition by UserID order by CheckInID) -

    ROW_NUMBER() Over (partition by UserID, Comment order by CheckInID) as CmtTypeGrp

    FROM #CheckIn

    ) g

    Where Comment = ''

    ) U

    Where GrpCnt = 3

    (I am ecstatic that I got to use this trick before Jeff Moden 😀 ).

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Lynn Pettis (4/20/2009)


    Sounds like you need to redefine your problem somewhat. How about providing something closer to the actual problem we are working on. Can you give me something closer to what you really have to work with?

    As odd as this may sound, we are really pretty close to the actual problem.

    I have a table which is updated from a web application which is being used by thousands of users at once. Every time the user navigates a record is added to my table.

    My requirement is to first filter out certain data, and then find all users who have 3 blank records in a row. The only parts I've skipped out here is having extra filter conditions, and the actual nature of the data.

  • Based upon your sample data - you don't have a primary key. The examples given here don't include a primary key definition.

    For either solution to work - you need to identify the ordering for each group, and how each partition is defined.

    Once you have that defined, we can modify Lynn's row_number() so it works correctly.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Well, I am a little confused. Why isn't CheckInID sufficient as a primary key & ordering principle?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Jeffrey Williams (4/20/2009)


    Based upon your sample data - you don't have a primary key. The examples given here don't include a primary key definition.

    My bad. Here's the same table with a primary key.

    -- sample table

    CREATE TABLE #CheckIn

    (

    CheckInID INT IDENTITY(1,1) PRIMARY KEY

    ,UserID INT

    ,Comment VARCHAR(50)

    ,CheckInDate DATETIME DEFAULT(GetUtcDate())

    )

    For either solution to work - you need to identify the ordering for each group, and how each partition is defined.

    The order for each group is the UserID and then CheckInID (or CheckInDate).

    I don't know what you mean by partition here.

  • RBarryYoung (4/20/2009)


    I believe that this will do the job:

    SELECT Distinct UserID

    From (

    SELECT *

    , ROW_NUMBER() Over(partition by UserId, Comment, CmtTypeGrp order by CheckInID) as GrpCnt

    From (

    SELECT CheckInID

    , UserID

    , comment

    , ROW_NUMBER() Over (partition by UserID order by CheckInID) -

    ROW_NUMBER() Over (partition by UserID, Comment order by CheckInID) as CmtTypeGrp

    FROM #CheckIn

    ) g

    Where Comment = ''

    ) U

    Where GrpCnt = 3

    (I am ecstatic that I got to use this trick before Jeff Moden 😀 ).

    Very cool! It works perfectly. Although I don't really get how it works. Would you explain?

    I just tested it on real data -- 1.6 million records.

    Your solution took 23 seconds, Bevan's solution took 3. (Not official times, just SSMS time from query window)

    I'm going to try to test it tommorrow on a larger data set.

  • Here is another attempt at solving your problem:

    CREATE TABLE #CheckIn

    (

    CheckInID INT IDENTITY(1,1)

    ,UserID INT

    ,Comment VARCHAR(50)

    )

    -- insert some sample data

    INSERT INTO #CheckIn (UserID, Comment)

    SELECT 1, 'A'

    UNION ALL SELECT 2, 'B'

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 3, 'A'

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 3, 'B'

    UNION ALL SELECT 2, ''

    UNION ALL SELECT 2, ''

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 2, 'C'

    UNION ALL SELECT 2, ''

    -- create a temp table for running total because we can't alter existing production table

    CREATE TABLE #OrderedCheckIn

    (

    CheckInIDINTNOT NULL

    ,UserIDINTNOT NULL

    ,CommentVARCHAR(50)

    ,GroupCounterINT

    )

    INSERT INTO #OrderedCheckIn (CheckInID, UserID, Comment)

    SELECT CheckInID

    ,UserID

    ,Comment

    FROM #CheckIn

    --===== MUST have a clustered primary key to GUARANTEE this will work

    ALTER TABLE #OrderedCheckIn

    ADD PRIMARY KEY CLUSTERED (UserID, CheckInID)

    --===== Create the required local variables

    DECLARE @user-id INT

    ,@GroupCounter INT

    ,@CheckInID INT

    SET @GroupCounter = 0;

    SET @user-id = -1;

    UPDATE #OrderedCheckIn

    SET @GroupCounter = GroupCounter = CASE WHEN @user-id = UserID AND (Comment = '' OR Comment IS NULL)

    THEN @GroupCounter + 1

    ELSE 1 END

    ,@UserID = UserID

    ,@CheckInID = CheckInID

    FROM #OrderedCheckIn WITH(INDEX(0)) --LOOK! Must refer to the clustered index or could bomb!

    SELECT * --UserID

    FROM #OrderedCheckIn

    select

    UserID

    from

    #OrderedCheckIn

    group by

    UserID, GroupCounter

    having

    GroupCounter > 3;

    -- clean up

    DROP TABLE #OrderedCheckIn

    DROP TABLE #CheckIn

    Take a look and let me know.

  • Lynn: I can't get your solution to work correctly. GroupCounter is always 1 on my live data. I can't understand it.

    Barry: As I suspected, it looks like your solution out-performs Bevan's as the volume increases.

    On a table of 22+ million records, your solution took only 6 minutes.

    Last time I ran Bevan's on that same table it took more than one hour!

    I tried to run it now and it's been going for 15 minutes. I'm going to cancel it before I get a karate chop from my DBA...

    Also, Barry, would you be able to explain your query?

  • Hi Goldie,

    Do you have an index on UserID?

  • Goldie Graber (4/21/2009)


    Also, Barry, would you be able to explain your query?

    Sorry Goldie, not trying to be evasive but my brain isn't at 100% right now and this technique is hard to explain.

    Here is an article where it is used and explained by Itzik Ben-Gan: http://www.sqlmag.com/Article/ArticleID/93606/sql_server_93606.html

    Here's another thread where it is used and Jeff and I talk about it. http://www.sqlservercentral.com/Forums/Topic646593-338-1.aspx

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Lynn Pettis (4/20/2009)


    Here is another attempt at solving your problem:

    CREATE TABLE #CheckIn

    (

    CheckInID INT IDENTITY(1,1)

    ,UserID INT

    ,Comment VARCHAR(50)

    )

    -- insert some sample data

    INSERT INTO #CheckIn (UserID, Comment)

    SELECT 1, 'A'

    UNION ALL SELECT 2, 'B'

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 3, 'A'

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 3, 'B'

    UNION ALL SELECT 2, ''

    UNION ALL SELECT 2, ''

    UNION ALL SELECT 1, ''

    UNION ALL SELECT 2, 'C'

    UNION ALL SELECT 2, ''

    -- create a temp table for running total because we can't alter existing production table

    CREATE TABLE #OrderedCheckIn

    (

    CheckInIDINTNOT NULL

    ,UserIDINTNOT NULL

    ,CommentVARCHAR(50)

    ,GroupCounterINT

    )

    INSERT INTO #OrderedCheckIn (CheckInID, UserID, Comment)

    SELECT CheckInID

    ,UserID

    ,Comment

    FROM #CheckIn

    --===== MUST have a clustered primary key to GUARANTEE this will work

    ALTER TABLE #OrderedCheckIn

    ADD PRIMARY KEY CLUSTERED (UserID, CheckInID)

    --===== Create the required local variables

    DECLARE @user-id INT

    ,@GroupCounter INT

    ,@CheckInID INT

    SET @GroupCounter = 0;

    SET @user-id = -1;

    UPDATE #OrderedCheckIn

    SET @GroupCounter = GroupCounter = CASE WHEN @user-id = UserID AND (Comment = '' OR Comment IS NULL)

    THEN @GroupCounter + 1

    ELSE 1 END

    ,@UserID = UserID

    ,@CheckInID = CheckInID

    FROM #OrderedCheckIn WITH(INDEX(0)) --LOOK! Must refer to the clustered index or could bomb!

    SELECT * --UserID

    FROM #OrderedCheckIn

    select

    UserID

    from

    #OrderedCheckIn

    group by

    UserID, GroupCounter

    having

    GroupCounter > 3;

    -- clean up

    DROP TABLE #OrderedCheckIn

    DROP TABLE #CheckIn

    Take a look and let me know.

    I'm not sure why it isn't working for you. I run the above code as is and it works correctly with the test data.

    Can you post the DDL and code you are trying to run?

Viewing 15 posts - 16 through 30 (of 38 total)

You must be logged in to reply to this topic. Login to reply