Help Solving Running Total Problem

Question

Post reply

Help Solving Running Total Problem

Goldie Lesser

SSCertifiable

Points: 6155
More actions
April 7, 2009 at 4:39 pm

#130451

This is a sample for a larger problem I'm trying to solve.
I am looking for a list of users who have three consecutive blank comments.
I have tried using Row_Number and Jeff's "Quirky Update" but I just can't seem to figure it out.
Here's some sample DDL & DATA to work with:
-- sample table
CREATE TABLE #CheckIn
(
CheckInID INT IDENTITY(1,1)
,UserIDINT
,Comment VARCHAR(50)
)
-- insert some sample data
INSERT INTO #CheckIn (UserID, Comment)
SELECT 1, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 2, ''
UNION ALL SELECT 1, ''
UNION ALL SELECT 2, 'C'
SELECT *
FROM #CheckIn
-- clean up
DROP TABLE #CheckIn
I would expect to see UserID 1 as a result.
Thanks!

Viewing 15 posts - 1 through 15 (of 38 total)

You must be logged in to reply to this topic. Login to reply

Goldie Lesser SSCertifiable Points: 6155 More actions · Answer 1

I think this may have something to do with the amount of data being manipulated.

The table we are talking about contains 4 million to 10 million rows.

The update method seems to work on a subset of the data, but it seems to act weird when there are millions of rows.

Has anyone experienced this?

Bevan Keighley SSCommitted Points: 1825 More actions · Answer 2

Bevan Keighley

SSCommitted

Points: 1825

April 7, 2009 at 5:26 pm

#973903

Hi. What is your current update statement?

Bevan

Goldie Lesser SSCertifiable Points: 6155 More actions · Answer 3

Here's what I have now. It seems like it's working correctly.

However, when I try it on a real data table with 5 million rows it returns inconsistent results.

-- create a temp table for running total because we can't alter existing production table

CREATE TABLE #OrderedCheckIn

(

CheckInIDINTNOT NULL

,UserIDINTNOT NULL

,CommentVARCHAR(50)

,GroupCounterINT

)

INSERT INTO #OrderedCheckIn (CheckInID, UserID, Comment)

SELECT CheckInID

,UserID

,Comment

FROM #CheckIn

--===== MUST have a clustered primary key to GUARANTEE this will work

ALTER TABLE #OrderedCheckIn

ADD PRIMARY KEY CLUSTERED (UserID, CheckInID)

--===== Create the required local variables

DECLARE @user-id INT

,@GroupCounter INT

,@CheckInID INT

SET @GroupCounter = 0

UPDATE #OrderedCheckIn

SET @GroupCounter = GroupCounter = CASE WHEN @user-id = UserID AND (Comment = '' OR Comment IS NULL)

THEN @GroupCounter

ELSE @GroupCounter + 1 END

,@UserID = UserID

,@CheckInID = CheckInID

FROM #OrderedCheckIn WITH(INDEX(0)) --LOOK! Must refer to the clustered index or could bomb!

SELECT UserID

FROM #OrderedCheckIn

GROUP BY GroupCounter

,UserID

HAVING COUNT(*) > 3

-- clean up

DROP TABLE #OrderedCheckIn

DROP TABLE #CheckIn

Bevan Keighley SSCommitted Points: 1825 More actions · Answer 4

hi Goldie,

Before you posted that code I was trying to solve this another way - by concatenating all the comments and then searching for three blanks in a row

with cte(UserID, Comments)

as

(select distinct a.userID ,(select ',' + Comment from #CheckIn b where a.userID = b.userID for XML path('')) as comments

from #CheckIn a

group by a.UserID)

select * from cte where Comments like '%,,,%'

Does that work for you? I am still looking at the other code you posted.

Bevan

Goldie Lesser SSCertifiable Points: 6155 More actions · Answer 5

Now THAT's one really smart idea!

With the addition of an ORDER BY and a trailing comma it works perfectly.

Here's the final code:

;WITH OrderedCheckIn AS

(

SELECT UserID

, (SELECT ',' + ISNULL(Comment,'')

FROM #CheckIn B

WHERE B.UserID = A.UserID

ORDER BY B.CheckInID

FOR XML PATH('')) + ',' AS AllComments

FROM #CheckIn A

GROUP BY UserID

)

SELECT UserID

FROM OrderedCheckIn

WHERE AllComments LIKE '%,,,,%'

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 6

You can also check this out:

--- sample table

CREATE TABLE #CheckIn

(

CheckInID INT IDENTITY(1,1)

,UserID INT

,Comment VARCHAR(50)

)

--- insert some sample data

INSERT INTO #CheckIn (UserID, Comment)

SELECT 1, 'A'

UNION ALL SELECT 2, 'B'

UNION ALL SELECT 1, ''

UNION ALL SELECT 3, 'A'

UNION ALL SELECT 1, ''

UNION ALL SELECT 3, 'B'

UNION ALL SELECT 2, ''

UNION ALL SELECT 1, ''

UNION ALL SELECT 2, 'C';

WITH BlankComments (

RowNum,

CheckInID,

UserID,

Comment

) as (

SELECT

row_number() over (partition by UserID, Comment order by UserID, Comment) as RowNum,

CheckInID,

UserID,

Comment

FROM #CheckIn

)

select

UserID,

Comment

from

BlankComments

where

RowNum = 3 and

Comment = ''

--- clean up

DROP TABLE #CheckIn

Goldie Lesser SSCertifiable Points: 6155 More actions · Answer 7

That won't work, Lynn. It doesn't take into account consecutive blanks.

For example if I use the following data, I still expect 1 as the answer but your query returns 1 and 2.

-- insert some sample data

INSERT INTO #CheckIn (UserID, Comment)

SELECT 1, 'A'

UNION ALL SELECT 2, 'B'

UNION ALL SELECT 1, ''

UNION ALL SELECT 3, 'A'

UNION ALL SELECT 1, ''

UNION ALL SELECT 3, 'B'

UNION ALL SELECT 2, ''

UNION ALL SELECT 1, ''

UNION ALL SELECT 2, 'C'

UNION ALL SELECT 2, ''

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 8

Goldie Graber (4/8/2009)
That won't work, Lynn. It doesn't take into account consecutive blanks.
For example if I use the following data, I still expect 1 as the answer but your query returns 1 and 2.
-- insert some sample data
INSERT INTO #CheckIn (UserID, Comment)
SELECT 1, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 2, ''
UNION ALL SELECT 2, ''
UNION ALL SELECT 1, ''
UNION ALL SELECT 2, 'C'
UNION ALL SELECT 2, ''

I see three consecutive blanks for UserID 2. If you are taking the consecutive blanks AS ENTERED, then where are the three consecutive blanks in your original test data? I don't see any.

Goldie Lesser SSCertifiable Points: 6155 More actions · Answer 9

Lynn Pettis (4/8/2009)
I see three consecutive blanks for UserID 2. If you are taking the consecutive blanks AS ENTERED, then where are the three consecutive blanks in your original test data? I don't see any.

There are not 3 consecutive blanks for UserID 2.

His Comments are 'B','','','C',''.

UserID 1 has 3 consecutive blanks

His Comments are 'A','','',''

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 10

I missed th 2 'C'. Sorry. Still looking at an alternative solution.

Goldie Lesser SSCertifiable Points: 6155 More actions · Answer 11

As a post script, I ended up using Bevan's method.

It works well on tables with up to a few million records. Takes a minute or two.

When I had to run it on a table of 22 million it took over an hour!!

We will need to run this code in our production environment every once in a while, so I would appreciate if someone could help tweak this code or come up with a different approach.

On a different note, has anyone had trouble using Jeff's running total method before?

It seems to fail on tables with a few million records.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 12

Goldie Graber (4/20/2009)
As a post script, I ended up using Bevan's method.
It works well on tables with up to a few million records. Takes a minute or two.
When I had to run it on a table of 22 million it took over an hour!!
We will need to run this code in our production environment every once in a while, so I would appreciate if someone could help tweak this code or come up with a different approach.
On a different note, has anyone had trouble using Jeff's running total method before?
It seems to fail on tables with a few million records.

I have not had any trouble with the "quirky update" method and I have tested it on a test table of up to 10 million records with out any problems. The only problem that I am aware of with the "quirky update" method is on partitioned tables. I wrote an article that addressed that issue, and you can find a link to that article below in my signature block.

I haven't had an opportunity to revisit this yet, but I will try to this evening.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 13

Is there more to the real world data that would support the sequence of data entered? is there a date field or other form of identification that signifies the order of entry into the table?

Jeffrey Williams SSC Guru Points: 90303 More actions · Answer 14

Lynn Pettis (4/8/2009)
Goldie Graber (4/8/2009)
That won't work, Lynn. It doesn't take into account consecutive blanks.
For example if I use the following data, I still expect 1 as the answer but your query returns 1 and 2.
-- insert some sample data
INSERT INTO #CheckIn (UserID, Comment)
SELECT 1, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 2, ''
UNION ALL SELECT 2, ''
UNION ALL SELECT 1, ''
UNION ALL SELECT 2, 'C'
UNION ALL SELECT 2, ''
I see three consecutive blanks for UserID 2. If you are taking the consecutive blanks AS ENTERED, then where are the three consecutive blanks in your original test data? I don't see any.

I see how you have defined this as 3 consecutive rows - but, how is the system supposed to do that? What columns define the order such that the row 2 'B' comes before row 2 'C' and the rows with 2 and a blank somehow fall between?

I think this solution by Lynn would work if you created the ordering and partitioning.

On another note, I think the problem with your 'quirky update' method is the final select statement. You are looking for rows where you have a count(*) > 3 - but I don't see where the count is adjusting when you find a blank.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs