Help Solving Running Total Problem

Question

Help Solving Running Total Problem

Viewing 15 posts - 16 through 30 (of 38 total)

You must be logged in to reply to this topic. Login to reply

Goldie Lesser SSCertifiable Points: 6155 More actions · Answer 1

Lynn Pettis (4/20/2009)
Is there more to the real world data that would support the sequence of data entered? is there a date field or other form of identification that signifies the order of entry into the table?

Yes, there is a DateTime field as well as a Primary Key field. (Like CheckInID in this example)

I have been using the Primary Key field for the sequence because it is indexed and the DateTime field is not.

Goldie Lesser SSCertifiable Points: 6155 More actions · Answer 2

Lynn Pettis (4/20/2009)
I have not had any trouble with the "quirky update" method and I have tested it on a test table of up to 10 million records with out any problems. The only problem that I am aware of with the "quirky update" method is on partitioned tables. I wrote an article that addressed that issue, and you can find a link to that article below in my signature block.

Yes, I read your article. 🙂 In fact, it was the first place I looked when I began to have some difficulties with this method. I don't think that table is partitioned. Is there a way to check?

I haven't had an opportunity to revisit this yet, but I will try to this evening.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 3

Sounds like you need to redefine your problem somewhat. How about providing something closer to the actual problem we are working on. Can you give me something closer to what you really have to work with?

Goldie Lesser SSCertifiable Points: 6155 More actions · Answer 4

Jeffrey Williams (4/20/2009)
I see how you have defined this as 3 consecutive rows - but, how is the system supposed to do that? What columns define the order such that the row 2 'B' comes before row 2 'C' and the rows with 2 and a blank somehow fall between?

By using the primary key field. That's how the order is determined. There is a DateTime field as well, but I prefer using the primary key because it is indexed.

I think this solution by Lynn would work if you created the ordering and partitioning.

I think I might be missing something. Is there a way to know if a table is partitioned?

On another note, I think the problem with your 'quirky update' method is the final select statement. You are looking for rows where you have a count(*) > 3 - but I don't see where the count is adjusting when you find a blank.

Here's what I'm trying to do. First I am doing a "quirky" update with a temp table. I am creating "groups" of data. The group counter increments when we start a new group. In this case a new user or a non-blank field. After that I am counting the rows to see if 3 blanks exist.

Not sure if that makes sense.

RBarryYoung SSC Guru Points: 143329 More actions · Answer 5

I believe that this will do the job:

SELECT Distinct UserID

From (

SELECT *

, ROW_NUMBER() Over(partition by UserId, Comment, CmtTypeGrp order by CheckInID) as GrpCnt

From (

SELECT CheckInID

, UserID

, comment

, ROW_NUMBER() Over (partition by UserID order by CheckInID) -

ROW_NUMBER() Over (partition by UserID, Comment order by CheckInID) as CmtTypeGrp

FROM #CheckIn

) g

Where Comment = ''

) U

Where GrpCnt = 3

(I am ecstatic that I got to use this trick before Jeff Moden 😀 ).

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Goldie Lesser SSCertifiable Points: 6155 More actions · Answer 6

Lynn Pettis (4/20/2009)
Sounds like you need to redefine your problem somewhat. How about providing something closer to the actual problem we are working on. Can you give me something closer to what you really have to work with?

As odd as this may sound, we are really pretty close to the actual problem.

I have a table which is updated from a web application which is being used by thousands of users at once. Every time the user navigates a record is added to my table.

My requirement is to first filter out certain data, and then find all users who have 3 blank records in a row. The only parts I've skipped out here is having extra filter conditions, and the actual nature of the data.

Jeffrey Williams SSC Guru Points: 90303 More actions · Answer 7

Based upon your sample data - you don't have a primary key. The examples given here don't include a primary key definition.

For either solution to work - you need to identify the ordering for each group, and how each partition is defined.

Once you have that defined, we can modify Lynn's row_number() so it works correctly.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

RBarryYoung SSC Guru Points: 143329 More actions · Answer 8

Well, I am a little confused. Why isn't CheckInID sufficient as a primary key & ordering principle?

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Goldie Lesser SSCertifiable Points: 6155 More actions · Answer 9

Jeffrey Williams (4/20/2009)
Based upon your sample data - you don't have a primary key. The examples given here don't include a primary key definition.

My bad. Here's the same table with a primary key.

-- sample table

CREATE TABLE #CheckIn

(

CheckInID INT IDENTITY(1,1) PRIMARY KEY

,UserID INT

,Comment VARCHAR(50)

,CheckInDate DATETIME DEFAULT(GetUtcDate())

)

For either solution to work - you need to identify the ordering for each group, and how each partition is defined.

The order for each group is the UserID and then CheckInID (or CheckInDate).

I don't know what you mean by partition here.

Goldie Lesser SSCertifiable Points: 6155 More actions · Answer 10

RBarryYoung (4/20/2009)
I believe that this will do the job:
SELECT Distinct UserID
From (
SELECT *
, ROW_NUMBER() Over(partition by UserId, Comment, CmtTypeGrp order by CheckInID) as GrpCnt
From (
SELECT CheckInID
, UserID
, comment
, ROW_NUMBER() Over (partition by UserID order by CheckInID) -
ROW_NUMBER() Over (partition by UserID, Comment order by CheckInID) as CmtTypeGrp
FROM #CheckIn
) g
Where Comment = ''
) U
Where GrpCnt = 3
(I am ecstatic that I got to use this trick before Jeff Moden 😀 ).

Very cool! It works perfectly. Although I don't really get how it works. Would you explain?

I just tested it on real data -- 1.6 million records.

Your solution took 23 seconds, Bevan's solution took 3. (Not official times, just SSMS time from query window)

I'm going to try to test it tommorrow on a larger data set.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 11

Here is another attempt at solving your problem:

CREATE TABLE #CheckIn

(

CheckInID INT IDENTITY(1,1)

,UserID INT

,Comment VARCHAR(50)

)

-- insert some sample data

INSERT INTO #CheckIn (UserID, Comment)

SELECT 1, 'A'

UNION ALL SELECT 2, 'B'

UNION ALL SELECT 1, ''

UNION ALL SELECT 3, 'A'

UNION ALL SELECT 1, ''

UNION ALL SELECT 3, 'B'

UNION ALL SELECT 2, ''

UNION ALL SELECT 1, ''

UNION ALL SELECT 2, 'C'

UNION ALL SELECT 2, ''

-- create a temp table for running total because we can't alter existing production table

CREATE TABLE #OrderedCheckIn

(

CheckInIDINTNOT NULL

,UserIDINTNOT NULL

,CommentVARCHAR(50)

,GroupCounterINT

)

INSERT INTO #OrderedCheckIn (CheckInID, UserID, Comment)

SELECT CheckInID

,UserID

,Comment

FROM #CheckIn

--===== MUST have a clustered primary key to GUARANTEE this will work

ALTER TABLE #OrderedCheckIn

ADD PRIMARY KEY CLUSTERED (UserID, CheckInID)

--===== Create the required local variables

DECLARE @user-id INT

,@GroupCounter INT

,@CheckInID INT

SET @GroupCounter = 0;

SET @user-id = -1;

UPDATE #OrderedCheckIn

SET @GroupCounter = GroupCounter = CASE WHEN @user-id = UserID AND (Comment = '' OR Comment IS NULL)

THEN @GroupCounter + 1

ELSE 1 END

,@UserID = UserID

,@CheckInID = CheckInID

FROM #OrderedCheckIn WITH(INDEX(0)) --LOOK! Must refer to the clustered index or could bomb!

SELECT * --UserID

FROM #OrderedCheckIn

select

UserID

from

#OrderedCheckIn

group by

UserID, GroupCounter

having

GroupCounter > 3;

-- clean up

DROP TABLE #OrderedCheckIn

DROP TABLE #CheckIn

Take a look and let me know.

Goldie Lesser SSCertifiable Points: 6155 More actions · Answer 12

Lynn: I can't get your solution to work correctly. GroupCounter is always 1 on my live data. I can't understand it.

Barry: As I suspected, it looks like your solution out-performs Bevan's as the volume increases.

On a table of 22+ million records, your solution took only 6 minutes.

Last time I ran Bevan's on that same table it took more than one hour!

I tried to run it now and it's been going for 15 minutes. I'm going to cancel it before I get a karate chop from my DBA...

Also, Barry, would you be able to explain your query?

Bevan Keighley SSCommitted Points: 1825 More actions · Answer 13

Bevan Keighley

SSCommitted

Points: 1825

April 21, 2009 at 4:03 pm

#981856

Hi Goldie,

Do you have an index on UserID?

RBarryYoung SSC Guru Points: 143329 More actions · Answer 14

Goldie Graber (4/21/2009)
Also, Barry, would you be able to explain your query?

Sorry Goldie, not trying to be evasive but my brain isn't at 100% right now and this technique is hard to explain.

Here is an article where it is used and explained by Itzik Ben-Gan: http://www.sqlmag.com/Article/ArticleID/93606/sql_server_93606.html

Here's another thread where it is used and Jeff and I talk about it. http://www.sqlservercentral.com/Forums/Topic646593-338-1.aspx

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 15

Lynn Pettis (4/20/2009)
Here is another attempt at solving your problem:
CREATE TABLE #CheckIn
(
CheckInID INT IDENTITY(1,1)
,UserID INT
,Comment VARCHAR(50)
)
-- insert some sample data
INSERT INTO #CheckIn (UserID, Comment)
SELECT 1, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 1, ''
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 2, ''
UNION ALL SELECT 2, ''
UNION ALL SELECT 1, ''
UNION ALL SELECT 2, 'C'
UNION ALL SELECT 2, ''
-- create a temp table for running total because we can't alter existing production table
CREATE TABLE #OrderedCheckIn
(
CheckInIDINTNOT NULL
,UserIDINTNOT NULL
,CommentVARCHAR(50)
,GroupCounterINT
)
INSERT INTO #OrderedCheckIn (CheckInID, UserID, Comment)
SELECT CheckInID
,UserID
,Comment
FROM #CheckIn
--===== MUST have a clustered primary key to GUARANTEE this will work
ALTER TABLE #OrderedCheckIn
ADD PRIMARY KEY CLUSTERED (UserID, CheckInID)
--===== Create the required local variables
DECLARE @user-id INT
,@GroupCounter INT
,@CheckInID INT
SET @GroupCounter = 0;
SET @user-id = -1;
UPDATE #OrderedCheckIn
SET @GroupCounter = GroupCounter = CASE WHEN @user-id = UserID AND (Comment = '' OR Comment IS NULL)
THEN @GroupCounter + 1
ELSE 1 END
,@UserID = UserID
,@CheckInID = CheckInID
FROM #OrderedCheckIn WITH(INDEX(0)) --LOOK! Must refer to the clustered index or could bomb!
SELECT * --UserID
FROM #OrderedCheckIn
select
UserID
from
#OrderedCheckIn
group by
UserID, GroupCounter
having
GroupCounter > 3;
-- clean up
DROP TABLE #OrderedCheckIn
DROP TABLE #CheckIn
Take a look and let me know.

I'm not sure why it isn't working for you. I run the above code as is and it works correctly with the test data.

Can you post the DDL and code you are trying to run?