count of consecutive value

Question

count of consecutive value

dubem1-878067

Ten Centuries

Points: 1044
More actions
January 29, 2009 at 8:22 pm

#127143

Not easy to explain, I hope I will make it clear.
I have a table with a column containing a result code.
I want to know what was the latest result code and the count of consecutive result code until result code differ
Here some examples, assume there is an order by showing rows by their insert date (most recent first)
id result code
--- -----------
7 A
6 A
5 A
4 C
3 A
2 B
1 A
the query should then return A and 3 (because A is the most recent result code and there is 3 consecutive A
id result code
--- -----------
7 B
6 A
5 B
4 B
3 B
2 B
1 A
The query should return B and 1
id result code
--- -----------
7 B
6 B
5 B
4 B
3 C
2 C
1 C
The query should return B and 4
and there a way to do this??
thank you
Martin

Viewing 15 posts - 1 through 15 (of 44 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 1

I'll trade ya... 😉 Tell me the business reason behind this and I'll give you some code to do it. The reason for the "trade" is, I'm writting an article on things like this and I think the business reasons are important for the article. My problem has been that everytime I give someone the code first, I never hear from them afterwards.

Things like what the table actually holds an why, from a business rule standpoint, something like this needs to be done would be very helpful.

Thanks.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

dubem1-878067 Ten Centuries Points: 1044 More actions · Answer 2

Sure

I have a Student table in relation with an exam table. So each student has done n exams.

Each exam has a final cote (the result code in my initial post)

I want to measure the performance of each student, following the rules mentionned in my initial post.

Thank you

Martin

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 3

Thanks, Martin... if someone doesn't beat me to it, I'll give it a whirl after work.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 4

dubem1 (1/30/2009)
Sure
I have a Student table in relation with an exam table. So each student has done n exams.
Each exam has a final cote (the result code in my initial post)
I want to measure the performance of each student, following the rules mentionned in my initial post.
Thank you
Martin

One more question... I can understand why you need the latest ID for each, but why do you need to know how many rows were in the sequence that ends with that ID?

Anyway, here's the code... details are in the comments...

USE TempDb --Just a safe place to run things

GO

--DROP TABLE #MyHead, dbo.YourTable

GO

--===== Populate the test table (note the clustered PK)

-- (Not part of the solution)

CREATE TABLE dbo.YourTable

(

ID INT,

ResultCode CHAR(1),

CONSTRAINT PK_YourTable_ID PRIMARY KEY CLUSTERED(ID)

)

--===== Populate the test table

-- (Not part of the solution)

INSERT INTO dbo.YourTable

(ID,ResultCode)

SELECT '7','A' UNION ALL

SELECT '6','A' UNION ALL

SELECT '5','A' UNION ALL

SELECT '4','C' UNION ALL

SELECT '3','A' UNION ALL

SELECT '2','B' UNION ALL

SELECT '1','A'

--===== Display the content of the test table

SELECT * FROM dbo.YourTable

------------------------------------------------------------

-- Solution

------------------------------------------------------------

--===== Copy the test table to a temporary table where we

-- can easily add a column...

SELECT ISNULL(ID,0) AS ID, --ISNULL give the column a NOT NULL constraint

ResultCode,

CAST(0 AS INT) AS RunningCount

INTO #MyHead

FROM dbo.YourTable

--===== Add a clustered index (PK in this case)

ALTER TABLE #MyHead

ADD PRIMARY KEY CLUSTERED (ID)

--===== Declare some obviously named variables

DECLARE @RunningCount INT,

@PrevResultCode CHAR(1)

--===== Do a little magic update to "rank" the rows in a way that

-- ROW_NUMBER() and RANK() can't do.

UPDATE #MyHead

SET @RunningCount = RunningCount = CASE WHEN @PrevResultCode = ResultCode THEN @RunningCount+1 ELSE 1 END,

@PrevResultCode = ResultCode

FROM #MyHead WITH (INDEX(0))

--===== Display the results after the update

SELECT * FROM #MyHead ORDER BY ID

--===== Display the desired results

SELECT t.ID,

t.ResultCode,

t.RunningCount

FROM #MyHead t

INNER JOIN

(--==== Find MAX ID for each ResultCode

SELECT MAX(ID) AS MaxID, ResultCode

FROM #MyHead

GROUP BY ResultCode

)d

ON t.ID = d.MaxID

AND t.ResultCode = d.ResultCode

ORDER BY t.ResultCode

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

dubem1-878067 Ten Centuries Points: 1044 More actions · Answer 5

It's for statistics purpose to analyse the tendencies of students.

Thanks for the solution, it's not simple, I will try it.

Yesterday I browse the web to find by myself a solution and I found an article with a similar problem than mine.

They use something like this :

ROW_NUMBER() OVER(PARTITION BY student

ORDER BY dt, slot) -

I don't know if it can solve my problem. If you are curious,here is the link to the article :

http://www.sqlmag.com/Article/ArticleID/93606/sql_server_93606.html

thanks again

Martin

Jack Corbett SSC Guru Points: 184394 More actions · Answer 6

You should try both, evaluate execution plans, and performance and choose the one that performs best. In my experience Jeff's solutions usually out-perform others.

The only way to become a Jedi like Jeff you need to test the solutions yourself.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 7

dubem1 (1/31/2009)
It's for statistics purpose to analyse the tendencies of students.
Thanks for the solution, it's not simple, I will try it.
Yesterday I browse the web to find by myself a solution and I found an article with a similar problem than mine.
They use something like this :
ROW_NUMBER() OVER(PARTITION BY student
ORDER BY dt, slot) -
I don't know if it can solve my problem. If you are curious,here is the link to the article :
http://www.sqlmag.com/Article/ArticleID/93606/sql_server_93606.html
thanks again
Martin

Thanks, again... I guess my real question would be, why do you need a count of the last sequence for each student and not the other sequences for the same student? What is that count called and what will it be used for? You say its for statistics... what is that statistic called and what is it's purpose? Thanks for being patient with my questions, Martin.

Shirfting gerars... Yes, Row_Number is quite handy... but I believe you'll find that there's nothing to restart the counts on the Result Code in your table which makes it just about impossible to use the simplicity of Row_Number or Rank for this particular problem.

So far as the complexity of the solution goes, about a 3rd of it is setting up YOUR test data so you can play with the code. Folks like me prefer that you provide the code for a test table and the insert commands to populate it... I did that for you, this time. See the link in my signature for more details on how to provide data during your posts. It's got some other goodies in it, too.

What my solution does (other than setting up your test data and displaying interim results) is very simple... I copy the data from "yourtable" to a temp table that has an extra column for the count you want and so I can ensure the code has total control over the data and indexes. I add a clustered index on the ID and it has to be clustered to control the order of the update. Then, I do a simple update on the new column that does just like you would manually... if the result code changes, the count restarts at 1... if it doesn't, it counts up by one.

Then, I find the last group (highest ID, actually) for each result code and use that to display the highest count for that result code.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 8

Jack Corbett (1/31/2009)
You should try both, evaluate execution plans, and performance and choose the one that performs best. In my experience Jeff's solutions usually out-perform others.
The only way to become a Jedi like Jeff you need to test the solutions yourself.

Heh... thanks for the confidence, Jack...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 9

I took a look at the winning solution for that contest... pretty clever.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 10

Nah... I take it back... not clever... freakin' brilliant. The key to this whole thing is the grouping. You've gotta be able to group each "cluster" of rows that are in a sequence by date and slot. I knew it couldn't be done with a single ROW_NUMBER, but the winner used the difference between two ROW_NUMBERs with different partitions to generate the GRP for each student... absolutely brilliant.

Before I studied the winners code, I did a "data smear" using the "running total quirky update" to form the groups. The advantage with the quirky update is that it'll work in both SQL Server 2k and 2k5. But, for 2k5, the winner's solution beats the quirky update.

Here's the results...

7936 ms - Quirky update

7390 ms - Winner's code

I'll take that 6.9% defeat as a substantial butt whipping... further, the winner's code also has the advantage of being able to be put into a view, if you need to do such a thing. My hat's off to the winner... I didn't believe such a thing was possible.

Lordy, I love it when I learn something new! 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 11

By the way... if anyone else wants to study the code, the original "puzzle" question and two sets of data are available at the following URL...

http://www.sqlmag.com/article/articleid/93462/sql_server_blog_93462.html

... and like I've done in the past, they use a Numbers or Tally table to gen the larger data set.

Some of the solutions, including the winning solution I'm so amazed at (actually, there were two that used the same technique), are located at the following URL...

http://www.sqlmag.com/Articles/ArticleID/93606/93606.html?Ad=1

It's good stuff, folks... give it a read...

For those that may need an SQL Server 2000 solution that performs (heh) almost :Pas well, here's what the code I wrote looked like...

--===== Move the data to where we can work on it and add

-- A row number in the process.

SELECT IDENTITY(INT,1,1) AS RowNum,

Student, Dt, Slot, Attend,

CAST(0 AS INT) AS Span

INTO #MyHead

FROM dbo.Attendance

ORDER BY Student, Dt, Slot

--===== Add the clustered index for the quirky update to use (PK in this case)

ALTER TABLE #MyHead

ADD PRIMARY KEY CLUSTERED (Student, Dt, Slot)

--===== Declare a couple of variables for the quirky update using

-- some self evident names.

DECLARE @PrevStudent VARCHAR(10),

@PrevAttend BIT,

@PrevSpan INT

--===== Do the quirky update to form the "groups" (spans)

UPDATE #MyHead

SET @PrevSpan = Span = CASE WHEN @PrevStudent = Student

AND @PrevAttend = Attend

THEN @PrevSpan

ELSE ISNULL(@PrevSpan,0) + 1

END,

@PrevAttend = Attend,

@PrevStudent = Student

FROM #MyHead WITH (INDEX(0),TABLOCKX)

OPTION (MAXDOP 1)

--===== Do the final select using the groups (spans)

SELECT lo.Student AS Student,

lo.Dt AS From_Dt, lo.Slot AS From_Slot,

hi.Dt AS To_Dt , hi.Slot AS To_Slot,

hi.Attend,

span.Cnt

FROM (--==== Derived table "span" get's the start and end mark

-- for each "span" of data

SELECT Student,

MIN(RowNum) AS MinRowNum,

MAX(RowNum) AS MaxRowNum,

COUNT(*) AS Cnt

FROM #MyHead

GROUP BY Student, Span

)span

INNER JOIN #MyHead lo ON span.MinRowNum = lo.RowNum

INNER JOIN #MyHead hi ON span.MaxRowNum = hi.RowNum

ORDER BY Student, From_Dt, From_Slot

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 12

Ok... coming full circle back to Martin's original problem...

Here's the test data...

USE TempDb --Just a safe place to run things

GO

--DROP TABLE dbo.YourTable

GO

--===== Populate the test table (note the clustered PK)

-- (Not part of the solution)

CREATE TABLE dbo.YourTable

(

ID INT,

ResultCode CHAR(1),

CONSTRAINT PK_YourTable_ID PRIMARY KEY CLUSTERED(ID)

)

--===== Populate the test table

-- (Not part of the solution)

INSERT INTO dbo.YourTable

(ID,ResultCode)

SELECT '7','A' UNION ALL

SELECT '6','A' UNION ALL

SELECT '5','A' UNION ALL

SELECT '4','C' UNION ALL

SELECT '3','A' UNION ALL

SELECT '2','B' UNION ALL

SELECT '1','A'

--===== Display the content of the test table

SELECT * FROM dbo.YourTable

... and here's that new fancy-dancy span resolver to solve the original problem. Had to add an extra CTE to get just the last span and I also added the answer to what should be the next question... what are the Min and Max ID's for the last span of each ResultCode...

;WITH CteSpan AS

(--==== Creates groups ordered by ID breaking at ResultCode

SELECT Resultcode, ID,

CAST(ResultCode AS BINARY(8)) + CAST(ID AS BINARY(4)) AS ResultCode_ID,

ROW_NUMBER() OVER (ORDER BY ID) --Just in case there's missing ID's

- ROW_NUMBER() OVER(PARTITION BY ResultCode ORDER BY ResultCode,ID) AS SpanNum

FROM dbo.YourTable --order by id

)

, cteCnt AS

(--==== Gets the ID's, the count, the max combined Resultcode/ID for each span

SELECT Resultcode,

MIN(ID) AS MinID,

MAX(ID) AS MaxID,

MAX(ResultCode_ID) AS MaxResultCode_ID,

COUNT(*) AS Cnt

FROM CteSpan

GROUP BY Resultcode, SpanNum

)

, cteMax AS

(--==== Extra CTE just to get the last span for each Resultcode

SELECT Resultcode,

MAX(ResultCode_ID) AS MaxResultCode_ID

FROM CteSpan

GROUP BY Resultcode

)

SELECT cnt.MinID, cnt.MaxID, cnt.Resultcode, cnt.Cnt

FROM cteCnt cnt

INNER JOIN cteMax mx ON cnt.MaxResultCode_ID = mx.MaxResultCode_ID

ORDER BY cnt.ResultCode

Heh... see? I told ya I loved it when I learned something new. 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

RBarryYoung SSC Guru Points: 143329 More actions · Answer 13

Jeff Moden (1/31/2009)
Nah... I take it back... not clever... freakin' brilliant. The key to this whole thing is the grouping. You've gotta be able to group each "cluster" of rows that are in a sequence by date and slot. I knew it couldn't be done with a single ROW_NUMBER, but the winner used the difference between two ROW_NUMBERs with different partitions to generate the GRP for each student... absolutely brilliant.

Heh, I agree 100% Jeff. I am stupefied at the cleverness of this mathematical trick, (and PO'd at myself for not having figured it out for myself before now). I am still trying to figure out in my head exactly why it never repeats a group number within the larger partitions.

Cool stuff. I sure hope that I remember to use this trick in the future...

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 14

RBarryYoung (1/31/2009)
...I am still trying to figure out in my head exactly why it never repeats a group number within the larger partitions...

Oh, I see. It does repeat with the larger partition, but not for each unique value of Attend within each student. The group numbers for each Student+Attend sequence never repeat because they always increase. Nice...

Dang this is a nice trick! I so wish I knew about it years ago. 🙂

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]