January 29, 2009 at 8:22 pm
Not easy to explain, I hope I will make it clear.
I have a table with a column containing a result code.
I want to know what was the latest result code and the count of consecutive result code until result code differ
Here some examples, assume there is an order by showing rows by their insert date (most recent first)
id result code
--- -----------
7 A
6 A
5 A
4 C
3 A
2 B
1 A
the query should then return A and 3 (because A is the most recent result code and there is 3 consecutive A
id result code
--- -----------
7 B
6 A
5 B
4 B
3 B
2 B
1 A
The query should return B and 1
id result code
--- -----------
7 B
6 B
5 B
4 B
3 C
2 C
1 C
The query should return B and 4
and there a way to do this??
thank you
Martin
January 29, 2009 at 10:09 pm
I'll trade ya... 😉 Tell me the business reason behind this and I'll give you some code to do it. The reason for the "trade" is, I'm writting an article on things like this and I think the business reasons are important for the article. My problem has been that everytime I give someone the code first, I never hear from them afterwards.
Things like what the table actually holds an why, from a business rule standpoint, something like this needs to be done would be very helpful.
Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 30, 2009 at 7:48 am
Sure
I have a Student table in relation with an exam table. So each student has done n exams.
Each exam has a final cote (the result code in my initial post)
I want to measure the performance of each student, following the rules mentionned in my initial post.
Thank you
Martin
January 30, 2009 at 9:43 am
Thanks, Martin... if someone doesn't beat me to it, I'll give it a whirl after work.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 30, 2009 at 7:22 pm
dubem1 (1/30/2009)
SureI have a Student table in relation with an exam table. So each student has done n exams.
Each exam has a final cote (the result code in my initial post)
I want to measure the performance of each student, following the rules mentionned in my initial post.
Thank you
Martin
One more question... I can understand why you need the latest ID for each, but why do you need to know how many rows were in the sequence that ends with that ID?
Anyway, here's the code... details are in the comments...
USE TempDb --Just a safe place to run things
GO
--DROP TABLE #MyHead, dbo.YourTable
GO
--===== Populate the test table (note the clustered PK)
-- (Not part of the solution)
CREATE TABLE dbo.YourTable
(
ID INT,
ResultCode CHAR(1),
CONSTRAINT PK_YourTable_ID PRIMARY KEY CLUSTERED(ID)
)
--===== Populate the test table
-- (Not part of the solution)
INSERT INTO dbo.YourTable
(ID,ResultCode)
SELECT '7','A' UNION ALL
SELECT '6','A' UNION ALL
SELECT '5','A' UNION ALL
SELECT '4','C' UNION ALL
SELECT '3','A' UNION ALL
SELECT '2','B' UNION ALL
SELECT '1','A'
--===== Display the content of the test table
SELECT * FROM dbo.YourTable
------------------------------------------------------------
-- Solution
------------------------------------------------------------
--===== Copy the test table to a temporary table where we
-- can easily add a column...
SELECT ISNULL(ID,0) AS ID, --ISNULL give the column a NOT NULL constraint
ResultCode,
CAST(0 AS INT) AS RunningCount
INTO #MyHead
FROM dbo.YourTable
--===== Add a clustered index (PK in this case)
ALTER TABLE #MyHead
ADD PRIMARY KEY CLUSTERED (ID)
--===== Declare some obviously named variables
DECLARE @RunningCount INT,
@PrevResultCode CHAR(1)
--===== Do a little magic update to "rank" the rows in a way that
-- ROW_NUMBER() and RANK() can't do.
UPDATE #MyHead
SET @RunningCount = RunningCount = CASE WHEN @PrevResultCode = ResultCode THEN @RunningCount+1 ELSE 1 END,
@PrevResultCode = ResultCode
FROM #MyHead WITH (INDEX(0))
--===== Display the results after the update
SELECT * FROM #MyHead ORDER BY ID
--===== Display the desired results
SELECT t.ID,
t.ResultCode,
t.RunningCount
FROM #MyHead t
INNER JOIN
(--==== Find MAX ID for each ResultCode
SELECT MAX(ID) AS MaxID, ResultCode
FROM #MyHead
GROUP BY ResultCode
)d
ON t.ID = d.MaxID
AND t.ResultCode = d.ResultCode
ORDER BY t.ResultCode
--Jeff Moden
Change is inevitable... Change for the better is not.
January 31, 2009 at 7:24 am
It's for statistics purpose to analyse the tendencies of students.
Thanks for the solution, it's not simple, I will try it.
Yesterday I browse the web to find by myself a solution and I found an article with a similar problem than mine.
They use something like this :
ROW_NUMBER() OVER(PARTITION BY student
ORDER BY dt, slot) -
I don't know if it can solve my problem. If you are curious,here is the link to the article :
http://www.sqlmag.com/Article/ArticleID/93606/sql_server_93606.html
thanks again
Martin
January 31, 2009 at 7:36 am
You should try both, evaluate execution plans, and performance and choose the one that performs best. In my experience Jeff's solutions usually out-perform others.
The only way to become a Jedi like Jeff you need to test the solutions yourself.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
January 31, 2009 at 8:59 am
dubem1 (1/31/2009)
It's for statistics purpose to analyse the tendencies of students.Thanks for the solution, it's not simple, I will try it.
Yesterday I browse the web to find by myself a solution and I found an article with a similar problem than mine.
They use something like this :
ROW_NUMBER() OVER(PARTITION BY student
ORDER BY dt, slot) -
I don't know if it can solve my problem. If you are curious,here is the link to the article :
http://www.sqlmag.com/Article/ArticleID/93606/sql_server_93606.html
thanks again
Martin
Thanks, again... I guess my real question would be, why do you need a count of the last sequence for each student and not the other sequences for the same student? What is that count called and what will it be used for? You say its for statistics... what is that statistic called and what is it's purpose? Thanks for being patient with my questions, Martin.
Shirfting gerars... Yes, Row_Number is quite handy... but I believe you'll find that there's nothing to restart the counts on the Result Code in your table which makes it just about impossible to use the simplicity of Row_Number or Rank for this particular problem.
So far as the complexity of the solution goes, about a 3rd of it is setting up YOUR test data so you can play with the code. Folks like me prefer that you provide the code for a test table and the insert commands to populate it... I did that for you, this time. See the link in my signature for more details on how to provide data during your posts. It's got some other goodies in it, too.
What my solution does (other than setting up your test data and displaying interim results) is very simple... I copy the data from "yourtable" to a temp table that has an extra column for the count you want and so I can ensure the code has total control over the data and indexes. I add a clustered index on the ID and it has to be clustered to control the order of the update. Then, I do a simple update on the new column that does just like you would manually... if the result code changes, the count restarts at 1... if it doesn't, it counts up by one.
Then, I find the last group (highest ID, actually) for each result code and use that to display the highest count for that result code.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 31, 2009 at 9:11 am
Jack Corbett (1/31/2009)
You should try both, evaluate execution plans, and performance and choose the one that performs best. In my experience Jeff's solutions usually out-perform others.The only way to become a Jedi like Jeff you need to test the solutions yourself.
Heh... thanks for the confidence, Jack...
--Jeff Moden
Change is inevitable... Change for the better is not.
January 31, 2009 at 12:53 pm
I took a look at the winning solution for that contest... pretty clever.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 31, 2009 at 1:47 pm
Nah... I take it back... not clever... freakin' brilliant. The key to this whole thing is the grouping. You've gotta be able to group each "cluster" of rows that are in a sequence by date and slot. I knew it couldn't be done with a single ROW_NUMBER, but the winner used the difference between two ROW_NUMBERs with different partitions to generate the GRP for each student... absolutely brilliant.
Before I studied the winners code, I did a "data smear" using the "running total quirky update" to form the groups. The advantage with the quirky update is that it'll work in both SQL Server 2k and 2k5. But, for 2k5, the winner's solution beats the quirky update.
Here's the results...
7936 ms - Quirky update
7390 ms - Winner's code
I'll take that 6.9% defeat as a substantial butt whipping... further, the winner's code also has the advantage of being able to be put into a view, if you need to do such a thing. My hat's off to the winner... I didn't believe such a thing was possible.
Lordy, I love it when I learn something new! 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
January 31, 2009 at 2:12 pm
By the way... if anyone else wants to study the code, the original "puzzle" question and two sets of data are available at the following URL...
http://www.sqlmag.com/article/articleid/93462/sql_server_blog_93462.html
... and like I've done in the past, they use a Numbers or Tally table to gen the larger data set.
Some of the solutions, including the winning solution I'm so amazed at (actually, there were two that used the same technique), are located at the following URL...
http://www.sqlmag.com/Articles/ArticleID/93606/93606.html?Ad=1
It's good stuff, folks... give it a read...
For those that may need an SQL Server 2000 solution that performs (heh) almost :Pas well, here's what the code I wrote looked like...
--===== Move the data to where we can work on it and add
-- A row number in the process.
SELECT IDENTITY(INT,1,1) AS RowNum,
Student, Dt, Slot, Attend,
CAST(0 AS INT) AS Span
INTO #MyHead
FROM dbo.Attendance
ORDER BY Student, Dt, Slot
--===== Add the clustered index for the quirky update to use (PK in this case)
ALTER TABLE #MyHead
ADD PRIMARY KEY CLUSTERED (Student, Dt, Slot)
--===== Declare a couple of variables for the quirky update using
-- some self evident names.
DECLARE @PrevStudent VARCHAR(10),
@PrevAttend BIT,
@PrevSpan INT
--===== Do the quirky update to form the "groups" (spans)
UPDATE #MyHead
SET @PrevSpan = Span = CASE WHEN @PrevStudent = Student
AND @PrevAttend = Attend
THEN @PrevSpan
ELSE ISNULL(@PrevSpan,0) + 1
END,
@PrevAttend = Attend,
@PrevStudent = Student
FROM #MyHead WITH (INDEX(0),TABLOCKX)
OPTION (MAXDOP 1)
--===== Do the final select using the groups (spans)
SELECT lo.Student AS Student,
lo.Dt AS From_Dt, lo.Slot AS From_Slot,
hi.Dt AS To_Dt , hi.Slot AS To_Slot,
hi.Attend,
span.Cnt
FROM (--==== Derived table "span" get's the start and end mark
-- for each "span" of data
SELECT Student,
MIN(RowNum) AS MinRowNum,
MAX(RowNum) AS MaxRowNum,
COUNT(*) AS Cnt
FROM #MyHead
GROUP BY Student, Span
)span
INNER JOIN #MyHead lo ON span.MinRowNum = lo.RowNum
INNER JOIN #MyHead hi ON span.MaxRowNum = hi.RowNum
ORDER BY Student, From_Dt, From_Slot
--Jeff Moden
Change is inevitable... Change for the better is not.
January 31, 2009 at 3:18 pm
Ok... coming full circle back to Martin's original problem...
Here's the test data...
USE TempDb --Just a safe place to run things
GO
--DROP TABLE dbo.YourTable
GO
--===== Populate the test table (note the clustered PK)
-- (Not part of the solution)
CREATE TABLE dbo.YourTable
(
ID INT,
ResultCode CHAR(1),
CONSTRAINT PK_YourTable_ID PRIMARY KEY CLUSTERED(ID)
)
--===== Populate the test table
-- (Not part of the solution)
INSERT INTO dbo.YourTable
(ID,ResultCode)
SELECT '7','A' UNION ALL
SELECT '6','A' UNION ALL
SELECT '5','A' UNION ALL
SELECT '4','C' UNION ALL
SELECT '3','A' UNION ALL
SELECT '2','B' UNION ALL
SELECT '1','A'
--===== Display the content of the test table
SELECT * FROM dbo.YourTable
... and here's that new fancy-dancy span resolver to solve the original problem. Had to add an extra CTE to get just the last span and I also added the answer to what should be the next question... what are the Min and Max ID's for the last span of each ResultCode...
;WITH CteSpan AS
(--==== Creates groups ordered by ID breaking at ResultCode
SELECT Resultcode, ID,
CAST(ResultCode AS BINARY(8)) + CAST(ID AS BINARY(4)) AS ResultCode_ID,
ROW_NUMBER() OVER (ORDER BY ID) --Just in case there's missing ID's
- ROW_NUMBER() OVER(PARTITION BY ResultCode ORDER BY ResultCode,ID) AS SpanNum
FROM dbo.YourTable --order by id
)
, cteCnt AS
(--==== Gets the ID's, the count, the max combined Resultcode/ID for each span
SELECT Resultcode,
MIN(ID) AS MinID,
MAX(ID) AS MaxID,
MAX(ResultCode_ID) AS MaxResultCode_ID,
COUNT(*) AS Cnt
FROM CteSpan
GROUP BY Resultcode, SpanNum
)
, cteMax AS
(--==== Extra CTE just to get the last span for each Resultcode
SELECT Resultcode,
MAX(ResultCode_ID) AS MaxResultCode_ID
FROM CteSpan
GROUP BY Resultcode
)
SELECT cnt.MinID, cnt.MaxID, cnt.Resultcode, cnt.Cnt
FROM cteCnt cnt
INNER JOIN cteMax mx ON cnt.MaxResultCode_ID = mx.MaxResultCode_ID
ORDER BY cnt.ResultCode
Heh... see? I told ya I loved it when I learned something new. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
January 31, 2009 at 5:14 pm
Jeff Moden (1/31/2009)
Nah... I take it back... not clever... freakin' brilliant. The key to this whole thing is the grouping. You've gotta be able to group each "cluster" of rows that are in a sequence by date and slot. I knew it couldn't be done with a single ROW_NUMBER, but the winner used the difference between two ROW_NUMBERs with different partitions to generate the GRP for each student... absolutely brilliant.
Heh, I agree 100% Jeff. I am stupefied at the cleverness of this mathematical trick, (and PO'd at myself for not having figured it out for myself before now). I am still trying to figure out in my head exactly why it never repeats a group number within the larger partitions.
Cool stuff. I sure hope that I remember to use this trick in the future...
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
January 31, 2009 at 5:25 pm
RBarryYoung (1/31/2009)
...I am still trying to figure out in my head exactly why it never repeats a group number within the larger partitions...
Oh, I see. It does repeat with the larger partition, but not for each unique value of Attend within each student. The group numbers for each Student+Attend sequence never repeat because they always increase. Nice...
Dang this is a nice trick! I so wish I knew about it years ago. 🙂
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
Viewing 15 posts - 1 through 15 (of 44 total)
You must be logged in to reply to this topic. Login to reply