cumulative average without cursor

Question

cumulative average without cursor

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply

sgmunson SSC Guru Points: 110640 More actions · Answer 1

Robert klimes (7/29/2015)
First, is the right answer 4.3333 ? I changed the data type in your table to decimal(5, 4), as the float data type would be unable to accurately represent any mark value that ended in point 1 (e.g. 4.1 or 3.1, etc.). If I read your post correctly, you appear to just need the MAX value of a mark in a given course for a given student and course across ALL terms, and then just average all those maximum marks, as there will be exactly 1 per course, regardless of the number of terms. If I have that right, then here's the simple quiery:
CREATE TABLE #gpa (
[student] int,
[term] int,
[course] varchar(10),
[mark] decimal(5,4)
);
INSERT INTO #gpa (student, term, course, mark)
VALUES (1,1,'math',3),
(1,1,'eng',4),
(1,1,'hist',4.5),
(1,1,'anth',4.5),
(1,2,'comp',3),
(1,2,'math',4),
(1,2,'bio',4.5),
(1,2,'chem',4.5),
(1,3,'soc',4.5),
(1,3,'math',4.5),
(1,3,'bio',4),
(1,3,'chem',4);
WITH MAX_MARKS AS (
SELECT G.student, G.course, MAX(G.mark) AS MAX_MARK
FROM #gpa AS G
GROUP BY G.student, G.course
)
SELECT G.student, MIN(G.term) AS STARTING_TERM, MAX(G.term) AS ENDING_TERM, AVG(M.MAX_MARK) AS AVERAGE_GPA,
STUFF(
(SELECT DISTINCT ', ' + CT.course
FROM #gpa AS CT
WHERE CT.student = G.student
ORDER BY 1
FOR XML PATH(''), TYPE).value('.', N'varchar(max)')
, 1, 2, '') AS CourseList
FROM #gpa AS G
INNER JOIN MAX_MARKS AS M
ON G.student = M.student
AND G.course = M.course
GROUP BY G.student
DROP TABLE #gpa
Here are the results:
student STARTING_TERM ENDING_TERM AVERAGE_GPA CourseList
----------- ------------- ----------- --------------------------------------- --------------------------------------------------
1 1 3 4.333333 anth, bio, chem, comp, eng, hist, math, soc
the example I provided gave the desired results. I dont just need a final GPA but I need a cumulative GPA per term including the max mark for each course upto that term.
In the example data GPA for term 1 is 4 (3:math,4:eng,4.5:hist,4.5:anth), GPA for term 2 is 4.14285.... (4;eng,4.5:hist,4.5:anth,3:comp,4:math,4.5:bio,4.5:chem), only the math with the highest grade is included. GPA for term 3 is 4.25 (4;eng,4.5:hist,4.5:anth,3:comp,4.5:math,4.5:bio,4.5:chem, 4.5:soc), only the math from term 3 and bio+chem from term two are used.
term1 3+4+4.5+4.5 = 16/4(courses) = 4 GPA
term2 4+4.5+4.5+3+4+4.5+4.5 = 29/7(courses) = 4.14285.... GPA
term3 4+4.5+4.5+3+4.5+4.5+4.5+4.5 = 34/8(courses) = 4.25 GPA

Okay, this is a quite different requirement from what was stated in the original post, but a variation of my query should work. I was trying to test that last night on my 2012 instance and realized I had forgotten to post my newest code, so then I had nothing to work with because I had the newest code on a 2008 R2 environment at work. Anyway, here's the proposed and un-tested 2012 code. If it fails, get rid of the CourseList field and see if it will still run::

CREATE TABLE #gpa (

[student] int,

[term] int,

[course] varchar(10),

[mark] decimal(5,4)

);

INSERT INTO #gpa (student, term, course, mark)

VALUES (1,1,'math',3),

(1,1,'eng',4),

(1,1,'hist',4.5),

(1,1,'anth',4.5),

(1,2,'comp',3),

(1,2,'math',4),

(1,2,'bio',4.5),

(1,2,'chem',4.5),

(1,3,'soc',4.5),

(1,3,'math',4.5),

(1,3,'bio',4),

(1,3,'chem',4);

WITH MAX_MARKS AS (

SELECT G.student, G.course, G.term, MAX(G.mark) OVER(PARTITION BY G.student, G.course ORDER BY G.term) AS MAX_MARK

FROM #gpa AS G

GROUP BY G.student, G.course, G.term

)

SELECT G.student, G.term, AVG(M.MAX_MARK) OVER(PARTITION BY G.student ORDER BY G.term) AS AVERAGE_GPA,

STUFF(

(SELECT DISTINCT ', ' + CT.course

FROM #gpa AS CT

WHERE CT.student = G.student

ORDER BY 1

FOR XML PATH(''), TYPE).value('.', N'varchar(max)')

, 1, 2, '') AS CourseList

FROM #gpa AS G

INNER JOIN MAX_MARKS AS M

ON G.student = M.student

AND G.course = M.course

GROUP BY G.student, G.term

DROP TABLE #gpa

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

Robert Klimes SSCoach Points: 17865 More actions · Answer 2

I'd like to thank everyone for their contributions.

Even if supplied solution did not give desired results, they did give me different ways of looking at problems and , in the end, that is just as helpful because i will be able to use those strategies in future problems.

Thanks all :w00t:

Bob
-----------------------------------------------------------------------------
How to post to get the best help[/url]

paul.s.lach Old Hand Points: 329 More actions · Answer 3

Try the following, the concept is:

1. do a cross join of all courses and terms

2. left join the results of the cross join to the marks you supplied such that every mark for every course appears in every term from the term where taken forward

3. use ROW_NUMBER() to determine the highest mark

4. calculate the GPA

IF OBJECT_ID('tempdb..#GPA') IS NOT NULL DROP TABLE #GPA;

CREATE TABLE [#GPA]([term] int,

[course] varchar(10),

[mark] float);

INSERT INTO [#GPA]

VALUES (1,'math',3),

(1,'eng',4),

(1,'hist',4.5),

(1,'anth',4.5),

(2,'comp',3),

(2,'math',4),

(2,'bio',4.5),

(2,'chem',4.5),

(3,'soc',4.5),

(3,'math',4.5),

(3,'bio',4),

(3,'chem',4);

WITH [Courses] ([course])

AS (SELECT DISTINCT [course]

FROM [#GPA]

),

[Terms] ([term])

AS (SELECT DISTINCT [term]

FROM [#GPA]

),

[CourseTerms] ([term], [course])

AS (SELECT [t].[term], [c].[course]

FROM [Courses] AS c CROSS JOIN [Terms] AS t

),

[Marks] ([term], [course], [mark])

AS (SELECT [ct].[term], [ct].[course], [g1].[mark]

FROM [CourseTerms] AS ct LEFT OUTER JOIN [#GPA] AS [g1] ON [g1].[term] <= [ct].[term]

AND [g1].[course] = [ct].[course]

),

[OrderedMarks] ([term], [course], [mark], SeqNo)

AS (

SELECT [m].[term], [m].[course], [m].[mark]

, ROW_NUMBER() OVER (PARTITION BY [m].[term], [m].[course] ORDER BY [m].[mark] DESC) AS [SeqNo]

FROM [Marks] AS m

),

[BestMarks] ([term], [course], [mark])

AS (

SELECT [om].[term], [om].[course], [om].[mark]

FROM [OrderedMarks] AS om

WHERE [om].[SeqNo] = 1

)

SELECT [bm].[term]

, SUM(CASE WHEN [bm].[mark] IS NOT NULL THEN [bm].[mark] ELSE 0 END) / CAST(SUM(CASE WHEN [bm].[mark] IS NOT NULL THEN 1 ELSE 0 END) AS FLOAT) AS [GPA]

FROM [BestMarks] AS bm

GROUP BY [bm].[term]

ORDER BY [bm].[term];

sgmunson SSC Guru Points: 110640 More actions · Answer 4

Robert klimes (7/30/2015)
I'd like to thank everyone for their contributions.
Even if supplied solution did not give desired results, they did give me different ways of looking at problems and , in the end, that is just as helpful because i will be able to use those strategies in future problems.
Thanks all :w00t:

I finally have something that works. Not necessarily elegant and I have no idea what it will perform like, but the results are what is expected.

Here's the code:

CREATE TABLE #gpa (

[student] int,

[term] int,

[course] varchar(10),

[mark] decimal(5,4)

);

INSERT INTO #gpa (student, term, course, mark)

VALUES (1,1,'math',3),

(1,1,'eng',4),

(1,1,'hist',4.5),

(1,1,'anth',4.5),

(1,2,'comp',3),

(1,2,'math',4),

(1,2,'bio',4.5),

(1,2,'chem',4.5),

(1,3,'soc',4.5),

(1,3,'math',4.5),

(1,3,'bio',4),

(1,3,'chem',4);

SELECT *

FROM #gpa

ORDER BY student, term, course;

WITH MAX_MARKS AS (

SELECT DISTINCT G.student, G.course, G.term,

MAX(G.mark) OVER(PARTITION BY G.student, G.course ORDER BY G.term) AS MAX_MARK

FROM #gpa AS G

),

MARKS_COUNT AS (

SELECT G.student, G.term, G.course,

CASE WHEN G.term = 1 THEN 1 ELSE 0 END AS VALID1,

CASE

WHEN G.term = 1 THEN

CASE

WHEN NOT EXISTS (

SELECT 1

FROM gpa AS G2

WHERE G2.student = G.student

AND G2.course = G.course

AND G2.term = G.term + 1

AND G2.mark >= G.mark) THEN 1

ELSE 0

END

WHEN G.term = 2 THEN

CASE

WHEN NOT EXISTS (

SELECT 1

FROM gpa AS G1

WHERE G1.student = G.student

AND G1.course = G.course

AND G1.term = G.term - 1

AND G1.mark >= G.mark) THEN 1

ELSE 0

END

WHEN G.term = 3 THEN 0

END AS VALID2,

CASE

WHEN G.term < 3 THEN

CASE

WHEN NOT EXISTS (

SELECT 1

FROM gpa AS GA

WHERE GA.student = G.student

AND GA.course = G.course

AND GA.term <> G.term

AND GA.mark > G.mark) THEN 1

ELSE 0

END

ELSE

CASE

WHEN NOT EXISTS (

SELECT 1

FROM gpa AS G12

WHERE G12.student = G.student

AND G12.course = G.course

AND G12.term < G.term

AND G12.mark >= G.mark) THEN 1

ELSE 0

END

END AS VALID3

FROM #gpa AS G

)

SELECT DISTINCT M.student, M.term,

CASE M.term

WHEN 1 THEN SUM(M.VALID1 * G.MAX_MARK) OVER (PARTITION BY M.student ORDER BY M.term)

WHEN 2 THEN SUM(M.VALID2 * G.MAX_MARK) OVER (PARTITION BY M.student ORDER BY M.term)

WHEN 3 THEN SUM(M.VALID3 * G.MAX_MARK) OVER (PARTITION BY M.student ORDER BY M.term)

END /

CASE M.term

WHEN 1 THEN SUM(M.VALID1) OVER(PARTITION BY M.student)

WHEN 2 THEN SUM(M.VALID2) OVER(PARTITION BY M.student)

WHEN 3 THEN SUM(M.VALID3) OVER(PARTITION BY M.student)

END AS AVERAGE_GPA

FROM MARKS_COUNT AS M

INNER JOIN MAX_MARKS AS G

ON M.student = G.student

AND M.term = G.term

AND M.course = G.course

DROP TABLE #gpa

Here's the results:

studenttermAVERAGE_GPA

114

124.142857

134.25

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

Robert Klimes SSCoach Points: 17865 More actions · Answer 5

paul.s.lach (7/30/2015)
Try the following, the concept is:
1. do a cross join of all courses and terms
2. left join the results of the cross join to the marks you supplied such that every mark for every course appears in every term from the term where taken forward
3. use ROW_NUMBER() to determine the highest mark
4. calculate the GPA
IF OBJECT_ID('tempdb..#GPA') IS NOT NULL DROP TABLE #GPA;
CREATE TABLE [#GPA]([term] int,
[course] varchar(10),
[mark] float);
INSERT INTO [#GPA]
VALUES (1,'math',3),
(1,'eng',4),
(1,'hist',4.5),
(1,'anth',4.5),
(2,'comp',3),
(2,'math',4),
(2,'bio',4.5),
(2,'chem',4.5),
(3,'soc',4.5),
(3,'math',4.5),
(3,'bio',4),
(3,'chem',4);
WITH [Courses] ([course])
AS (SELECT DISTINCT [course]
FROM [#GPA]
),
[Terms] ([term])
AS (SELECT DISTINCT [term]
FROM [#GPA]
),
[CourseTerms] ([term], [course])
AS (SELECT [t].[term], [c].[course]
FROM [Courses] AS c CROSS JOIN [Terms] AS t
),
[Marks] ([term], [course], [mark])
AS (SELECT [ct].[term], [ct].[course], [g1].[mark]
FROM [CourseTerms] AS ct LEFT OUTER JOIN [#GPA] AS [g1] ON [g1].[term] <= [ct].[term]
AND [g1].[course] = [ct].[course]
),
[OrderedMarks] ([term], [course], [mark], SeqNo)
AS (
SELECT [m].[term], [m].[course], [m].[mark]
, ROW_NUMBER() OVER (PARTITION BY [m].[term], [m].[course] ORDER BY [m].[mark] DESC) AS [SeqNo]
FROM [Marks] AS m
),
[BestMarks] ([term], [course], [mark])
AS (
SELECT [om].[term], [om].[course], [om].[mark]
FROM [OrderedMarks] AS om
WHERE [om].[SeqNo] = 1
)
SELECT [bm].[term]
, SUM(CASE WHEN [bm].[mark] IS NOT NULL THEN [bm].[mark] ELSE 0 END) / CAST(SUM(CASE WHEN [bm].[mark] IS NOT NULL THEN 1 ELSE 0 END) AS FLOAT) AS [GPA]
FROM [BestMarks] AS bm
GROUP BY [bm].[term]
ORDER BY [bm].[term];

Thanks Paul, this solution does give the correct results.

The solution provided by Jacob earlier, which also give correct results, would probably perform better on a larger scale. Jacob's solution does 4 reads to the source table while your solution does 28 for the sample data.

Bob
-----------------------------------------------------------------------------
How to post to get the best help[/url]

Robert Klimes SSCoach Points: 17865 More actions · Answer 6

I finally have something that works. Not necessarily elegant and I have no idea what it will perform like, but the results are what is expected.
Here's the code:
CREATE TABLE #gpa (
[student] int,
[term] int,
[course] varchar(10),
[mark] decimal(5,4)
);
INSERT INTO #gpa (student, term, course, mark)
VALUES (1,1,'math',3),
(1,1,'eng',4),
(1,1,'hist',4.5),
(1,1,'anth',4.5),
(1,2,'comp',3),
(1,2,'math',4),
(1,2,'bio',4.5),
(1,2,'chem',4.5),
(1,3,'soc',4.5),
(1,3,'math',4.5),
(1,3,'bio',4),
(1,3,'chem',4);
SELECT *
FROM #gpa
ORDER BY student, term, course;
WITH MAX_MARKS AS (
SELECT DISTINCT G.student, G.course, G.term,
MAX(G.mark) OVER(PARTITION BY G.student, G.course ORDER BY G.term) AS MAX_MARK
FROM #gpa AS G
),
MARKS_COUNT AS (
SELECT G.student, G.term, G.course,
CASE WHEN G.term = 1 THEN 1 ELSE 0 END AS VALID1,
CASE
WHEN G.term = 1 THEN
CASE
WHEN NOT EXISTS (
SELECT 1
FROM gpa AS G2
WHERE G2.student = G.student
AND G2.course = G.course
AND G2.term = G.term + 1
AND G2.mark >= G.mark) THEN 1
ELSE 0
END
WHEN G.term = 2 THEN
CASE
WHEN NOT EXISTS (
SELECT 1
FROM gpa AS G1
WHERE G1.student = G.student
AND G1.course = G.course
AND G1.term = G.term - 1
AND G1.mark >= G.mark) THEN 1
ELSE 0
END
WHEN G.term = 3 THEN 0
END AS VALID2,
CASE
WHEN G.term < 3 THEN
CASE
WHEN NOT EXISTS (
SELECT 1
FROM gpa AS GA
WHERE GA.student = G.student
AND GA.course = G.course
AND GA.term <> G.term
AND GA.mark > G.mark) THEN 1
ELSE 0
END
ELSE
CASE
WHEN NOT EXISTS (
SELECT 1
FROM gpa AS G12
WHERE G12.student = G.student
AND G12.course = G.course
AND G12.term < G.term
AND G12.mark >= G.mark) THEN 1
ELSE 0
END
END AS VALID3
FROM #gpa AS G
)
SELECT DISTINCT M.student, M.term,
CASE M.term
WHEN 1 THEN SUM(M.VALID1 * G.MAX_MARK) OVER (PARTITION BY M.student ORDER BY M.term)
WHEN 2 THEN SUM(M.VALID2 * G.MAX_MARK) OVER (PARTITION BY M.student ORDER BY M.term)
WHEN 3 THEN SUM(M.VALID3 * G.MAX_MARK) OVER (PARTITION BY M.student ORDER BY M.term)
END /
CASE M.term
WHEN 1 THEN SUM(M.VALID1) OVER(PARTITION BY M.student)
WHEN 2 THEN SUM(M.VALID2) OVER(PARTITION BY M.student)
WHEN 3 THEN SUM(M.VALID3) OVER(PARTITION BY M.student)
END AS AVERAGE_GPA
FROM MARKS_COUNT AS M
INNER JOIN MAX_MARKS AS G
ON M.student = G.student
AND M.term = G.term
AND M.course = G.course
DROP TABLE #gpa
Here's the results:
studenttermAVERAGE_GPA
114
124.142857
134.25

Thanks Steve, this also produces correct results. This solution also would probably perform worse than Jacob's on a larger scale. It does 33 reads to the source table as well as 158 reads to a work table vs 4 reads and no work table.

Bob
-----------------------------------------------------------------------------
How to post to get the best help[/url]

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 7

This should be sent off to Itzik Ben-Gan as a puzzler to work out for an article on SQLMag.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

paul.s.lach Old Hand Points: 329 More actions · Answer 8

OK, here is another solution that I believe is stripped down to the basics:

INSERT INTO [#gpa]

VALUES (1,'math',3),

(1,'eng',4),

(1,'hist',4.5),

(1,'anth',4.5),

(2,'comp',3),

(2,'math',4),

(2,'bio',4.5),

(2,'chem',4.5),

(3,'soc',4.5),

(3,'math',4.5),

(3,'bio',4),

(3,'chem',4);

SELECT [m].[term], SUM([m].[MaxMark]) / CAST(COUNT(DISTINCT [m].[course]) AS FLOAT) AS [GPA]

FROM (SELECT [t].[term], [g].[course], MAX([g].[mark]) AS [MaxMark]

FROM [#gpa] AS [g] INNER JOIN (SELECT [term] FROM [#gpa] GROUP BY [term]) AS t ON [g].[term] <= [t].[term]

GROUP BY [t].[term], [g].[course]

) AS m

GROUP BY [m].[term]

ORDER BY [m].[term];

DROP TABLE [#gpa];

The execution plan is the same as the one from Jacob's solution, but without the CTE and (IMHO :-)) a more readable query.

sgmunson SSC Guru Points: 110640 More actions · Answer 9

paul.s.lach (7/31/2015)
OK, here is another solution that I believe is stripped down to the basics:
INSERT INTO [#gpa]
VALUES (1,'math',3),
(1,'eng',4),
(1,'hist',4.5),
(1,'anth',4.5),
(2,'comp',3),
(2,'math',4),
(2,'bio',4.5),
(2,'chem',4.5),
(3,'soc',4.5),
(3,'math',4.5),
(3,'bio',4),
(3,'chem',4);
SELECT [m].[term], SUM([m].[MaxMark]) / CAST(COUNT(DISTINCT [m].[course]) AS FLOAT) AS [GPA]
FROM (SELECT [t].[term], [g].[course], MAX([g].[mark]) AS [MaxMark]
FROM [#gpa] AS [g] INNER JOIN (SELECT [term] FROM [#gpa] GROUP BY [term]) AS t ON [g].[term] <= [t].[term]
GROUP BY [t].[term], [g].[course]
) AS m
GROUP BY [m].[term]
ORDER BY [m].[term];
DROP TABLE [#gpa];
The execution plan is the same as the one from Jacob's solution, but without the CTE and (IMHO :-)) a more readable query.

Cool beans... I had initially thought that just working with the max value would function correctly, but I couldn't quite get it to work. Beautiful solution, but let's "prettify the code and extend it to work with multiple students, shall we?

CREATE TABLE #gpa (

[student] int,

[term] int,

[course] varchar(10),

[mark] decimal(5,4)

);

INSERT INTO #gpa (student, term, course, mark)

VALUES(1, 1, 'math', 3),

(1, 1, 'eng', 4),

(1, 1, 'hist', 4.5),

(1, 1, 'anth', 4.5),

(1, 2, 'comp', 3),

(1, 2, 'math', 4),

(1, 2, 'bio', 4.5),

(1, 2, 'chem', 4.5),

(1, 3, 'soc', 4.5),

(1, 3, 'math', 4.5),

(1, 3, 'bio', 4),

(1, 3, 'chem', 4),

(2, 1, 'math', 3),

(2, 1, 'eng', 4),

(2, 1, 'hist', 4.5),

(2, 1, 'anth', 4.5),

(2, 2, 'comp', 3),

(2, 2, 'math', 4),

(2, 2, 'bio', 4.5),

(2, 2, 'chem', 4.5),

(2, 3, 'soc', 4.9),

(2, 3, 'math', 4.9),

(2, 3, 'bio', 4.9),

(2, 3, 'chem', 4.9);

SELECT *

FROM #gpa

ORDER BY student, term, course;

SET STATISTICS IO ON;

SELECT M.student, M.term,

SUM(M.MaxMark) /

CAST(COUNT(DISTINCT M.course) AS decimal(7,4)) AS GPA

FROM (

SELECT T.student, T.term, G.course,

MAX(G.mark) AS MaxMark

FROM #gpa AS G

INNER JOIN (

SELECT student, term

FROM #gpa

GROUP BY student, term

) AS T

ON G.student = T.student

AND G.term <= T.term

GROUP BY T.student, T.term, G.course

) AS M

GROUP BY M.student, M.term

ORDER BY M.student, M.term;

SET STATISTICS IO OFF;

DROP TABLE #gpa

Here's the results:

(24 row(s) affected)

student term course mark

----------- ----------- ---------- ---------------------------------------

1 1 anth 4.5000

1 1 eng 4.0000

1 1 hist 4.5000

1 1 math 3.0000

1 2 bio 4.5000

1 2 chem 4.5000

1 2 comp 3.0000

1 2 math 4.0000

1 3 bio 4.0000

1 3 chem 4.0000

1 3 math 4.5000

1 3 soc 4.5000

2 1 anth 4.5000

2 1 eng 4.0000

2 1 hist 4.5000

2 1 math 3.0000

2 2 bio 4.5000

2 2 chem 4.5000

2 2 comp 3.0000

2 2 math 4.0000

2 3 bio 4.9000

2 3 chem 4.9000

2 3 math 4.9000

2 3 soc 4.9000

(24 row(s) affected)

student term GPA

----------- ----------- ---------------------------------------

1 1 4.000000

1 2 4.142857

1 3 4.250000

2 1 4.000000

2 2 4.142857

2 3 4.450000

(6 row(s) affected)

Table '#gpa'. Scan count 2, logical reads 7, physical reads 0, read-ahead reads 0,

lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)