July 30, 2015 at 6:21 am
Robert klimes (7/29/2015)
First, is the right answer 4.3333 ? I changed the data type in your table to decimal(5, 4), as the float data type would be unable to accurately represent any mark value that ended in point 1 (e.g. 4.1 or 3.1, etc.). If I read your post correctly, you appear to just need the MAX value of a mark in a given course for a given student and course across ALL terms, and then just average all those maximum marks, as there will be exactly 1 per course, regardless of the number of terms. If I have that right, then here's the simple quiery:
CREATE TABLE #gpa (
[student] int,
[term] int,
[course] varchar(10),
[mark] decimal(5,4)
);
INSERT INTO #gpa (student, term, course, mark)
VALUES (1,1,'math',3),
(1,1,'eng',4),
(1,1,'hist',4.5),
(1,1,'anth',4.5),
(1,2,'comp',3),
(1,2,'math',4),
(1,2,'bio',4.5),
(1,2,'chem',4.5),
(1,3,'soc',4.5),
(1,3,'math',4.5),
(1,3,'bio',4),
(1,3,'chem',4);
WITH MAX_MARKS AS (
SELECT G.student, G.course, MAX(G.mark) AS MAX_MARK
FROM #gpa AS G
GROUP BY G.student, G.course
)
SELECT G.student, MIN(G.term) AS STARTING_TERM, MAX(G.term) AS ENDING_TERM, AVG(M.MAX_MARK) AS AVERAGE_GPA,
STUFF(
(SELECT DISTINCT ', ' + CT.course
FROM #gpa AS CT
WHERE CT.student = G.student
ORDER BY 1
FOR XML PATH(''), TYPE).value('.', N'varchar(max)')
, 1, 2, '') AS CourseList
FROM #gpa AS G
INNER JOIN MAX_MARKS AS M
ON G.student = M.student
AND G.course = M.course
GROUP BY G.student
DROP TABLE #gpa
Here are the results:
student STARTING_TERM ENDING_TERM AVERAGE_GPA CourseList
----------- ------------- ----------- --------------------------------------- --------------------------------------------------
1 1 3 4.333333 anth, bio, chem, comp, eng, hist, math, soc
the example I provided gave the desired results. I dont just need a final GPA but I need a cumulative GPA per term including the max mark for each course upto that term.
In the example data GPA for term 1 is 4 (3:math,4:eng,4.5:hist,4.5:anth), GPA for term 2 is 4.14285.... (4;eng,4.5:hist,4.5:anth,3:comp,4:math,4.5:bio,4.5:chem), only the math with the highest grade is included. GPA for term 3 is 4.25 (4;eng,4.5:hist,4.5:anth,3:comp,4.5:math,4.5:bio,4.5:chem, 4.5:soc), only the math from term 3 and bio+chem from term two are used.
term1 3+4+4.5+4.5 = 16/4(courses) = 4 GPA
term2 4+4.5+4.5+3+4+4.5+4.5 = 29/7(courses) = 4.14285.... GPA
term3 4+4.5+4.5+3+4.5+4.5+4.5+4.5 = 34/8(courses) = 4.25 GPA
Okay, this is a quite different requirement from what was stated in the original post, but a variation of my query should work. I was trying to test that last night on my 2012 instance and realized I had forgotten to post my newest code, so then I had nothing to work with because I had the newest code on a 2008 R2 environment at work. Anyway, here's the proposed and un-tested 2012 code. If it fails, get rid of the CourseList field and see if it will still run::
CREATE TABLE #gpa (
[student] int,
[term] int,
[course] varchar(10),
[mark] decimal(5,4)
);
INSERT INTO #gpa (student, term, course, mark)
VALUES (1,1,'math',3),
(1,1,'eng',4),
(1,1,'hist',4.5),
(1,1,'anth',4.5),
(1,2,'comp',3),
(1,2,'math',4),
(1,2,'bio',4.5),
(1,2,'chem',4.5),
(1,3,'soc',4.5),
(1,3,'math',4.5),
(1,3,'bio',4),
(1,3,'chem',4);
WITH MAX_MARKS AS (
SELECT G.student, G.course, G.term, MAX(G.mark) OVER(PARTITION BY G.student, G.course ORDER BY G.term) AS MAX_MARK
FROM #gpa AS G
GROUP BY G.student, G.course, G.term
)
SELECT G.student, G.term, AVG(M.MAX_MARK) OVER(PARTITION BY G.student ORDER BY G.term) AS AVERAGE_GPA,
STUFF(
(SELECT DISTINCT ', ' + CT.course
FROM #gpa AS CT
WHERE CT.student = G.student
ORDER BY 1
FOR XML PATH(''), TYPE).value('.', N'varchar(max)')
, 1, 2, '') AS CourseList
FROM #gpa AS G
INNER JOIN MAX_MARKS AS M
ON G.student = M.student
AND G.course = M.course
GROUP BY G.student, G.term
DROP TABLE #gpa
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
July 30, 2015 at 7:54 am
I'd like to thank everyone for their contributions.
Even if supplied solution did not give desired results, they did give me different ways of looking at problems and , in the end, that is just as helpful because i will be able to use those strategies in future problems.
Thanks all :w00t:
July 30, 2015 at 1:51 pm
Try the following, the concept is:
1. do a cross join of all courses and terms
2. left join the results of the cross join to the marks you supplied such that every mark for every course appears in every term from the term where taken forward
3. use ROW_NUMBER() to determine the highest mark
4. calculate the GPA
IF OBJECT_ID('tempdb..#GPA') IS NOT NULL DROP TABLE #GPA;
CREATE TABLE [#GPA]([term] int,
[course] varchar(10),
[mark] float);
INSERT INTO [#GPA]
VALUES (1,'math',3),
(1,'eng',4),
(1,'hist',4.5),
(1,'anth',4.5),
(2,'comp',3),
(2,'math',4),
(2,'bio',4.5),
(2,'chem',4.5),
(3,'soc',4.5),
(3,'math',4.5),
(3,'bio',4),
(3,'chem',4);
WITH [Courses] ([course])
AS (SELECT DISTINCT [course]
FROM [#GPA]
),
[Terms] ([term])
AS (SELECT DISTINCT [term]
FROM [#GPA]
),
[CourseTerms] ([term], [course])
AS (SELECT [t].[term], [c].[course]
FROM [Courses] AS c CROSS JOIN [Terms] AS t
),
[Marks] ([term], [course], [mark])
AS (SELECT [ct].[term], [ct].[course], [g1].[mark]
FROM [CourseTerms] AS ct LEFT OUTER JOIN [#GPA] AS [g1] ON [g1].[term] <= [ct].[term]
AND [g1].[course] = [ct].[course]
),
[OrderedMarks] ([term], [course], [mark], SeqNo)
AS (
SELECT [m].[term], [m].[course], [m].[mark]
, ROW_NUMBER() OVER (PARTITION BY [m].[term], [m].[course] ORDER BY [m].[mark] DESC) AS [SeqNo]
FROM [Marks] AS m
),
[BestMarks] ([term], [course], [mark])
AS (
SELECT [om].[term], [om].[course], [om].[mark]
FROM [OrderedMarks] AS om
WHERE [om].[SeqNo] = 1
)
SELECT [bm].[term]
, SUM(CASE WHEN [bm].[mark] IS NOT NULL THEN [bm].[mark] ELSE 0 END) / CAST(SUM(CASE WHEN [bm].[mark] IS NOT NULL THEN 1 ELSE 0 END) AS FLOAT) AS [GPA]
FROM [BestMarks] AS bm
GROUP BY [bm].[term]
ORDER BY [bm].[term];
July 30, 2015 at 2:30 pm
Robert klimes (7/30/2015)
I'd like to thank everyone for their contributions.Even if supplied solution did not give desired results, they did give me different ways of looking at problems and , in the end, that is just as helpful because i will be able to use those strategies in future problems.
Thanks all :w00t:
I finally have something that works. Not necessarily elegant and I have no idea what it will perform like, but the results are what is expected.
Here's the code:
CREATE TABLE #gpa (
[student] int,
[term] int,
[course] varchar(10),
[mark] decimal(5,4)
);
INSERT INTO #gpa (student, term, course, mark)
VALUES (1,1,'math',3),
(1,1,'eng',4),
(1,1,'hist',4.5),
(1,1,'anth',4.5),
(1,2,'comp',3),
(1,2,'math',4),
(1,2,'bio',4.5),
(1,2,'chem',4.5),
(1,3,'soc',4.5),
(1,3,'math',4.5),
(1,3,'bio',4),
(1,3,'chem',4);
SELECT *
FROM #gpa
ORDER BY student, term, course;
WITH MAX_MARKS AS (
SELECT DISTINCT G.student, G.course, G.term,
MAX(G.mark) OVER(PARTITION BY G.student, G.course ORDER BY G.term) AS MAX_MARK
FROM #gpa AS G
),
MARKS_COUNT AS (
SELECT G.student, G.term, G.course,
CASE WHEN G.term = 1 THEN 1 ELSE 0 END AS VALID1,
CASE
WHEN G.term = 1 THEN
CASE
WHEN NOT EXISTS (
SELECT 1
FROM gpa AS G2
WHERE G2.student = G.student
AND G2.course = G.course
AND G2.term = G.term + 1
AND G2.mark >= G.mark) THEN 1
ELSE 0
END
WHEN G.term = 2 THEN
CASE
WHEN NOT EXISTS (
SELECT 1
FROM gpa AS G1
WHERE G1.student = G.student
AND G1.course = G.course
AND G1.term = G.term - 1
AND G1.mark >= G.mark) THEN 1
ELSE 0
END
WHEN G.term = 3 THEN 0
END AS VALID2,
CASE
WHEN G.term < 3 THEN
CASE
WHEN NOT EXISTS (
SELECT 1
FROM gpa AS GA
WHERE GA.student = G.student
AND GA.course = G.course
AND GA.term <> G.term
AND GA.mark > G.mark) THEN 1
ELSE 0
END
ELSE
CASE
WHEN NOT EXISTS (
SELECT 1
FROM gpa AS G12
WHERE G12.student = G.student
AND G12.course = G.course
AND G12.term < G.term
AND G12.mark >= G.mark) THEN 1
ELSE 0
END
END AS VALID3
FROM #gpa AS G
)
SELECT DISTINCT M.student, M.term,
CASE M.term
WHEN 1 THEN SUM(M.VALID1 * G.MAX_MARK) OVER (PARTITION BY M.student ORDER BY M.term)
WHEN 2 THEN SUM(M.VALID2 * G.MAX_MARK) OVER (PARTITION BY M.student ORDER BY M.term)
WHEN 3 THEN SUM(M.VALID3 * G.MAX_MARK) OVER (PARTITION BY M.student ORDER BY M.term)
END /
CASE M.term
WHEN 1 THEN SUM(M.VALID1) OVER(PARTITION BY M.student)
WHEN 2 THEN SUM(M.VALID2) OVER(PARTITION BY M.student)
WHEN 3 THEN SUM(M.VALID3) OVER(PARTITION BY M.student)
END AS AVERAGE_GPA
FROM MARKS_COUNT AS M
INNER JOIN MAX_MARKS AS G
ON M.student = G.student
AND M.term = G.term
AND M.course = G.course
DROP TABLE #gpa
Here's the results:
studenttermAVERAGE_GPA
114
124.142857
134.25
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
July 30, 2015 at 2:34 pm
paul.s.lach (7/30/2015)
Try the following, the concept is:1. do a cross join of all courses and terms
2. left join the results of the cross join to the marks you supplied such that every mark for every course appears in every term from the term where taken forward
3. use ROW_NUMBER() to determine the highest mark
4. calculate the GPA
IF OBJECT_ID('tempdb..#GPA') IS NOT NULL DROP TABLE #GPA;
CREATE TABLE [#GPA]([term] int,
[course] varchar(10),
[mark] float);
INSERT INTO [#GPA]
VALUES (1,'math',3),
(1,'eng',4),
(1,'hist',4.5),
(1,'anth',4.5),
(2,'comp',3),
(2,'math',4),
(2,'bio',4.5),
(2,'chem',4.5),
(3,'soc',4.5),
(3,'math',4.5),
(3,'bio',4),
(3,'chem',4);
WITH [Courses] ([course])
AS (SELECT DISTINCT [course]
FROM [#GPA]
),
[Terms] ([term])
AS (SELECT DISTINCT [term]
FROM [#GPA]
),
[CourseTerms] ([term], [course])
AS (SELECT [t].[term], [c].[course]
FROM [Courses] AS c CROSS JOIN [Terms] AS t
),
[Marks] ([term], [course], [mark])
AS (SELECT [ct].[term], [ct].[course], [g1].[mark]
FROM [CourseTerms] AS ct LEFT OUTER JOIN [#GPA] AS [g1] ON [g1].[term] <= [ct].[term]
AND [g1].[course] = [ct].[course]
),
[OrderedMarks] ([term], [course], [mark], SeqNo)
AS (
SELECT [m].[term], [m].[course], [m].[mark]
, ROW_NUMBER() OVER (PARTITION BY [m].[term], [m].[course] ORDER BY [m].[mark] DESC) AS [SeqNo]
FROM [Marks] AS m
),
[BestMarks] ([term], [course], [mark])
AS (
SELECT [om].[term], [om].[course], [om].[mark]
FROM [OrderedMarks] AS om
WHERE [om].[SeqNo] = 1
)
SELECT [bm].[term]
, SUM(CASE WHEN [bm].[mark] IS NOT NULL THEN [bm].[mark] ELSE 0 END) / CAST(SUM(CASE WHEN [bm].[mark] IS NOT NULL THEN 1 ELSE 0 END) AS FLOAT) AS [GPA]
FROM [BestMarks] AS bm
GROUP BY [bm].[term]
ORDER BY [bm].[term];
Thanks Paul, this solution does give the correct results.
The solution provided by Jacob earlier, which also give correct results, would probably perform better on a larger scale. Jacob's solution does 4 reads to the source table while your solution does 28 for the sample data.
July 30, 2015 at 3:12 pm
I finally have something that works. Not necessarily elegant and I have no idea what it will perform like, but the results are what is expected.
Here's the code:
CREATE TABLE #gpa (
[student] int,
[term] int,
[course] varchar(10),
[mark] decimal(5,4)
);
INSERT INTO #gpa (student, term, course, mark)
VALUES (1,1,'math',3),
(1,1,'eng',4),
(1,1,'hist',4.5),
(1,1,'anth',4.5),
(1,2,'comp',3),
(1,2,'math',4),
(1,2,'bio',4.5),
(1,2,'chem',4.5),
(1,3,'soc',4.5),
(1,3,'math',4.5),
(1,3,'bio',4),
(1,3,'chem',4);
SELECT *
FROM #gpa
ORDER BY student, term, course;
WITH MAX_MARKS AS (
SELECT DISTINCT G.student, G.course, G.term,
MAX(G.mark) OVER(PARTITION BY G.student, G.course ORDER BY G.term) AS MAX_MARK
FROM #gpa AS G
),
MARKS_COUNT AS (
SELECT G.student, G.term, G.course,
CASE WHEN G.term = 1 THEN 1 ELSE 0 END AS VALID1,
CASE
WHEN G.term = 1 THEN
CASE
WHEN NOT EXISTS (
SELECT 1
FROM gpa AS G2
WHERE G2.student = G.student
AND G2.course = G.course
AND G2.term = G.term + 1
AND G2.mark >= G.mark) THEN 1
ELSE 0
END
WHEN G.term = 2 THEN
CASE
WHEN NOT EXISTS (
SELECT 1
FROM gpa AS G1
WHERE G1.student = G.student
AND G1.course = G.course
AND G1.term = G.term - 1
AND G1.mark >= G.mark) THEN 1
ELSE 0
END
WHEN G.term = 3 THEN 0
END AS VALID2,
CASE
WHEN G.term < 3 THEN
CASE
WHEN NOT EXISTS (
SELECT 1
FROM gpa AS GA
WHERE GA.student = G.student
AND GA.course = G.course
AND GA.term <> G.term
AND GA.mark > G.mark) THEN 1
ELSE 0
END
ELSE
CASE
WHEN NOT EXISTS (
SELECT 1
FROM gpa AS G12
WHERE G12.student = G.student
AND G12.course = G.course
AND G12.term < G.term
AND G12.mark >= G.mark) THEN 1
ELSE 0
END
END AS VALID3
FROM #gpa AS G
)
SELECT DISTINCT M.student, M.term,
CASE M.term
WHEN 1 THEN SUM(M.VALID1 * G.MAX_MARK) OVER (PARTITION BY M.student ORDER BY M.term)
WHEN 2 THEN SUM(M.VALID2 * G.MAX_MARK) OVER (PARTITION BY M.student ORDER BY M.term)
WHEN 3 THEN SUM(M.VALID3 * G.MAX_MARK) OVER (PARTITION BY M.student ORDER BY M.term)
END /
CASE M.term
WHEN 1 THEN SUM(M.VALID1) OVER(PARTITION BY M.student)
WHEN 2 THEN SUM(M.VALID2) OVER(PARTITION BY M.student)
WHEN 3 THEN SUM(M.VALID3) OVER(PARTITION BY M.student)
END AS AVERAGE_GPA
FROM MARKS_COUNT AS M
INNER JOIN MAX_MARKS AS G
ON M.student = G.student
AND M.term = G.term
AND M.course = G.course
DROP TABLE #gpa
Here's the results:
studenttermAVERAGE_GPA
114
124.142857
134.25
Thanks Steve, this also produces correct results. This solution also would probably perform worse than Jacob's on a larger scale. It does 33 reads to the source table as well as 158 reads to a work table vs 4 reads and no work table.
July 30, 2015 at 8:48 pm
This should be sent off to Itzik Ben-Gan as a puzzler to work out for an article on SQLMag.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
July 31, 2015 at 6:33 am
OK, here is another solution that I believe is stripped down to the basics:
INSERT INTO [#gpa]
VALUES (1,'math',3),
(1,'eng',4),
(1,'hist',4.5),
(1,'anth',4.5),
(2,'comp',3),
(2,'math',4),
(2,'bio',4.5),
(2,'chem',4.5),
(3,'soc',4.5),
(3,'math',4.5),
(3,'bio',4),
(3,'chem',4);
SELECT [m].[term], SUM([m].[MaxMark]) / CAST(COUNT(DISTINCT [m].[course]) AS FLOAT) AS [GPA]
FROM (SELECT [t].[term], [g].[course], MAX([g].[mark]) AS [MaxMark]
FROM [#gpa] AS [g] INNER JOIN (SELECT [term] FROM [#gpa] GROUP BY [term]) AS t ON [g].[term] <= [t].[term]
GROUP BY [t].[term], [g].[course]
) AS m
GROUP BY [m].[term]
ORDER BY [m].[term];
DROP TABLE [#gpa];
The execution plan is the same as the one from Jacob's solution, but without the CTE and (IMHO :-)) a more readable query.
July 31, 2015 at 9:59 am
paul.s.lach (7/31/2015)
OK, here is another solution that I believe is stripped down to the basics:
INSERT INTO [#gpa]
VALUES (1,'math',3),
(1,'eng',4),
(1,'hist',4.5),
(1,'anth',4.5),
(2,'comp',3),
(2,'math',4),
(2,'bio',4.5),
(2,'chem',4.5),
(3,'soc',4.5),
(3,'math',4.5),
(3,'bio',4),
(3,'chem',4);
SELECT [m].[term], SUM([m].[MaxMark]) / CAST(COUNT(DISTINCT [m].[course]) AS FLOAT) AS [GPA]
FROM (SELECT [t].[term], [g].[course], MAX([g].[mark]) AS [MaxMark]
FROM [#gpa] AS [g] INNER JOIN (SELECT [term] FROM [#gpa] GROUP BY [term]) AS t ON [g].[term] <= [t].[term]
GROUP BY [t].[term], [g].[course]
) AS m
GROUP BY [m].[term]
ORDER BY [m].[term];
DROP TABLE [#gpa];
The execution plan is the same as the one from Jacob's solution, but without the CTE and (IMHO :-)) a more readable query.
Cool beans... I had initially thought that just working with the max value would function correctly, but I couldn't quite get it to work. Beautiful solution, but let's "prettify the code and extend it to work with multiple students, shall we?
CREATE TABLE #gpa (
[student] int,
[term] int,
[course] varchar(10),
[mark] decimal(5,4)
);
INSERT INTO #gpa (student, term, course, mark)
VALUES(1, 1, 'math', 3),
(1, 1, 'eng', 4),
(1, 1, 'hist', 4.5),
(1, 1, 'anth', 4.5),
(1, 2, 'comp', 3),
(1, 2, 'math', 4),
(1, 2, 'bio', 4.5),
(1, 2, 'chem', 4.5),
(1, 3, 'soc', 4.5),
(1, 3, 'math', 4.5),
(1, 3, 'bio', 4),
(1, 3, 'chem', 4),
(2, 1, 'math', 3),
(2, 1, 'eng', 4),
(2, 1, 'hist', 4.5),
(2, 1, 'anth', 4.5),
(2, 2, 'comp', 3),
(2, 2, 'math', 4),
(2, 2, 'bio', 4.5),
(2, 2, 'chem', 4.5),
(2, 3, 'soc', 4.9),
(2, 3, 'math', 4.9),
(2, 3, 'bio', 4.9),
(2, 3, 'chem', 4.9);
SELECT *
FROM #gpa
ORDER BY student, term, course;
SET STATISTICS IO ON;
SELECT M.student, M.term,
SUM(M.MaxMark) /
CAST(COUNT(DISTINCT M.course) AS decimal(7,4)) AS GPA
FROM (
SELECT T.student, T.term, G.course,
MAX(G.mark) AS MaxMark
FROM #gpa AS G
INNER JOIN (
SELECT student, term
FROM #gpa
GROUP BY student, term
) AS T
ON G.student = T.student
AND G.term <= T.term
GROUP BY T.student, T.term, G.course
) AS M
GROUP BY M.student, M.term
ORDER BY M.student, M.term;
SET STATISTICS IO OFF;
DROP TABLE #gpa
Here's the results:
(24 row(s) affected)
student term course mark
----------- ----------- ---------- ---------------------------------------
1 1 anth 4.5000
1 1 eng 4.0000
1 1 hist 4.5000
1 1 math 3.0000
1 2 bio 4.5000
1 2 chem 4.5000
1 2 comp 3.0000
1 2 math 4.0000
1 3 bio 4.0000
1 3 chem 4.0000
1 3 math 4.5000
1 3 soc 4.5000
2 1 anth 4.5000
2 1 eng 4.0000
2 1 hist 4.5000
2 1 math 3.0000
2 2 bio 4.5000
2 2 chem 4.5000
2 2 comp 3.0000
2 2 math 4.0000
2 3 bio 4.9000
2 3 chem 4.9000
2 3 math 4.9000
2 3 soc 4.9000
(24 row(s) affected)
student term GPA
----------- ----------- ---------------------------------------
1 1 4.000000
1 2 4.142857
1 3 4.250000
2 1 4.000000
2 2 4.142857
2 3 4.450000
(6 row(s) affected)
Table '#gpa'. Scan count 2, logical reads 7, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
Viewing 9 posts - 16 through 23 (of 23 total)
You must be logged in to reply to this topic. Login to reply