December 11, 2002 at 7:05 pm
As a relative T-SQL newbie I'm always comparing what I know in MSAccess to what I need to do in T-SQL. I'm currently trying to write a query to do the following. Given this table:
Name TestNo Grade
joe 1 A
joe 2 B
joe 3 C
bob 2 B
bob 3 C
sue 1 B
sue 2 B
sue 3 C
Return the best row for each student:
Name TestNo Grade
joe 1 A
bob 2 B
sue 1 B
In access I'd do:
SELECT g.Name, First(g.TestNo) AS TestNo, Min(g.Grade) AS Grade
FROM Grades g
GROUP BY g.Name;
However there's no FIRST aggregate function in T-SQL that I'm aware of. How shall I construct a query to retrieve the same results?
Greg.
December 11, 2002 at 7:46 pm
As soon as I hit 'post' I realized that this example is too simple. I can simply substitute 'MIN' where 'FIRST' is used.
I also figured out that this is not a terribly efficient query against a large data set. Any suggestions on other ways to get the same results that might speed things up? (Hash Match/Aggregate is the biggest bottleneck at 46% according to the query plan)
December 11, 2002 at 8:29 pm
See my FAQ on "What (good) are self joins anyway?" to answer this q.
select *
from grades g
and g.Grade = (select min(grade)
from grades inside
where g.name = inside.name
)
Index the Grade and Name columns.
December 11, 2002 at 10:35 pm
Thanks for the tip don1941. I'll try it out tomorrow at work.
December 12, 2002 at 3:26 am
The problem with using MIN is that you could end up with the wrong test for the grade.
Name TestNo Grade
joe 3 A
joe 2 B
joe 1 C
With this data you would end up with joe 1 A.
Try
SELECT Grades.name, AggGrades.min_grade, MIN(testNo)
FROM Grades
JOIN (SELECT name, min(grade) min_grade
FROM grades
GROUP BY name) AggGrades ON AggGrades.Name = Grades.Name
AND AggGrades.min_grade = Grades.grade
GROUP BY Grades.name, AggGrades.min_grade
The derived table returns the min grade for a person and then the main query finds the minimum testNo for each person and their min grade.
Simon Sabin
Co-author of SQL Server 2000 XML Distilled
http://www.amazon.co.uk/exec/obidos/ASIN/1904347088
Simon Sabin
SQL Server MVP
http://sqlblogcasts.com/blogs/simons
December 12, 2002 at 11:05 am
Thanks to both of you! My actual scenario is more complex, with joined tables even on the inside loop. What I gather from both of your examples is that I'll have to make a nested self-join for each important field until I get one row per student.
December 13, 2002 at 2:31 am
Just to add to this I use to use subqueries and joins to aggregate values for one table to update another. But found performance dropped like a lead weight when volume increased. I change the process to summarise into temp table first and then did update with a join. Found huge improvement. Just an observation. I know there is probably a debate raging regarding subqueries vs temp tables.
Far away is close at hand in the images of elsewhere.
Anon.
December 13, 2002 at 5:00 am
Try this
SELECT TG.* FROM
(SELECT [Name], Min(TestNo) As TestNo FROM TestGrade Group By [Name]) As FT
INNER JOIN
TestGrade TG
ON
FT.[Name] = TG.[Name] AND
FT.TestNo = TG.TestNo
Order By TG.Grade, TG.[Name]
December 13, 2002 at 9:25 am
Ok, to make things more complicated, the following code creates the 'grades' table. I've modified the original data and added a comment column. What I'd like to get back is:
name, first test that got highest grade, highest grade, and comment for each student.
Q1: The query I used works, but is this an efficient way?
Q2: As I need this result set as the basis of MANY other queries, does it make sense to dump the results into a table? The data is imported into SQL server once weekly, I could simply add the code to re-generate the data at the same time.
A bit more info: I'll probably try joining to temp tables too as David Burrows suggests. The real source data comes from two tables each with around 100k records, query time is about 15 sec on my development server.
USE PUBS
GO
SET NOCOUNT ON
IF EXISTS (SELECT * from sysobjects WHERE id = object_id(N'Grades')
AND OBJECTPROPERTY(id, N'IsUserTable') = 1)
Drop Table Grades
/*This builds the table:*/
Create table Grades (
[Name] varchar(10),
TestNo int,
Grade varchar(1),
Comment VarChar(50)
)
GO
/*Insert data:*/
INSERT into Grades ([Name], TestNo, Grade, Comment)
Values('joe', '1', 'C', 'try harder')
INSERT into Grades ([Name], TestNo, Grade, Comment)
Values('joe', '2', 'B', 'Ok')
INSERT into Grades ([Name], TestNo, Grade, Comment)
Values('joe', '3', 'A', 'Good')
INSERT into Grades ([Name], TestNo, Grade, Comment)
Values('bob', '2', 'B', 'Not bad')
INSERT into Grades ([Name], TestNo, Grade, Comment)
Values('bob', '3', 'C', 'need work')
INSERT into Grades ([Name], TestNo, Grade, Comment)
Values('sue', '1', 'B', 'Ok')
INSERT into Grades ([Name], TestNo, Grade, Comment)
Values('sue', '2', 'B', 'Almost an A')
INSERT into Grades ([Name], TestNo, Grade, Comment)
Values('sue', '3', 'C', 'What happened')
/*The Query:*/
/*Get First record with Best grade for each student.*/
SELECT * From Grades
INNER JOIN
--get first test with best grade
(
SELECT bg.[name], bg.grade, min(g2.testno) AS TestNo
FROM Grades g2
INNER JOIN
--get Best Grade for each student
(
SELECT g.[name], MIN(g.grade) AS Grade
FROM Grades g
GROUP BY g.[name]
) bg
ON g2.[name] = bg.[name] AND g2.grade = bg.grade
GROUP BY bg.[name], bg.grade
)ft
ON Grades.[name] = ft.[name] AND Grades.[TestNo] = ft.TestNo AND Grades.grade = ft.grade
/*cleanup*/
IF EXISTS (SELECT * from sysobjects WHERE id = object_id(N'Grades')
AND OBJECTPROPERTY(id, N'IsUserTable') = 1)
Drop Table Grades
Thanks Again.
Greg.
Edited by - GregLyon on 12/13/2002 10:26:51 AM
December 16, 2002 at 4:38 am
Self joins and subselects can be avoided by using the following query:
SELECT Name, RIGHT(MIN(Grade + CAST(TestNo AS VARCHAR)),1), MIN(Grade), SUBSTRING(MIN(Grade + CAST(TestNo AS VARCHAR) + Comment),3,50)
FROM Grades
GROUP BY Name
The trick is to prefix the extra columns with the columns for which the aggregate is computed, apply the same aggregate, and then strip off the data on the right. This technique can be used with any GROUP BY query for which additional columns have to be displayed!
Jorg Jansen
Jorg Jansen
Manager Database Development
Infostradasports.com
Nieuwegein
The Netherlands
Jorg Jansen
Manager Database Development
Infostradasports.com
Nieuwegein
The Netherlands
December 16, 2002 at 8:27 am
Thanks, Jorg, for that tip. The sql looks more maintainable than the sub-join method to me. How do you find it does performance wise compared to the self-join method? I'll be testing it on my real database to compare, of course.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply