Selecting one row for each value in another column

Question

Post reply

Selecting one row for each value in another column

webrunner

SSC-Dedicated

Points: 31863
More actions
March 28, 2007 at 3:36 pm

#116770

Hello,
Could someone help me with a query in the following situation? I'm using SQL 2000 in this case.
I have to return one row for a course, but some -- but not all -- courses have multiple instructors in this setup and so a query joining the tables will return two rows for one course if the course has two instructors.
Is there a way to return only one row for each course, whether or not it has multiple professors? I realize that the choice of which professor's row gets chosen will be arbitrary in this case, but that is OK for our purposes. I would just prefer to do this with a query rather than using an intermediate table.
I've tried to use TOP 1, but that returns only the first row of the results or, say, 1 percent of rows, not one row for each value in the course id column. DISTINCT also did not work, because I need to include the instructor name in the output (so DISTINCT returns two rows in those cases).
Here's an example (using sample code as the real code is far too convoluted):
SELECT c.course_id, c.course_title, i.instructor_id, i.instructor_name
FROM courses c INNER JOIN roster r on c.course_id = r.course_id
INNER JOIN instructors i ON r.instructor_id = i.instructor_id
Sample output:
course_id, course_title, instructor_id, instructor_name
009,Intermediate Math,25,John Smith
001,Intro to Math,25,John Smith
001,Intro to Math,31,Jane Doe
007,Advanced Math,45,Isaac Newton
Desired output:
course_id, course_title, instructor_id, instructor_name
009,Intermediate Math,25,John Smith
001,Intro to Math,31,Jane Doe
007,Advanced Math,45,Isaac Newton
OR
course_id, course_title, instructor_id, instructor_name
009,Intermediate Math,25,John Smith
001,Intro to Math,25,John Smith
007,Advanced Math,45,Isaac Newton
If there is any more information needed, just let me know.
Thanks for any help,
webrunner
-------------------
A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

David McFarland SSChampion Points: 11815 More actions · Answer 1

Untested, and I don't have your data, so not optimized in any way.

SELECT

c.Course_ID

,c.Course_Title

,i.Instructor_ID

,i.Instructor_Name

FROM

(

SELECT

Course_ID = r.course_id

,Instructor_ID = Min(r.instructor_id)

FROM

roster AS r

GROUP BY

r.course_id

) AS x

INNER JOIN courses AS c ON

x.course_id = c.course_id

INNER JOIN instructors AS i ON

x.instructor_id = i.instructor_id

Koji Matsumura SSCrazy Points: 2241 More actions · Answer 2

Hello,

If you prefre not to use an intermediate table, try this:

SELECT c.course_id, c.course_title, i.instructor_id, i.instructor_name

FROM courses c

INNER JOIN roster r on c.course_id = r.course_id

and r.instructor_id = (SELECT MAX(Z.instructor_id) FROM roster Z WHERE Z.course_id = c.course_id)

INNER JOIN instructors i ON r.instructor_id = i.instructor_id

K. Matsumura

David McFarland SSChampion Points: 11815 More actions · Answer 3

Koji, I'm confused when you say "If you prefer not to use an intermediate table, try this", when we both use the "roster" table, which appears to me to be the intermediate table, from what evidence we have.

Sergiy SSC Guru Points: 110208 More actions · Answer 4

Actually approach from Koji is exactly what they name "hidden sursor".

On big amount of data it will kill performance.

_____________
Code for TallyGenerator

webrunner SSC-Dedicated Points: 31863 More actions · Answer 5

Sorry, I was unclear in my terms. I meant "temporary" table (not "intermediate" as in linking table for many-to-many relationship).

Thanks to everyone for the suggestions -- I'll test them out.

Another idea is that I am storing the results in a holding table, which I guess in essence can serve as a temporary table of sorts. For example, I can load the holding table, duplicate course rows and all, and add a counter to the rows. Then I can delete all but the minimum counter row for all courses that have more than one record in the table.

Does that sound reasonable?

Thanks again!

---

webrunner

-------------------
A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

David McFarland SSChampion Points: 11815 More actions · Answer 6

webrunner, my method should work fine. Derived tables (which is what you are seeing in my query) are not only not typically problematic, but in many cases are the ideal solution. Run it, and if you have problems let us know.

Steve Jones - SSC Editor SSC Guru Points: 736222 More actions · Answer 7

Both Kofi's and David's suggestions should perform well if you have the tables indexes for those columns. The query optimizer can take some shortcuts and get you quick results.

Chris Campbell-415954 SSCrazy Points: 2248 More actions · Answer 8

When I saw this problem, I immediatly thought of two solutions. One was the method using the subquery to return the Min (or Max) value from the Instructor table and joining it back to the Course table. The second was to simply use the Min (or Max) function in the select list and employ a Group By clause on the other fields in the Select list. I was wondering what the differences in the execution plan might be so I tried and example using the AdventureWorks database (trivial, I know...I didn't include subsequent joins to the Contact information).

Here's my first Query:

SELECT

Stor.CustomerID, Stor.Name, Stor.SalesPersonID, Cont.ContactID

FROM Sales.Store Stor INNER JOIN Sales.StoreContact Cont

ON Stor.CustomerID = Cont.CustomerID

AND Cont.ContactID = (SELECT MIN(Cont2.ContactID)

FROM Sales.StoreContact Cont2

WHERE Cont2.CustomerID = Stor.CustomerID)

And Here's my second:

SELECT Stor.CustomerID, Stor.Name, Stor.SalesPersonID, MIN(Cont.ContactID)

FROM Sales.Store Stor INNER JOIN Sales.StoreContact Cont

ON Stor.CustomerID = Cont.CustomerID

GROUP BY Stor.CustomerID, Stor.Name, Stor.SalesPersonID

Both queryies generate pretty comparable execution plans with one exception, The group by clause causes an aggregation to be performed but the reletive cost is minimal. My question to all of you who are more versed in how queries are executed than me is: What are the drawbacks of the Group By approach in this case? To my eye, it is the simpler approach but I can see why the aggregation might impact performance.

Feel free to pile on now.

mengus Ten Centuries Points: 1187 More actions · Answer 9

For the Enthusiasts:

-- Just Updated this post 2:45PM 3/39/2007

My Hypothesis: The first thing that came to my mind, from past experience, is that a derived table in the case of Dave's query, and the group by clause from Chris's query, would essentially be more efficient than that of Koji's when large number of rows exists in these tables.

Here are my overall analysis and conducted controlled testing.

Considering that the appropriate indexes are on the dbo.Course, dbo.Roster, and dbo.Instructor

For my purposes, I did not use these exact tables, but I have followed a similar schema, with the appropriate indexes on Course_ID on both dbo.Course and dbo.Roster tables, also appropriate indexes on Instructor_ID on both dbo.Roster and dbo.Instructor tables, and finally index on Roster_ID on dbo.Roster. My plan, is to examine the execution plan of all 3 queries.

For my analysis, here are my queries:

-- This is similar to Koji's query

select v.Course_ID, a.Instructor_ID, c.Roster_ID

From dbo.Course v

INNER JOIN dbo.Instructor a

on a.Instructor_ID = v.Instructor_ID

INNER Join dbo.Roster c

on a.Instructor_ID = c.Roster_ID

where

c.Roster_ID = (SELECT Min(c2.Roster_ID)

From dbo.Roster c2

Where

c.Roster_ID = c2.Roster_ID

)

-- Similar to Chris's second query

select v.Course_ID,a.Instructor_ID, min(c.Roster_ID)

From dbo.Course v

INNER JOIN dbo.Instructor a

on a.Instructor_ID = v.Instructor_ID

INNER Join dbo.Roster c

on a.Instructor_ID = c.Roster_ID

group by v.Course_ID,a.Instructor_ID

-- Similar to Dave's

select v.Course_ID,ab.Instructor_ID, ab.Roster_ID

From dbo.Course v

INNER JOIN

(

Select a.Instructor_ID, min(c.Roster_ID) AS Roster_ID

from dbo.Instructor a

INNER Join dbo.Roster c

on a.Instructor_ID = c.Roster_ID

group by a.Instructor_ID

) AS ab

ON v.Instructor_ID = ab.Instructor_ID

Essentially I tested these 3 queries out, and the differences in performance were minimal. However, looking at the execution plans, Koji's method costs 51% of the batch (running all 3 queries together), while Dave and Chris's method consume ~25% each. Dave and Chris's methods generated the same execution plan, 2 merge joins, 3 index scans. Koji's method resulted in 3 merge joins and 4 index scans.

Conclusion, when dealing with large number of rows, please follow the derive table method (or Chris’s method, if you can group correctly), this will boost performance compare to using Where ID = (SELECT Min(ID)). Using the where clause will drag down performance when dealing with large number of rows. However, if the rows are minimal, the quickest solution would be to use Koji's method, but keep in mind, to always think/plan ahead for any future potential performance issues.

If there shall be any questions and/or comments, please feel free to contact me via Instant Messaging, my email is mengus_vang@hotmail.com .

Regards

Meng (SQL Server DBA/Developer/Enthusiasts)

MCTS

webrunner SSC-Dedicated Points: 31863 More actions · Answer 10

Thanks for all of the in-depth comments!

Due to time, I decided not to work on the query side as suggested above. Instead, I decided to put the values in the holding table and then delete the duplicates afterward.

Thanks again, though. As soon as I get a chance I will return to the query methods described above, taking into account Meng's performance analysis.

---

webrunner

-------------------
A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html