January 17, 2008 at 11:19 am
I am trying to learn the basics of subqueries. Here i want to return all columns from MyTable for aspecific value of Specialty for distinct name IDs.
NameID is not unique in MyTable. So WHERE Specialty = 'X1' will return many rows with the same NameID. I need one instance of each NameID where Specialty = 'X1'. What should the syntax be?
SELECT *
FROM MyTable
WHERE IN (SELECT DISTINCT NameID FROM MyTable WHERE Specialty = 'X1')
January 17, 2008 at 11:57 am
The problem with this syntax is you can return a name id without Specialty = 'X1' because the outerquery is only looking for NameIDs. If a NameID has two entries one with Specialty = 'X1' , while the other does not, both are still returned.
You should have an additional filter in the where clause
Subquery Solution:
SELECT *
FROM MyTable
WHERE NameID IN (SELECT DISTINCT NameID FROM MyTable WHERE Specialty = 'X1')
AND Specialty = 'X1'
You should be able to query like this though.
SELECT DISTINCT NameID, Specialty, Col3, Col4 etc
FROM MyTable
WHERE Specialty = 'X1'
January 17, 2008 at 12:07 pm
It would help us help you if you provided the DDL for your tables, some sample data, and what your epected output would look like based on the sample data.
Take some time to read this article: http://www.sqlservercentral.com/articles/Best+Practices/61537/
😎
January 17, 2008 at 12:12 pm
Select Distinct on dbo.MyTable should give you what you want.
Using "Select Distinct" in a subquery for an "in" clause doesn't actually do anything except make more work for the server.
Think of it this way:
Select *
from dbo.MyTable
where name in ('Joe', 'Joe', 'Joe', 'Bob')
vs
Select *
from dbo.MyTable
where name in ('Joe', 'Bob')
Both will give exactly the same result, you'll get all the rows from dbo.MyTable where the name is either "Joe" or "Bob".
Adding "select distinct" to the subquery just tells SQL to get rid of the extra "Joe"s from the list. It doesn't affect the end result, and it actually slows the query down, because removing the duplicates takes more effort than the original match does.
So, if you want distinct results in your end result, put them in the outer query, not the subquery.
Just think of subqueries in your Where statement (the sample you are using), as a list of "these are all the values I want", just like a regular "In" statement. Don't use distinct, or group by, or anything else, just tell the server the list of values that are okay. It will work out how to use that list.
(I'm going into more detail on this because you said you were new to subqueries, so I'm explaining a bit more about them than is necessary for the exact example you gave. I hope that helps.)
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
January 17, 2008 at 12:14 pm
I am new to MS SQL and am trying to learn how things should be done. Given there are many ways to do the samne thing.
The solution you suggest is exactly what I needed and your explanation has helped me understand both subquery and DISTINCT.
Thank you very much!
January 17, 2008 at 12:37 pm
GSquared
I understand your explanation. Makes perfect sense. The problem I have had when working with DISTINCT is that all columns in the select list are engaged in the distinct. I have had requests, for example, for distinct NameID and Specialty but include all additional columns in the table. If the data were…
NameID…….Specialty……….ColX
67……………..X1………………A
67……………..X1………………B
These two rows would be unique and appear in the result set. So I could not use a DISTINCT. That is why I went with a subquery and * for the outer select field list. Is there a better way to get all columns in the output but “distinct” on only a few columns?
January 17, 2008 at 12:54 pm
This kind of CTE would do the trick:
With myCTE (nameID, name, address, etc,RN) AS
(
Select MT.nameid,
MT.name,
MT.address,
MT.etc,
MT.ROW_NUMBER() OVER (partition by MT.nameID)
from dbo.MyTable MT
)
select myCTE.nameID,myCTE.name,myCTE.address,myCTE.etc
from MyCTE
WHERE myCTE.RN=1
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
January 17, 2008 at 1:09 pm
kirk (1/17/2008)
GSquaredI understand your explanation. Makes perfect sense. The problem I have had when working with DISTINCT is that all columns in the select list are engaged in the distinct. I have had requests, for example, for distinct NameID and Specialty but include all additional columns in the table. If the data were…
NameID…….Specialty……….ColX
67……………..X1………………A
67……………..X1………………B
These two rows would be unique and appear in the result set. So I could not use a DISTINCT. That is why I went with a subquery and * for the outer select field list. Is there a better way to get all columns in the output but “distinct” on only a few columns?
Which one of the 2 rows would you want displayed then?
January 17, 2008 at 2:42 pm
JeffB –
The requester did not care which row was returned. All of ColX was populated. He just wanted a value from ColX.
Matt –
You have blown past any thing I know. I will be looking up what CTE is, and then figure out what your doing here. Thanks for the homework. LOL
January 17, 2008 at 2:44 pm
Sorry - I've been getting acquainted with them as well.
Look up Common Table Expressions in BOL to get a primer on how they work. Not bad for some things, but not great for others too....
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
January 17, 2008 at 2:53 pm
kirk (1/17/2008)
GSquaredI understand your explanation. Makes perfect sense. The problem I have had when working with DISTINCT is that all columns in the select list are engaged in the distinct. I have had requests, for example, for distinct NameID and Specialty but include all additional columns in the table. If the data were…
NameID…….Specialty……….ColX
67……………..X1………………A
67……………..X1………………B
These two rows would be unique and appear in the result set. So I could not use a DISTINCT. That is why I went with a subquery and * for the outer select field list. Is there a better way to get all columns in the output but “distinct” on only a few columns?
If they don't care which one of the ColX columns is displayed, you might try:
select NameID, Specialty, min(ColX) as ColX
from dbo.MyTable
where Specialty = 'X1'
group by NameID, Specialty
That will get you:
NameID Specialty ColX
67 X1 A
Will that do what you need? If so, then just use min() on each column other than the ones you want to group by (NameID and Specialty in this case).
Another solution would be:
select top 1 NameID, Specialty, ColX
from dbo.MyTable
where specialty = 'X1'
That will only return the top row, but it might be easier to build. The first one, you can modify the Where clause to allow for things like, "Where Specialty in ('X1', 'X2')", which would get you one row for each NameID and each specialty.
The second one will only return the first NameID also.
Do you have a unique key (primary key) on the table you are selecting from? If so, which column(s) is it on? Or a unique ID column? In either case, a join on a subquery (I can explain what that means) can be done which will get you the best possible results.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
January 18, 2008 at 7:28 am
There is a primary key in the table, MyTableID. Your description if the ways to do this is very good. I am learing a lot. How would the primary key be used to provide asolution?
January 18, 2008 at 8:58 am
select NameID, Specialty, ColX
from dbo.MyTable
inner join
(select min(MyTableID) as MinID
from dbo.MyTable
where Specialty = 'X1'
group by NameID) Sub1
on MyTable.MyTableID = Sub1.MinID
That will give you the first instance of each NameID where the Specialty is X1.
The sub-query gets a list of the first MyTableID for each NameID for that specialty. The outer query then gives you the extra columns you want. If you want to add more to the Where clause of the sub-query, you can do so. It's a pretty flexible approach to this kind of thing.
Want the last entry instead of the first: change to max(MyTableID).
Want the specialty to be in a range: change Where to "Where Specialty in (...)"
And so on.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
January 18, 2008 at 9:07 am
Got it 🙂 You have really added to my knowledge base and tool bag.
Thank you very much!
January 18, 2008 at 9:14 am
Kirk -
I'm adding to your homework:).
The same idea can be achieve with the CTE (to ensure you always get the same record back within a group), as so (note that the only difference from before is the extra ORDER BY):
With myCTE (nameID, name, address, etc,RN) AS
(
Select MT.nameid,
MT.name,
MT.address,
MT.etc,
MT.ROW_NUMBER() OVER (partition by MT.nameID ORDER BY MyTableID) --<--right here
from dbo.MyTable MT
)
select myCTE.nameID,myCTE.name,myCTE.address,myCTE.etc
from MyCTE
WHERE myCTE.RN=1
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply