November 4, 2019 at 3:48 pm
How do I pull the employee with the lowest person_id along with their first name?
I have a "person table" (has 1000+ rows of data). Each person has a "person_id" associated with their name. I created a very simple query using the min function. When I run it, it pulls in the entire table (all the employees).
select
min(person_id) as lowest_id,
first_name
from
person
group by first_name
November 4, 2019 at 3:58 pm
You can use SELECT TOP 1 with and ORDER BY person_id clause.
John
November 4, 2019 at 4:08 pm
Thank you, but I'd like learn why min isnt working. Do you know why? I find i strange. I know there are other ways of doing it. But this is purely for a learning purpose for me as to why MIN is not working.
November 4, 2019 at 4:17 pm
The query you posted returns the lowest person_id for every first_name - so you'd get the John with the lowest person_id, the Wally with the lowest person_id, and so on. If you want the first name of the lowest person_id in the whole table, there are several ways of doing it - TOP, FIRST_VALUE, CTE with ROW_NUMBER - or you could get the MIN person_id and join back to the table to get the first_name that matched that person_id.
John
November 4, 2019 at 4:18 pm
If you had the following table:
person_id|first_name
---------|----------
1 |John
2 |Wally
3 |James
4 |John
The query in your post would return:
person_id|first_name
---------|----------
1 |John
2 |Wally
3 |James
Which removed the other "John" with a person_id = 4.
I think to accomplish what you are trying to achieve, you could use the following query:
SELECT TOP (1) person_id,
first_name
FROM person
ORDER BY person_id;
That query only grabs the TOP (1) person_id and first_name from the table persons and orders by the person_id ascending.
November 4, 2019 at 4:43 pm
I do not know if my method is faster or slower, but I would do it with a subquery:
select m.lowest_id
, p.first_name
from dbo.person p join
(select lowest_id=min(person_id)
from dbo.person) m on p.person_id=m.lowest_id
November 4, 2019 at 6:13 pm
I do not know if my method is faster or slower, but I would do it with a subquery:
Have you looked at the execution plan for this? Depending on the indexes available on the table, the subquery method will either end up at best with the same execution plan as the "SELECT TOP (1)" method, or at worst, it will hit the same table twice, once to get the MIN(person_id) and a second time to get the first_name.
November 5, 2019 at 6:48 am
srienstr wrote:I do not know if my method is faster or slower, but I would do it with a subquery:
Have you looked at the execution plan for this? Depending on the indexes available on the table, the subquery method will either end up at best with the same execution plan as the "SELECT TOP (1)" method, or at worst, it will hit the same table twice, once to get the MIN(person_id) and a second time to get the first_name.
How intelligent is SQL Server if it is a really big table?
If it is millions of rows, does it still sort all entries to get the first entry?
Sorry if it is a dumb question. Because i would have gone with the approach above and didnt even think of using the top 1 logic to get min/max values before because i was always assuming it would require a full sort of a table , which could turn out being quiet bad depending on the table size. Never would have thought of doing the top 1 approach because in the basics of execution plan readings its always mentioned that stuff like sorting is a bottleneck/slowdown.
I want to be the very best
Like no one ever was
November 5, 2019 at 11:42 pm
The performance mainly depends on the indexes available for the table and how up to date statistics are.
If there is not an index that has person_id as the first column, then a sort would be required for either the "SELECT TOP 1" method or the "MAX(person_id) subquery" method. This is the worst case for the subquery, because it would then hit the table a second time to get the name.
If there is an index that has person_id as the first column, then no sort needed. The engine can seek directly to the lowest value in the already sorted index. The SQL Server engine is smart in this case and can determine that the same plan works for both queries, only hitting the table once.
I'm familiar with the subquery issue, because I've worked on a system that used that methodology frequently, and I noticed how often it was hitting the same table multiple times. The way I look at it, if the engine has to go there once, it might as well grab all the data it needs right then.
A good demonstration on how SQL Server processes queries is in the Brent Ozar blog and videos:
https://www.brentozar.com/archive/2019/10/how-to-think-like-the-sql-server-engine/
November 6, 2019 at 6:31 am
The performance mainly depends on the indexes available for the table and how up to date statistics are.
If there is not an index that has person_id as the first column, then a sort would be required for either the "SELECT TOP 1" method or the "MAX(person_id) subquery" method. This is the worst case for the subquery, because it would then hit the table a second time to get the name.
If there is an index that has person_id as the first column, then no sort needed. The engine can seek directly to the lowest value in the already sorted index. The SQL Server engine is smart in this case and can determine that the same plan works for both queries, only hitting the table once.
I'm familiar with the subquery issue, because I've worked on a system that used that methodology frequently, and I noticed how often it was hitting the same table multiple times. The way I look at it, if the engine has to go there once, it might as well grab all the data it needs right then.
A good demonstration on how SQL Server processes queries is in the Brent Ozar blog and videos:
https://www.brentozar.com/archive/2019/10/how-to-think-like-the-sql-server-engine/
Uff, thats a bit for a bummer for me right now knowing that i made this mistake since forever.
I am intrested in more of this kind of "how to write queries more efficient" in general.
I am going to take a look into Bents Post and i am currently reading "SQL Server 2017 Query Performance Tuning" 5th edition by Grant Fritchey.
Can you recommend maybe more sources about this topic?
I have been working with SQL for only a year now and that 99% on my very own without any supervision. I am pretty sure i already got alot of bad SQL habits i dont want to get rid of, but for that i need to know it is a bad habbit of creating a query.
I want to be the very best
Like no one ever was
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply