aggregate sql question

Question

aggregate sql question

Neal B. Scott

SSC Enthusiast

Points: 186
More actions
February 28, 2005 at 5:40 am

#166044

Given this table:
CREATE TABLE #T_PEOPLE (
SSN char (9),
BIRTHDAY datetime
)
insert #T_PEOPLE(ssn, birthday) values('123456789', '1/1/60')
insert #T_PEOPLE(ssn, birthday) values('111223333', '1/1/50')

Which query is faster to find the ssn of the older person?
This query:
select top 1 SSN
from #T_PEOPLE
order by birthday

or this one:
select SSN
from #T_PEOPLE
where birthday=
(select min(birthday) from #T_PEOPLE)

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

AJ Ahrens SSC-Insane Points: 20676 More actions · Answer 1

According to the Query Plan (From toolbar in QA select Query.. Show Execution Plan) Query 1 is more efficient (doesn't perform a TOP).

However, the TOP cost is 0% it is just an additional step.

I tried the following query as well (Like the structure better)

SELECT T1.SSN

FROM #T_PEOPLE T1

INNER JOIN (SELECT MIN(birthday) birthday FROM #T_PEOPLE) T2

ON T1.birthday = T2.birthday

It gave same plan as #2.

I guess you just need to test each query with a full set of data and make sure that you properly INDEX the table....

Good Hunting!

AJ Ahrens

webmaster@kritter.net

RD Francis SSCarpal Tunnel Points: 4867 More actions · Answer 2

I expected little difference between these two queries, performance-wise. But, being the curious soul I am, I decided to try it out, with the following code:

DROP TABLE #T_PEOPLE
GO

CREATE TABLE #T_PEOPLE (
SSN char (9),
BIRTHDAY datetime
)

SET NOCOUNT ON
DECLARE @counter int
SET @counter = 100000000
WHILE @counter < 101000000
BEGIN
INSERT INTO #T_PEOPLE (SSN, BIRTHDAY)
  VALUES (@counter, DATEADD(dd, CEILING(36525 * RAND()), '1905-01-01'))
IF (@counter % 1000 = 0)
BEGIN
PRINT CONVERT(varchar,@counter) + ' - ' + CONVERT(varchar,getdate())
END
SET @counter = @counter + 1
END

SET NOCOUNT OFF
GO

Then I ran the two queries through as a single batch, with Show Execution Plan turned on.

The "order by" query accounted for 96% of execution; the "select min" plan only 4%.

Out of curiosity, I added an index (right before SET NOCOUNT OFF above:

CREATE INDEX [people_birthday] ON #T_PEOPLE (BIRTHDAY) WITH  FILLFACTOR = 90

This time, "order by" only took 66% of the execution, and "select min" 34%.

The "select min" approach would seem to be a clear winner.

And, yes, I have no life.

R David Francis

RD Francis SSCarpal Tunnel Points: 4867 More actions · Answer 3

Forgot to add this when I recreated my post (first attempt disappeared when I hit "Preview", probably because of some sort of time-out thing).

You need "TOP 1" in the "select min" statement, in case there is more than one record with that birthdate; In my 1,000,000 row test table, there were several.

R David Francis

Kenneth Wilhelmsson SSC-Dedicated Points: 30043 More actions · Answer 4

The percentages within a plan is only valid in the context of that particular plan. You can't compare 'order by 75%' from one plan to 'min 25%' from another plan and draw any conclusions from that. What the figures say is just that in the first plan, the 'order by' stood for 75% of that entire plan, and the 'min' stood for 25% of the other plan's total.

They can't be compared to each other as percentages because they have no correlation between plans. To give some hint about efficiency, you have to drill into the particular statements in each plan and compare the metrics for each, (ie pages, i/o, rows etc)

/Kenneth

RD Francis SSCarpal Tunnel Points: 4867 More actions · Answer 5

I ran the two statements together as a batch. Therefore, the plan included both statements. When I say that the "order by" statement had a percentage of 96%, I mean that all the elements of the statement, within the overall plan, added up to 96%. Similarly, all the elements of the "select min" statement added up to 4% of the same plan. Otherwise, it would be a barely credible coincidence that, taken together, they added up to 100%.

I do agree that there are caveats involved. In particular, comparing the percentages in the first plan (both statements, no index) to those in the second plan (both statements, birthday indexed) is not directly applicable. The first statement took about 2 seconds to execute; the second was virtually instantaneous. Some might take the change in percentage on the "select min" statement to mean that it actually runs less effectively with an index in place. Since we don't have a solid benchmark for comparison, we can't say for certain that it's faster, the same, or slower with the index. We can say that the two statements combined are noticeably faster with the index in place. We can say that it is *likely* that the "select min" statement is not dramatically faster, and thus that the higher percentage in the indexed batch represents the dramatic improvement in the "order by" statement.

R David Francis