Query plan confusion

Question

Query plan confusion

jasonmorris

SSCrazy

Points: 2492
More actions
April 19, 2011 at 7:12 am

#236895

Can somebody explain why:-
select run_id from T_BIR_RUN where run_id = (select max(run_id) from T_BIR_RUN)
(see select with max.sqlplan attachment)
Looks more efficent then:-
select max(run_id) from T_BIR_RUN
(see max.sqlplan attachment)
I would have thought the first one would be less efficent as it's performing the second query plus a bit more. Been trying to explain query plans to developers :w00t: and they landed me with this one.
Thanks.

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Jack Corbett SSC Guru Points: 184397 More actions · Answer 1

Have you run them both in the same batch to see the cost relative to the batch?

When I run a similar query they have the same "cost". I'd also look at the IO Stats, I bet that they are the same. Slightly different path, but same performance.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

jasonmorris SSCrazy Points: 2492 More actions · Answer 2

In the same batch they both show 50% each.

I/O Stats wise please see new attachment.

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 3

Those are Client Stats. What Jack's asking for is the output when STATISTICS IO is on.

SET STATISTICS IO ON

GO

<run query>

Also note that costs are estimates, not necessarily accurate values

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

GSquared SSC Guru Points: 260824 More actions · Answer 4

They actually both do the same thing. All of the work is the clustered index scan in both cases.

One "looks" more efficient, but only because it's showing the aggregate function, while the other isn't because it's not in the final query. It's an apparent difference, not a real one.

It's a good example for teaching plan-reading, because you definitely have to learn how to read past the "noise" and see the signal, and the query plans will often contain significant noise.

Try running "set statistics time, io on" on both queries, and see if there's any difference there. That's almost always the next step on query tuning after looking at the execution plan. (Actually, I often go with the stats first, then the plan. Helps me filter noise vs signal.)

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

jasonmorris SSCrazy Points: 2492 More actions · Answer 5

Ah thanks - they are identical i'm afraid 🙁

(1 row(s) affected)- MAX only query

Table 'T_BIR_RUN'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

(1 row(s) affected)- Select with MAX query

Table 'T_BIR_RUN'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

I just can't understand why the MAX only query has the Stream Aggregate step and the other one doesn't.

jasonmorris SSCrazy Points: 2492 More actions · Answer 6

Starting to make a bit more sense but even with "set statistics time, io on" the output is identical.

Unfortunately having to explain to developers that I cannot explain the difference and they should just trust me is quite tricky.

Currently being teased with 'you don't know' jibes :w00t:

Also, the plans are the actual ones and not estimated.

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 7

jasonmorris (4/19/2011)
Starting to make a bit more sense but even with "set statistics time, io on" the output is identical.

I would expect that they would be. As Gus said, "All of the work is the clustered index scan in both cases."

If SQL is reading the table once in each case, the IOs will be identical. The optimiser is smart enough to realise in the first case that it does not need to scan the entire table twice, that the query simplifies down to a backward clustered index scan that aborts after the first page. In your case with the subquery, the aggregate is just wrapped into the cluster scan.

Also, the plans are the actual ones and not estimated.

If that's in reply to my comment that the costs are estimates, it makes no difference. Costs are estimates. Full Stop.

The only difference between an estimated and an actual plan is that the actual contains run-time information. Actual row counts, actual executions, etc. It's no more accurate a plan than the estimated.

As for the developers teasing you, tell them to shut up. 🙂 No one knows everything. Grant Fritchey wrote an entire book on execution plans and I'm willing to bet there are plans he can't fully explain. Unless one of them is formerly of the query optimisation team at Microsoft, they really have no grounds for that.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

jasonmorris SSCrazy Points: 2492 More actions · Answer 8

Thanks all for the excellent replies.

Developers have been locked back in their cupboard for the rest of the year.

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 9

jasonmorris (4/20/2011)
Developers have been locked back in their cupboard for the rest of the year.

😀 :hehe:

Just make sure you slip some coffee to them every now and again, otherwise the production of code may decline. 🙂

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Schadenfreude-Mei SSChampion Points: 14083 More actions · Answer 10

GilaMonster (4/20/2011)
jasonmorris (4/20/2011)
Developers have been locked back in their cupboard for the rest of the year.
😀 :hehe:
Just make sure you slip some coffee to them every now and again, otherwise the production of code may decline. 🙂

If you saw some of the standard, you'd understand why that wouldnt make any difference

NB: I work with Jason, I'm not just randomly sticking 2p in

Adam Zacks-------------------------------------------Be Nice, Or Leave

paul.jones SSC Eights! Points: 854 More actions · Answer 11

One other possible reason for the difference in plans - the two queries are not equivalent

select max(run_id) from T_BIR_RUN

always returns a single row.

select run_id from T_BIR_RUN where run_id = (select max(run_id) from T_BIR_RUN)

will return multiple rows in the event of ties. So the one returning a single row has a stream aggregate to get a single row. I'm guessing in practice run_id is unique and it will never make a difference though.