Query plan confusion

  • Can somebody explain why:-

    select run_id from T_BIR_RUN where run_id = (select max(run_id) from T_BIR_RUN)

    (see select with max.sqlplan attachment)

    Looks more efficent then:-

    select max(run_id) from T_BIR_RUN

    (see max.sqlplan attachment)

    I would have thought the first one would be less efficent as it's performing the second query plus a bit more. Been trying to explain query plans to developers :w00t: and they landed me with this one.

    Thanks.

  • Have you run them both in the same batch to see the cost relative to the batch?

    When I run a similar query they have the same "cost". I'd also look at the IO Stats, I bet that they are the same. Slightly different path, but same performance.

  • In the same batch they both show 50% each.

    I/O Stats wise please see new attachment.

  • Those are Client Stats. What Jack's asking for is the output when STATISTICS IO is on.

    SET STATISTICS IO ON

    GO

    <run query>

    Also note that costs are estimates, not necessarily accurate values

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • They actually both do the same thing. All of the work is the clustered index scan in both cases.

    One "looks" more efficient, but only because it's showing the aggregate function, while the other isn't because it's not in the final query. It's an apparent difference, not a real one.

    It's a good example for teaching plan-reading, because you definitely have to learn how to read past the "noise" and see the signal, and the query plans will often contain significant noise.

    Try running "set statistics time, io on" on both queries, and see if there's any difference there. That's almost always the next step on query tuning after looking at the execution plan. (Actually, I often go with the stats first, then the plan. Helps me filter noise vs signal.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Ah thanks - they are identical i'm afraid 🙁

    (1 row(s) affected)- MAX only query

    Table 'T_BIR_RUN'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    (1 row(s) affected)- Select with MAX query

    Table 'T_BIR_RUN'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    I just can't understand why the MAX only query has the Stream Aggregate step and the other one doesn't.

  • Starting to make a bit more sense but even with "set statistics time, io on" the output is identical.

    Unfortunately having to explain to developers that I cannot explain the difference and they should just trust me is quite tricky.

    Currently being teased with 'you don't know' jibes :w00t:

    Also, the plans are the actual ones and not estimated.

  • jasonmorris (4/19/2011)


    Starting to make a bit more sense but even with "set statistics time, io on" the output is identical.

    I would expect that they would be. As Gus said, "All of the work is the clustered index scan in both cases."

    If SQL is reading the table once in each case, the IOs will be identical. The optimiser is smart enough to realise in the first case that it does not need to scan the entire table twice, that the query simplifies down to a backward clustered index scan that aborts after the first page. In your case with the subquery, the aggregate is just wrapped into the cluster scan.

    Also, the plans are the actual ones and not estimated.

    If that's in reply to my comment that the costs are estimates, it makes no difference. Costs are estimates. Full Stop.

    The only difference between an estimated and an actual plan is that the actual contains run-time information. Actual row counts, actual executions, etc. It's no more accurate a plan than the estimated.

    As for the developers teasing you, tell them to shut up. 🙂 No one knows everything. Grant Fritchey wrote an entire book on execution plans and I'm willing to bet there are plans he can't fully explain. Unless one of them is formerly of the query optimisation team at Microsoft, they really have no grounds for that.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks all for the excellent replies.

    Developers have been locked back in their cupboard for the rest of the year.

  • jasonmorris (4/20/2011)


    Developers have been locked back in their cupboard for the rest of the year.

    😀 :hehe:

    Just make sure you slip some coffee to them every now and again, otherwise the production of code may decline. 🙂

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (4/20/2011)


    jasonmorris (4/20/2011)


    Developers have been locked back in their cupboard for the rest of the year.

    😀 :hehe:

    Just make sure you slip some coffee to them every now and again, otherwise the production of code may decline. 🙂

    If you saw some of the standard, you'd understand why that wouldnt make any difference

    NB: I work with Jason, I'm not just randomly sticking 2p in

    Adam Zacks-------------------------------------------Be Nice, Or Leave

  • One other possible reason for the difference in plans - the two queries are not equivalent

    select max(run_id) from T_BIR_RUN

    always returns a single row.

    select run_id from T_BIR_RUN where run_id = (select max(run_id) from T_BIR_RUN)

    will return multiple rows in the event of ties. So the one returning a single row has a stream aggregate to get a single row. I'm guessing in practice run_id is unique and it will never make a difference though.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply