Slow-performing select query, yet apparently simple.

  • agustingarzon (3/1/2010)


    I double posted by mistake, sorry.

    Here are all the plans in the attached zip.

    I am having the same problem as Paul on the sqlplan attachments. The zip worked though. Thanks,...

    -- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • Please...try this:

    WITH Numbered AS
    (
        SELECT CF.ID,
               rn = ROW_NUMBER() OVER (ORDER BY CF.ID ASC)
        FROM dbo.ContactForms CF
    ),
    Keys AS
    (
        SELECT TOP (20) N.ID
        FROM Numbered N
        WHERE N.rn >= 30000
        ORDER BY N.rn
    )
    SELECT data.*
    FROM Keys K
    INNER LOOP JOIN dbo.ContactForms data
        ON data.ID = K.ID
    OPTION (FORCE ORDER);

    Paul

    edit: code clarity/layout

  • The text file in the ZIP is empty 🙁

    The problem in your execution plan was the merge join. A poor choice by the optimizer.

    Paul

  • Try it like this:

    WITH cteNumber AS
    (
        SELECT ID, ROW_NUMBER() OVER (ORDER BY ID ASC) AS IXX
        FROM dbo.ContactForms
    ),
    ctetop20 AS
    (
        SELECT TOP 20 num.*
        FROM cteNumber num
        WHERE num.IXX > 15000
    )
    SELECT data.*
    FROM dbo.ContactForms data
    INNER JOIN ctetop20 num
        ON data.ID = num.ID;

    -- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • Paul White (3/1/2010)


    The text file in the ZIP is empty 🙁

    The problem in your execution plan was the merge join. A poor choice by the optimizer.

    Paul

    Hmm, I thought the problem was that it was processing the larger index first, when it should do the smaller one first. But I guess that doesn't mean as much in a merge join, so ... hmm ...

    -- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • Still a risk of the optimizer choosing a merge join with that one, Barry.

  • RBarryYoung (3/1/2010)


    Hmm, I thought the problem was that it was processing the larger index first, when it should do the smaller one first. But I guess that doesn't mean as much in a merge join, so ... hmm ...

    It means nothing with a one-to-one merge join, given pre-sorted inputs, as we have here.

    The optimizer is seriously under-estimating the cost of scanning that clustered index.

    Loop join is optimal with small inputs (not necessarily small tables).

    The twenty rows coming from my non-clustered index need to be the outer (driving) input.

    Twenty seeks into the clustered index shouldn't take too long, even with huge LOBs off-row.

    Paul

  • Paul White (3/1/2010)


    Still a risk of the optimizer choosing a merge join with that one, Barry.

    True, but I prefer to try to get the optimizer to "do the right thing" without using hints to force it, if I can. That way, it still has a chance of adapting to future changes in statistical counts and distributions.

    -- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • RBarryYoung (3/1/2010)


    Paul White (3/1/2010)


    Still a risk of the optimizer choosing a merge join with that one, Barry.

    True, but I prefer to try to get the optimizer to "do the right thing" without using hints to force it, if I can. That way, it still has a chance of adapting to future changes in statistical counts and distributions.

    Indeed. In general I agree. My concern is that this thread needs a 'quick win' that is guaranteed. We can tweak later. We already know the optimizer has good statistics on the columns concerned (indexes rebuilt), so this is just a case of it getting things wrong. Maybe it won't choose a merge join for your query. This time. 😀

    Paul

  • Paul White (3/1/2010)


    ...

    The optimizer is seriously under-estimating the cost of scanning that clustered index.

    ...

    Yes, that's the big question here: why is it doing that? Its RowCount estimate is 66 for a clustered index scan with no predicate on a table with 15,020 rows. Something is not right there. It sounds like bad statistics, but I don't see the little warning icon you usually get for that.

    -- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • Paul White (3/1/2010)


    RBarryYoung (3/1/2010)


    Paul White (3/1/2010)


    Still a risk of the optimizer choosing a merge join with that one, Barry.

    True, but I prefer to try to get the optimizer to "do the right thing" without using hints to force it, if I can. That way, it still has a chance of adapting to future changes in statistical counts and distributions.

    Indeed. In general I agree. My concern is that this thread needs a 'quick win' that is guaranteed. We can tweak later. ...

    Paul

    Oops, I must have missed that. Sorry.

    -- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • RBarryYoung (3/1/2010)


    Oops, I must have missed that. Sorry.

    That's just my view - I may be wrong about it. Never any need to apologise.

    It's 7:30am and I need to go very shortly.

    I look forward to catching up on this later.

    I'll leave it in your capable hands.

    Last point: I have just looked at the 'what-about' execution plan. Here (with 500 set as the limit, and no clustered index) the optimizer chooses a much better plan. I would expect it to be very much faster than the others. The clustered index was dropped for this one, so no merge join was possible (there's no covering non-clustered index for the final SELECT). Re-instating the clustered index and using my last posted query should be just as fast, if not slightly faster (fetching row data via the clustered index should be a bit quicker than the RID lookups). I doubt there would be very much in it for twenty rows, though.

    Paul

  • Dear Barry, the exec plan for your suggestion is the same as NewExec I just posted.

    The approach is OK, but it's doing a clustered index scan through the first 15,000 records in your example.

    But if I add an ORDER BY num.IXX ASC as Paul suggested, the exec plan changes the large scan for a seek operation to get the exact 20 required rows.

    I haven't run the test to see if it gets any faster, will do in a couple hours and post back.

    Paul, if nvarchar(max) columns are stored off-row as you said and as explained in the documentation, why would the clustered index behave in such a way? The three nvarchar(255) fields add up to 0.4 KB in the largest row. I can't find a way to read the table's "large value types out of row" setting; I know of sp_tableoption, but it's only for setting the value. Perhaps it's storing in-row, but that's not the default.
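
    For what it's worth, the current setting can be read from the catalog views even though sp_tableoption only writes it; a sketch (assuming the table name from this thread):

    ```sql
    -- Read the 'large value types out of row' setting for a table.
    -- 0 = varchar(max)/nvarchar(max) values are stored in-row when they fit;
    -- 1 = large values are always stored off-row, with a pointer in the row.
    SELECT t.name, t.large_value_types_out_of_row
    FROM sys.tables AS t
    WHERE t.name = N'ContactForms';
    ```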

  • agustingarzon (3/1/2010)


    Dear Barry, the exec plan for your suggestion is the same as NewExec I just posted.

    The approach is OK, but it's doing a clustered index scan through the first 15,000 records in your example.

    But if I add an ORDER BY num.IXX ASC as Paul suggested, the exec plan changes the large scan for a seek operation to get the exact 20 required rows.

    I haven't run the test to see if it gets any faster, will do in a couple hours and post back.

    Paul, if nvarchar(max) columns are stored off-row as you said and as explained in the documentation, why would the clustered index behave in such a way? The three nvarchar(255) fields add up to 0.4 KB in the largest row. I can't find a way to read the table's "large value types out of row" setting; I know of sp_tableoption, but it's only for setting the value. Perhaps it's storing in-row, but that's not the default.

    Just to clarify, are you saying that the ORDER BY fixes it?

    Hmm, maybe that's all it is then? The ORDER BY is supposed to be required for TOP, but they don't enforce that syntactically. So maybe the optimizer just gets dumb sometimes, without it?

    -- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."
