Slow-performing select query, yet apparently simple.

  • agustingarzon (3/1/2010)


    I double posted by mistake, sorry.

    Here are all the plans in the attached zip.

    I am having the same problem as Paul on the sqlplan attachments. The zip worked though. Thanks,...

    -- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • Please...try this:

    WITH Numbered AS
    (
        SELECT CF.ID,
               rn = ROW_NUMBER() OVER (ORDER BY CF.ID ASC)
        FROM dbo.ContactForms CF
    ),
    Keys AS
    (
        SELECT TOP (20) N.ID
        FROM Numbered N
        WHERE N.rn >= 30000
        ORDER BY N.rn
    )
    SELECT data.*
    FROM Keys K
    INNER LOOP JOIN dbo.ContactForms data
        ON data.ID = K.ID
    OPTION (FORCE ORDER);

    Paul

    edit: code clarity/layout

  • The text file in the ZIP is empty 🙁

    The problem in your execution plan was the merge join. A poor choice by the optimizer.

    Paul

  • Try it like this:

    WITH cteNumber AS
    (
        SELECT ID, ROW_NUMBER() OVER (ORDER BY ID ASC) AS IXX
        FROM dbo.ContactForms
    ),
    ctetop20 AS
    (
        SELECT TOP 20 num.*
        FROM cteNumber num
        WHERE num.IXX > 15000
    )
    SELECT data.*
    FROM dbo.ContactForms data
    INNER JOIN ctetop20 num
        ON data.ID = num.ID;

    -- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • Paul White (3/1/2010)


    The text file in the ZIP is empty 🙁

    The problem in your execution plan was the merge join. A poor choice by the optimizer.

    Paul

    Hmm, I thought the problem was that it was processing the larger index first, when it should do the smaller one first. But I guess that doesn't mean as much in a merge join, so ... hmm ...

    -- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • Still a risk of the optimizer choosing a merge join with that one, Barry.

  • RBarryYoung (3/1/2010)


    Hmm, I thought the problem was that it was processing the larger index first, when it should do the smaller one first. But I guess that doesn't mean as much in a merge join, so ... hmm ...

    It means nothing with a one-to-one merge join, given pre-sorted inputs, as we have here.

    The optimizer is seriously under-estimating the cost of scanning that clustered index.

    Loop join is optimal with small inputs (not necessarily small tables).

    The twenty rows coming from my non-clustered index need to be the outer (driving) input.

    Twenty seeks into the clustered index shouldn't take too long, even with huge LOBs off-row.

    Paul

  • Paul White (3/1/2010)


    Still a risk of the optimizer choosing a merge join with that one, Barry.

    True, but I prefer to try to get the optimizer to "do the right thing" without using hints to force it, if I can. That way, it still has a chance of adapting to future changes in statistical counts and distributions.

    -- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • RBarryYoung (3/1/2010)


    Paul White (3/1/2010)


    Still a risk of the optimizer choosing a merge join with that one, Barry.

    True, but I prefer to try to get the optimizer to "do the right thing" without using hints to force it, if I can. That way, it still has a chance of adapting to future changes in statistical counts and distributions.

    Indeed. In general I agree. My concern is that this thread needs a 'quick win' that is guaranteed. We can tweak later. We already know the optimizer has good statistics on the columns concerned (indexes rebuilt), so this is just a case of it getting things wrong. Maybe it won't choose a merge join for your query. This time. 😀

    Paul

  • Paul White (3/1/2010)


    ...

    The optimizer is seriously under-estimating the cost of scanning that clustered index.

    ...

    Yes, that's the big question here: why is it doing that? Its RowCount estimate is 66 for a clustered index scan with no predicate on a table with 15,020 rows. Something is not right there. It sounds like bad statistics, but I don't see the little warning icon you usually get for that.

    -- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • Paul White (3/1/2010)


    RBarryYoung (3/1/2010)


    Paul White (3/1/2010)


    Still a risk of the optimizer choosing a merge join with that one, Barry.

    True, but I prefer to try to get the optimizer to "do the right thing" without using hints to force it, if I can. That way, it still has a chance of adapting to future changes in statistical counts and distributions.

    Indeed. In general I agree. My concern is that this thread needs a 'quick win' that is guaranteed. We can tweak later. ...

    Paul

    Oops, I must have missed that. Sorry.

    -- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • RBarryYoung (3/1/2010)


    Oops, I must have missed that. Sorry.

    That's just my view - I may be wrong about it. Never any need to apologise.

    It's 7:30am and I need to go very shortly.

    I look forward to catching up on this later.

    I'll leave it in your capable hands.

    Last point: I have just looked at the 'what-about' execution plan. Here (with 500 set as the limit, and no clustered index) the optimizer chooses a much better plan. I would expect it to be very much faster than the others. The clustered index was dropped for this one, so no merge join was possible (there's no covering non-clustered index for the final SELECT). Re-instating the clustered index and using my last posted query should be just as fast, if not slightly faster (fetching row data via the clustered index should be a bit quicker than the RID lookups). I doubt there would be very much in it for twenty rows, though.

    Paul

  • Dear Barry, the exec plan for your suggestion is the same as NewExec I just posted.

    The approach is OK, but it's doing a clustered index scan through the first 15,000 records in your example.

    But if I add an ORDER BY num.IXX ASC as Paul suggested, the exec plan changes the large scan for a seek operation to get the exact 20 required rows.

    I haven't run the test to see if it gets any faster, will do in a couple hours and post back.

    Paul, if nvarchar(max) columns are stored off-row as you said and as explained in the documentation, why would the clustered index behave in such a way? The three nvarchar(255) fields add up to 0.4 KB in the largest row. I can't find a way to read the table's "large value types out of row" setting; I know of sp_tableoption, but it's only for setting the value. Perhaps it's storing in-row, but that's not the default.
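
    For what it's worth, the current setting can be read from the catalog views even though sp_tableoption only writes it; a sketch (assuming the table name from this thread):

    ```sql
    -- Read the 'large value types out of row' setting for a table.
    -- 0 = varchar(max)/nvarchar(max) values are stored in-row when they fit;
    -- 1 = large values are always stored off-row, with a pointer in the row.
    SELECT t.name, t.large_value_types_out_of_row
    FROM sys.tables AS t
    WHERE t.name = N'ContactForms';
    ```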

  • agustingarzon (3/1/2010)


    Dear Barry, the exec plan for your suggestion is the same as NewExec I just posted.

    The approach is OK, but it's doing a clustered index scan through the first 15,000 records in your example.

    But if I add an ORDER BY num.IXX ASC as Paul suggested, the exec plan changes the large scan for a seek operation to get the exact 20 required rows.

    I haven't run the test to see if it gets any faster, will do in a couple hours and post back.

    Paul, if nvarchar(max) columns are stored off-row as you said and as explained in the documentation, why would the clustered index behave in such a way? The three nvarchar(255) fields add up to 0.4 KB in the largest row. I can't find a way to read the table's "large value types out of row" setting; I know of sp_tableoption, but it's only for setting the value. Perhaps it's storing in-row, but that's not the default.

    Just to clarify, are you saying that the ORDER BY fixes it?

    Hmm, maybe that's all it is then? The ORDER BY is supposed to be required for TOP, but they don't enforce that syntactically. So maybe the optimizer just gets dumb sometimes, without it?

    -- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."
