Strange performance boost by query rewrite

  • Hi folks.

    Being a complete newbie in sql query optimization, I'd like some help in understanding the following. Imagine the following scenario:

    -- Table of non-overlapping integer intervals. Some 200.000 rows in here.

    -- Primary Key (clustered index) on ivFrom-column.

    CREATE TABLE Intervals (

    ivFrom int NOT NULL,

    ivTo int NOT NULL,

    ivCode char(4) NOT NULL

    )

    -- Table of data where the "dValue" field may or may not lie in an interval defined in above table. Some 2.000.000 rows here.

    -- Primary Key (clustered index) on dKey-column.

    -- Non-clustered index on dValue-column.

    CREATE TABLE Data (

    dKey int NOT NULL

    dValue int NOT NULL

    )

    I am able to perform the following query, to count the number of entries in the data-table, where the dValue field lies in a valid interval, within fractions of a second:

    SELECT COUNT(*)

    FROM Data

    INNER JOIN Intervals ON dValue BETWEEN ivFrom AND ivTo

    However, when I try to query the invalid entries in the data-table, the query performs horribly (I terminated execution after it had run for more than 10 minutes, without having returned a single row):

    -- Query A

    SELECT Data.*

    FROM Data

    LEFT JOIN Intervals ON dValue BETWEEN ivFrom AND ivTo

    WHERE Intervals.ivFrom IS NULL

    Here's the funny thing. Now rewrite the above query to this, which should return the exact same result:

    -- Query B

    SELECT b.*

    FROM Data a

    INNER JOIN Intervals ON a.dValue BETWEEN ivFrom AND ivTo

    RIGHT JOIN Data b ON a.dKey = b.dKey

    WHERE a.dKey IS NULL

    ...and now I get the entire result set in a few seconds. I really don't understand how adding more complexity to the query (joining a 3rd table), improves the performance so drastically. What am I missing here - why does Query A perform so badly?

    Thanks in advance!

  • What do the execution plans look like?

    Is it a consistent difference? Run both 5 times, discard the first, average the remaining durations.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The differences are consistent. Query A never completes within the 10 minutes my patience lasts, while Query B always executes in about 3 seconds.

    Estimated execution plan for Query A:


    Execution plan for Query B:

  • Can you post the plans please, not pictures of them. Right-click, save as .sqlplan, zip, attach to your post.

    While QueryA is running, can you check what it's current wait type is?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Here you go. Thanks a lot for looking into this 🙂

    SELECT * FROM master.dbo.sysprocesses WHERE spid = 54

    spid kpid blocked waittype waittime lastwaittype ecid status

    ------ ------ ------- -------- -------------------- -------------------------------- ------ ------------------------------

    54 5472 0 0x00BB 652608 CXPACKET 0 suspended

    54 2160 0 0x0000 0 SOS_SCHEDULER_YIELD 1 runnable

    54 6000 0 0x0000 0 SOS_SCHEDULER_YIELD 2 runnable

  • There are several indexes are showing Scanning it must be index seek

    Index Seek is better than Index scan,work on it

    Regards,

    Syed Jahanzaib Bin Hassan

    MCTS | MCITP | OCA | OCP | OCE | SCJP | IBMCDBA

    My Blog

    http://www.aureus-salah.com

    Regards,
    Syed Jahanzaib Bin Hassan
    BSCS | MCTS | MCITP | OCA | OCP | OCE | SCJP | IBMCDBA

    My Blog
    www.aureus-salah.com

  • Syed Jahanzaib Bin hassan (4/25/2011)


    There are several indexes are showing Scanning it must be index seek

    Index Seek is better than Index scan,work on it

    There's no way to get index seeks with the query he has. There's no SARGable predicate, hence one table at least must be scanned (the entire table is needed).

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Ok, the CXPacket is a parallel skew wait. It's just the outer thread waiting for the two parallel branches to finish. Interesting that it's a scheduler yield that the other two have, that's usually indicative of high CPU usage.

    Can you try another query form for me, see how it performs...

    SELECT *

    FROM Data

    WHERE NOT EXISTS (SELECT 1 FROM Intervals WHERE dValue BETWEEN ivFrom AND ivTo)

    Exists usually performs better than joins, because SQL doesn't have to do the full join (as it does with the left join) it just has to check for matching rows.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • On a side note, I'd suggest taking your cost threshold for parallelism to a much higher number. It looks like it's at the default of 5. I usually bump mine up to 35 or so on OLTP systems.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks a lot for the responses so far, everyone.

    Using EXISTS does unfortunately not improve the performance. The execution time still exceeds my 10 minute patience limit, and the wait type and estimated execution plans are exactly the same as with Query A, above.

    Also, changing the cost threshold for parallelism to a higher number does not seem to make a difference in this case - but I must admit I have no clue what the option means, anyway.

  • Grant Fritchey (4/25/2011)


    ... I usually bump mine up to 35 or so on OLTP systems.

    Is that a general rule of thumb?

    Are there any consequences to that setting?

    Far away is close at hand in the images of elsewhere.
    Anon.

  • I think it has to do with the position of the nested loop join in the plan, and the smaller number of rows in the outer table for the second query. Only a nested loop can handle a join without any equality, hence it's the join used, but it's terribly inefficient on larger row counts.

    On the second query, the number of rows in the outer table, and hence the number of executions on the inner table, is smaller, hence faster.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • David Burrows (4/25/2011)


    Grant Fritchey (4/25/2011)


    ... I usually bump mine up to 35 or so on OLTP systems.

    Is that a general rule of thumb?

    Are there any consequences to that setting?

    Not really. Since it's dependent on the cost estimates of execution plans, it's sort of a guess anyway. I've set it as low as 25 and as high as 50 on different servers. It really depends on the queries being run there and whether or not parallelism hurts them or helps them.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I'd try adding IvTo to that clustered index on IvFrom. Guessing wildly, both versions of the query would take about the same time, and rather less than you're currently getting.

    Left join to a range without an index on a range bound is asking for trouble 😀


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Hi Chris,

    Thanks for your suggestion, but unfortunately, adding ivTo to the clustered index, does not alter the execution plan or the performance at all. I also tried creating a separate non-clustered index on ivTo, but that didn't help either.

    This is a real brain teaser. There must be some techniques for optimizing range queries in general? Anyone got experience in that area?

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply