"good" plan, low reads, but still slow...

  • Tuning woes... when decreasing logical reads actually increases execution time dramatically. BOTH execution times are unacceptable. The names of tables have been changed to protect the innocent. One table on the left side of the join, 4 on the right side of the join.

    I have a query where the plan looks good, the reads are low, but it is still a slow query. Other queries with similar plans and reads take much less time that this one. I don't see anything unusual when trac... err... watching with XEvents. Adding an index creates a "better" plan (note the air quotes) with index seeks instead of index scans on "Table_On_Left" and "Table_On_Right_A".

    Table_On_Right_D cannot be "improved" as it is already super saturated with indexes. Removing said beyond-help-table does not improve the execution time. Any ideas?

    First run, before adding any covering indexes:

    9,288 Logical Reads, ~5 sec

    Second Run, after adding covering indexes to "Table_On_Left" and "Table_On_Right_A". Execution plan looks the same, except for the obvious inclusion of seeks on the new indexes:

    3,909 Logical Reads, 8-11 sec

    First Run details:

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server parse and compile time:

    CPU time = 254 ms, elapsed time = 254 ms.

    (27,782 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Table_On_Left'. Scan count 1, logical reads 3,461, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Table_On_Right_A'. Scan count 1, logical reads 2,548, physical reads 0, read-ahead reads 17, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Table_On_Right_B'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Table_On_Right_C'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Table_On_Right_D'. Scan count 1, logical reads 3,275, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 842 ms, elapsed time = 5,023 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Second Run details:

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server parse and compile time:

    CPU time = 296 ms, elapsed time = 307 ms.

    (27,782 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Table_On_Left'. Scan count 1, logical reads 391, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Table_On_Right_A'. Scan count 1, logical reads 239, physical reads 0, read-ahead reads 3, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Table_On_Right_B'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Table_On_Right_C'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Table_On_Right_D'. Scan count 1, logical reads 3,275, physical reads 0, read-ahead reads 1423, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 1,124 ms, elapsed time = 10,384 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    --edited to emphasize the part that is making me batty. What's going on in between the CPU and elapsed time?

  • I don't see any queries, nor table structures (with indexing), nor actual query plans. How are we supposed to help you here?

    You didn't mention it explicitly, so I will ask if you were monitoring for blocking during all of your runs. That is the simplest explanation.

    Another likely one is that you are using Scalar/Multi-Statement Table Valued UDFs. In that case, SSMS lies about statistics of the query (and it's actual operation under the covers).

    Anything else and we will need the information I asked for above.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • It's not blocking, and the results are repeatable. I can't really get more detailed than the below sql, but it gives the gist. I was wondering if anyone had encountered something similar, as in, the query looks normal, usage looks normal, but can repeatedly get slow results on an otherwise low intensity query.

    tables are like this:

    Table Left ~23k rows, 28 MB data, 12 MB indexes

    Table A ~14k rows, 20 MB data, 45 MB indexes

    Table B ~5 rows (no k), 48 kb data, 40 kb indexes

    Table C ~ 3 rows (no k), 48 kb data, 40 kb indexes

    Table D ~410k rows, 163 MB data, 1.2 GB indexes

    Here's the outline:

    SELECT DISTINCT A.KeyRightA -- int

    , A.Status as a_status -- int

    , B.KeyRightB -- int

    , C.Status as c_status --varchar

    , L.DescriptiveColumn --varchar

    , A.AnotherInt

    , L.Key -- int

    , rtrim(A.AnotherVarchar) as [trimmed_varchar_for_rbar?] --took out the trim, doesn't mitigate execution time

    , CASE A.AnotherNotherInt WHEN 0 THEN 2 WHEN 4 THEN 3 ELSE 1 END as some_arbitrary_case_statement

    -- , a few more columns

    FROM Table_On_Left as L

    LEFT JOIN Table_On_Right_A as A ON L.KeyLeft = A.KeyLeft

    LEFT JOIN Table_On_Right_B as B ON A.KeyRightA = B.KeyRightA

    LEFT JOIN Table_On_RightC as C ON A.KeyRightA = C.KeyRightA

    LEFT JOIN Table_On_Right_D as D on L.KeyRightD = D.KeyRightD

    WHERE L.SomeField < @int

  • Parallelism? Parameter sniffing?

  • I removed RTrim and the CASE statement just in case, and substituted a literal in place of the variable.

    That leaves just columns' values, and left joins.

    Before and after the change, the optimizer doesn't seem to think parallelism is necessary.

    How would I detect parameter sniffing?

  • Ok, I've attached the scrubbed execution plan.

  • I don't think you actually attached the plan. Oh, it is up on OP...

    OK, so no UDFs it seems. No blocking. Are plans the SAME just with different reads? I only see one plan attached.

    It seems that you are getting scans and hashes. That could mean CPU/memory grant/spill-to-tempdb issues. How is CPU on the server?

    Estimated vs actual rowcounts are pretty good for such a query.

    Also, you have some FAT *** DATA!! <28K rows hitting 110MB in size! And a 259MB memory grant.

    Run sp_whoisactive during both executions and see what waits pop out (if any).

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • CPU seemed fine I thought, but I'll take a closer look at that.

    It's not spilling to tempdb (unless it can spill without showing up in the plan -- please tell me that can't happen).

    Ok, I did manage to catch this wait with sp_whoisactive, one of the multiple times I tried:

    (155ms)ASYNC_NETWORK_IO

    Good catch on the memory grant, don't know why I missed that, I bet that's part of it, too.

  • Ickier (3/3/2014)


    CPU seemed fine I thought, but I'll take a closer look at that.

    It's not spilling to tempdb (unless it can spill without showing up in the plan -- please tell me that can't happen).

    Ok, I did manage to catch this wait with sp_whoisactive, one of the multiple times I tried:

    (155ms)ASYNC_NETWORK_IO

    Good catch on the memory grant, don't know why I missed that, I bet that's part of it, too.

    I honestly don't know if all spools to disk are presented in the graphical execution plan - or even in the MUCH more detailed XML plan for that matter.

    ASIO wait is almost always the data spooling back to the client (SSMS in this case, which is a nasty "send me one row at a time" consumer).

    SPEAKING of which ... that massive data size - how about you declare appropriate variables for every field and then SELECT each output field into it's variable? Time that. That will let you know if it sending that massive data to the client is some of the problem.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Ickier (3/3/2014)

    SELECT DISTINCT A.KeyRightA -- int

    , A.Status as a_status -- int

    , B.KeyRightB -- int

    , C.Status as c_status --varchar

    , L.DescriptiveColumn --varchar

    , A.AnotherInt

    , L.Key -- int

    , rtrim(A.AnotherVarchar) as [trimmed_varchar_for_rbar?] --took out the trim, doesn't mitigate execution time

    , CASE A.AnotherNotherInt WHEN 0 THEN 2 WHEN 4 THEN 3 ELSE 1 END as some_arbitrary_case_statement

    -- , a few more columns

    FROM Table_On_Left as L

    LEFT JOIN Table_On_Right_A as A ON L.KeyLeft = A.KeyLeft

    LEFT JOIN Table_On_Right_B as B ON A.KeyRightA = B.KeyRightA

    LEFT JOIN Table_On_RightC as C ON A.KeyRightA = C.KeyRightA

    LEFT JOIN Table_On_Right_D as D on L.KeyRightD = D.KeyRightD

    WHERE L.SomeField < @int

    This outline is missing one significant information.

    For each and every JOIN there is also predicate UbiquitousTinyIntColum < 2 and few extra too. This predicate is used for index scans and index seeks for all those tables and this is the reason why you get hash joins which are joining then with keys.

    Without table structures I am not able to help more.

  • Change the key column order of the index nci_SalesDistrict so that UbiquitousTinyIntColum is the leading edge, followed by keysalesdistrict.

    It will probably give you seeks. Better still, data should come off ordered, which will *encourage* a nested loops join between A and D, and then between the product of these and table_on_left.

    You may also have to change the index on Table_on_right_A so that at least the first two key columns are in this same order.

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks for all the help everyone, I realize there were and are limitations to how much you can help with the information being necessarily masked.

    An interesting note, thanks to TheSQLGuru, I tried plopping the results into a new table.

    When I did the SELECT INTO, that part happened in ~1 sec.

    When I then Select out of that table, I still get the wonky execution time, even if I place a unique clustered index on the table with keys matching the query's order by clause. Maybe there's something more systemic here. Not sure. Other DBA coworkers are now looking at this with me, like doctors gathered around a patient with unusual symptoms.

    I'll get back to this and display our utterly amazing solution if we ever find one. (If we don't we'll still get back to this thread)

    Edit: There's already an index on SalesDistrict that matches that description. Forcing the query to choose it doesn't help. I've even removed that table from the query and it doesn't change the execution time. Thanks though.

    Thanks again all!

  • Ickier (3/4/2014)


    Thanks for all the help everyone, I realize there were and are limitations to how much you can help with the information being necessarily masked.

    An interesting note, thanks to TheSQLGuru, I tried plopping the results into a new table.

    When I did the SELECT INTO, that part happened in ~1 sec.

    When I then Select out of that table, I still get the wonky execution time, even if I place a unique clustered index on the table with keys matching the query's order by clause. Maybe there's something more systemic here. Not sure. Other DBA coworkers are now looking at this with me, like doctors gathered around a patient with unusual symptoms.

    I'll get back to this and display our utterly amazing solution if we ever find one. (If we don't we'll still get back to this thread)

    Edit: There's already an index on SalesDistrict that matches that description. Forcing the query to choose it doesn't help. I've even removed that table from the query and it doesn't change the execution time. Thanks though.

    Thanks again all!

    I didn't tell you to put it into a table, but rather into variables like this:

    SELECT @field1var = field1, @field2var = field2 ...

    But given what you did we now have a very clear culprit: the volume of data getting to your SSMS and being presented there. Possible root causes:

    1) sql server network bottleneck

    2) network bottleneck between server and your desktop

    3) network bottleneck on your desktop NIC

    4) your machine sucking wind

    5) SSMS sucking wind

    I am leaning heavily towards 4 and/or 5. ๐Ÿ˜€ Seriously though, if my math is correct that is 4151 BYTES PER ROW (110MB for 27782 rows)!!! You cannot be successful with that kind of bloat, I don't care how big your server is.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply