Increasing CPUs hurts Perf

  • Ninja's_RGR'us (7/5/2011)


    Gianluca Sartori (7/5/2011)


    Looks like an estimated plan...

    However, 4 million rows from both tables is a huge load. I suspect it would run much better with a loop join. Can you force it and post the actual execution plan for the forced-join query as well?

    I've never used hints, ever. There's always (99.99999%) another way to get there. And we're far from having to use that option.

    Nor do I. However, they can be useful while tuning to understand why the optimizer chooses one join strategy over another (a quick sketch of such a test hint follows below).

    -- Gianluca Sartori
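
    For reference, a minimal sketch of the kind of test hint being discussed; the table and column names here are only guesses at the schema from this thread, not the real query:

        DECLARE @txt varchar(100) = '';                         -- placeholder for the real parameter

        SELECT w.UserWorkHistoryID, c.customText2               -- column names are assumptions
        FROM dbo.BH_UserWorkHistory AS w
        LEFT JOIN dbo.BH_UserWorkHistoryCustom AS c
               ON c.UserWorkHistoryID = w.UserWorkHistoryID     -- assumed join key
        WHERE c.customText2 = @txt
        OPTION (LOOP JOIN);   -- forces nested loops just for testing; try HASH JOIN or MERGE JOIN to compare plans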

  • as2higpark (7/5/2011)


    Ninja,

    Thanks and I love it. The new plan is attached.

    Hmm, where's the plan with the union all version?

    How many rows do you have in the table where candidatew0_1_.customText2 = ''? Remember that at around 0.3% of the rows the server might start choosing a scan instead of a seek. Also, the left join is NOT on that column; it's on the other key, so I don't see any better plan than a scan here.

    A seek might work if you have an INNER join and only a few hundred rows, maybe 1,000, in the first table.

    My idea to split this into a UNION ALL query should still be the fastest option you can have (the OR in the WHERE clause will also almost always force scans); a rough sketch follows at the end of this post.

    Also if this is super duper critical we can explore the indexed view option. There's no reason this query can't run fast!
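
    To make the UNION ALL suggestion concrete, here is a minimal sketch under assumed names (customText1, the key column, and the parameter are hypothetical; only customText2 comes from this thread). Each branch carries a single sargable predicate, so each can use its own index seek:

        DECLARE @p varchar(100) = 'example';        -- hypothetical parameter

        -- Branch 1: seek on the first (hypothetical) filter column.
        SELECT c.UserWorkHistoryID, c.customText2
        FROM dbo.BH_UserWorkHistoryCustom AS c
        WHERE c.customText1 = @p

        UNION ALL

        -- Branch 2: seek on the second column, excluding rows branch 1 already returned.
        SELECT c.UserWorkHistoryID, c.customText2
        FROM dbo.BH_UserWorkHistoryCustom AS c
        WHERE c.customText2 = @p
          AND (c.customText1 <> @p OR c.customText1 IS NULL);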

  • Gianluca Sartori (7/5/2011)


    Ninja's_RGR'us (7/5/2011)


    Gianluca Sartori (7/5/2011)


    Looks like an estimated plan...

    However, 4 million rows from both tables is a huge load. I suspect it would run much better with a loop join. Can you force it and post the actual execution plan for the forced-join query as well?

    I've never used hints, ever. There's always (99.99999%) another way to get there. And we're far from having to use that option.

    Nor do I. However, they can be useful while tuning to understand why the optimizer chooses one join strategy over another.

    OK, I hadn't gotten the "use it just to test" angle; we'll test with the hint and then try to get there without it.

    Also, AFAIK a loop join is NOT the best option with that many rows. IIRC a HASH join is (67% sure)... I don't feel like re-reading Grant's book on the topic for this discussion!

    Can anyone confirm or refute this?

  • Ninja's_RGR'us (7/5/2011)


    Gianluca Sartori (7/5/2011)


    Ninja's_RGR'us (7/5/2011)


    Gianluca Sartori (7/5/2011)


    Looks like an estimated plan...

    However, 4 million rows from both tables is a huge load. I suspect it would run much better with a loop join. Can you force it and post the actual execution plan for the forced-join query as well?

    I've never used hints, ever. There's always (99.99999%) another way to get there. And we're far from having to use that option.

    Nor do I. However, they can be useful while tuning to understand why the optimizer chooses one join strategy over another.

    OK, I hadn't gotten the "use it just to test" angle; we'll test with the hint and then try to get there without it.

    Also, AFAIK a loop join is NOT the best option with that many rows. IIRC a HASH join is (67% sure)... I don't feel like re-reading Grant's book on the topic for this discussion!

    Can anyone confirm or refute this?

    Sounds correct. Nested loops are better when there is a small number of rows on one side of the join, but in this case you have 4MM rows on both sides. The other option is to do a sort and go for a merge join, but that was likely dismissed as WAY too expensive.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Ninja's_RGR'us (7/5/2011)


    Hmm, where's the plan with the union all version?

    ...Sorry if I wasn't clear: if we can change the code, we wouldn't even need to do it as a UNION. We could have the code simply run one single lookup or the other, determined by the input data, so each would be a single lookup and a seek.

    I have run both and they are both drastically faster.

    The only thing I would want to prove (but am unable to) is how this would impact overall performance. I believe this query is the true root cause, but we can't prove it without changing the integration. (A rough sketch of the branching idea is below.)
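
    A minimal sketch of the "one lookup or the other, decided by the input data" idea, assuming the application can pass a flag; every name below (@mode, @value, customText1, the key column) is hypothetical:

        DECLARE @mode  varchar(10)  = 'text1',      -- hypothetical flag supplied by the application
                @value varchar(100) = 'example';    -- hypothetical lookup value

        IF @mode = 'text1'
        BEGIN
            SELECT c.UserWorkHistoryID, c.customText2
            FROM dbo.BH_UserWorkHistoryCustom AS c
            WHERE c.customText1 = @value;           -- single seek on one column
        END
        ELSE
        BEGIN
            SELECT c.UserWorkHistoryID, c.customText2
            FROM dbo.BH_UserWorkHistoryCustom AS c
            WHERE c.customText2 = @value;           -- single seek on the other column
        END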

  • opc.three (7/5/2011)


    Ninja's_RGR'us (7/5/2011)


    Gianluca Sartori (7/5/2011)


    Ninja's_RGR'us (7/5/2011)


    Gianluca Sartori (7/5/2011)


    Looks like an estimated plan...

    However, 4 million rows from both tables is a huge load. I suspect it would run much better with a loop join. Can you force it and post the actual execution plan for the forced-join query as well?

    I've never used hints, ever. There's always (99.99999%) another way to get there. And we're far from having to use that option.

    Nor do I. However, they can be useful while tuning to understand why the optimizer chooses one join strategy over another.

    OK, I hadn't gotten the "use it just to test" angle; we'll test with the hint and then try to get there without it.

    Also, AFAIK a loop join is NOT the best option with that many rows. IIRC a HASH join is (67% sure)... I don't feel like re-reading Grant's book on the topic for this discussion!

    Can anyone confirm or refute this?

    Sounds correct. Nested loops are better when there is a small number of rows on one side of the join, but in this case you have 4MM rows on both sides. The other option is to do a sort and go for a merge join, but that was likely dismissed as WAY too expensive.

    Ya, that's exactly why I was thinking about filtering as much as I can in BOTH tables before going to the join (which accounts for 80+% of the load).

    The 1st value is known because it's the parameter value, so you can whack that out of the query (hence my union idea). Then you can change the left join to an inner join or EXISTS, or even go through a temp table... after filtering from the base table.

    So worst comes to worst you have one 4M-row table joining to a 10K-row subset. That should be WAYYYYYYYYYYYY less expensive. Also you get much better stats (especially with the temp table option... it might be overkill here but I'd certainly give it a try; on my reports it usually beats the hell out of multi-table joins even with way less data than this). A rough temp-table sketch follows below.
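
    Roughly what the pre-filter-then-join idea could look like with a temp table; the join key and the IsDeleted filter are assumptions, and only the table names and the customText2 = '' filter come from the thread:

        -- Step 1: filter the custom table down before the join.
        SELECT c.UserWorkHistoryID, c.customText2
        INTO #custom_filtered
        FROM dbo.BH_UserWorkHistoryCustom AS c
        WHERE c.customText2 = '';

        -- The temp table gets its own statistics; an index helps the join that follows.
        CREATE CLUSTERED INDEX ix_tmp ON #custom_filtered (UserWorkHistoryID);

        -- Step 2: join the small filtered set back to the big table.
        SELECT h.UserWorkHistoryID, h.IsDeleted, f.customText2
        FROM dbo.BH_UserWorkHistory AS h
        JOIN #custom_filtered AS f
             ON f.UserWorkHistoryID = h.UserWorkHistoryID       -- assumed join key
        WHERE h.IsDeleted = 0;                                  -- assumed filter on the base table

        DROP TABLE #custom_filtered;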

  • as2higpark (7/5/2011)


    Ninja's_RGR'us (7/5/2011)


    Hmm, where's the plan with the union all version?

    ...Sorry if I wasn't clear: if we can change the code, we wouldn't even need to do it as a UNION. We could have the code simply run one single lookup or the other, determined by the input data, so each would be a single lookup and a seek.

    I have run both and they are both drastically faster.

    The only thing I would want to prove (but am unable to) is how this would impact overall performance. I believe this query is the true root cause, but we can't prove it without changing the integration.

    I was thinking that also... but IsDeleted is only in one of the tables, so you can't avoid some sort of join... that would certainly be the fastest of all if you could do it.

    Another option would be to move the deleted data to archive right on the spot, but that would certainly be a major app rewrite...

  • Ninja's_RGR'us (7/5/2011)


    Also AFAIK a loop join is NOT the best option with that many rows. IIRC HASH join is (67% sure)... don't feel like re-reading Grant's book on the topic for this discussion!

    You're right, but looking at the estimated row counts (39 after the filter), if there were a covering index to seek on, that would be the best choice IMHO.

    -- Gianluca Sartori

  • The newPlan you attached still has a degree of parallelism of 24. That is still going to hurt the performance of your server. Can you post the plan for the SQL that you think the application would be rewritten to execute? If you don't get rid of the query using 24 threads, I think you are still going to have this problem. How much RAM did you say you have? Could it be that you can't keep this data in the buffer pool and it keeps fetching from disk? Can you run this with SET STATISTICS IO ON? I would want to see how many of these reads are logical and how many are physical (hopefully, if it's run often, 0 physical), because otherwise you are paying the price of fetching from disk every time, and maybe that's why SQL Server thinks breaking the IO fetch up across 24 cores is more efficient.
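
    For anyone following along, capturing those numbers is just a matter of wrapping the query; the statement below is only a stand-in for the real one:

        SET STATISTICS IO ON;
        SET STATISTICS TIME ON;

        -- run the problem query here; a placeholder shape:
        SELECT COUNT(*)
        FROM dbo.BH_UserWorkHistory AS h
        JOIN dbo.BH_UserWorkHistoryCustom AS c
             ON c.UserWorkHistoryID = h.UserWorkHistoryID       -- assumed join key
        WHERE c.customText2 = '';

        SET STATISTICS IO OFF;
        SET STATISTICS TIME OFF;
        -- The Messages tab then lists, per table, logical reads (buffer pool)
        -- versus physical / read-ahead reads (disk).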

  • Gianluca Sartori (7/5/2011)


    Ninja's_RGR'us (7/5/2011)


    Also, AFAIK a loop join is NOT the best option with that many rows. IIRC a HASH join is (67% sure)... I don't feel like re-reading Grant's book on the topic for this discussion!

    You're right, but looking at the estimated row counts (39 after the filter), if there were a covering index to seek on, that would be the best choice IMHO.

    The only way to get a covering index here is to whack the left join to an inner join and use an indexed view (or change the archive plan).

    Since that's probably not possible, we're down to using UNION ALL! 😉 (A rough indexed-view sketch is below anyway, for reference.)
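
    If the indexed view route were ever explored, it would look roughly like this; the view name, join key, columns, and IsDeleted filter are all assumptions. Note the restrictions: SCHEMABINDING, two-part table names, INNER joins only, and a unique clustered index to materialize it:

        CREATE VIEW dbo.vw_ActiveWorkHistory
        WITH SCHEMABINDING
        AS
        SELECT h.UserWorkHistoryID, c.customText2
        FROM dbo.BH_UserWorkHistory AS h
        INNER JOIN dbo.BH_UserWorkHistoryCustom AS c            -- OUTER joins are not allowed here
                ON c.UserWorkHistoryID = h.UserWorkHistoryID    -- assumed join key
        WHERE h.IsDeleted = 0;                                  -- assumed filter
        GO

        -- Materialize the view with a unique clustered index.
        CREATE UNIQUE CLUSTERED INDEX ix_vw_ActiveWorkHistory
            ON dbo.vw_ActiveWorkHistory (UserWorkHistoryID);    -- assumes this key is unique in the view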

  • I can't help thinking about how this query might have been faster on the old hardware given the same plan (can you confirm the plan looked the same on the old hardware?). You're essentially hash joining ~43MB of data with ~63MB of data. The plan is parallelized; however, that only accounts for ~10% of the cost. Getting the data from disk/memory is 48% and doing the hash match is 40%. Same presented storage, so can we dismiss that as a difference? The hash match seems like the point where the rubber meets the road in terms of CPU, so the drop in speed per single CPU could be hurting you there. Just thinking out loud... feel free to throw stuff if I'm off the grid (happens).

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • patrick.doran (7/5/2011)


    The newPlan you attached still has a degree of parallelism of 24. That is still going to hurt the performance of your server. Can you post the plan for the SQL that you think the application would be rewritten to execute? If you don't get rid of the query using 24 threads, I think you are still going to have this problem. How much RAM did you say you have? Could it be that you can't keep this data in the buffer pool and it keeps fetching from disk? Can you run this with SET STATISTICS IO ON? I would want to see how many of these reads are logical and how many are physical (hopefully, if it's run often, 0 physical), because otherwise you are paying the price of fetching from disk every time, and maybe that's why SQL Server thinks breaking the IO fetch up across 24 cores is more efficient.

    +1

    Nice catch.

    Maybe using MAXDOP 8 (or whatever the best number is for you) would drop the CXPACKET waits for sure... which is what the original question was about a couple of eons ago. (A query-level version of the hint is sketched below.)

    However, I'd still tune this query to death before touching MAXDOP...
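
    For what it's worth, the cap can also be applied to just this statement with a query hint while the server-level setting stays put; the query shape below is only a placeholder:

        SELECT h.UserWorkHistoryID, c.customText2
        FROM dbo.BH_UserWorkHistory AS h
        LEFT JOIN dbo.BH_UserWorkHistoryCustom AS c
               ON c.UserWorkHistoryID = h.UserWorkHistoryID     -- assumed join key
        WHERE c.customText2 = ''
        OPTION (MAXDOP 8);   -- limits this statement to 8 schedulers; pick whatever tests best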

  • Thanks guys. I agree the biggest mystery to me is why it is worse on more (granted, slower) CPUs. This code has not changed, and unfortunately I cannot see the plan on the old hardware (it has already been decommissioned).

    I have tried server-level MAXDOP settings of 0, 1, 4, 8, 10, and 20 and have seen no change.
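
    For reference, this is the server-level knob being described (it takes effect after RECONFIGURE, with no restart needed):

        EXEC sp_configure 'show advanced options', 1;
        RECONFIGURE;
        EXEC sp_configure 'max degree of parallelism', 8;       -- 0 means "use all available schedulers"
        RECONFIGURE;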

  • Current query with SET STATISTICS IO ON:

    Table 'BH_UserWorkHistory'. Scan count 14, logical reads 5733, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'BH_UserWorkHistoryCustom'. Scan count 25, logical reads 6339, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

  • as2higpark (7/5/2011)


    Current query with SET STATISTICS IO ON:

    Table 'BH_UserWorkHistory'. Scan count 14, logical reads 5733, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'BH_UserWorkHistoryCustom'. Scan count 25, logical reads 6339, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    That doesn't look so bad. I have tons of queries with more reads than that that run sub-second.

    Any update on the union all version?
