SQL behaving strangely on different servers

  • Hi

    Looking at the plans, the slow plan is mainly using Nested Loops while the fast plan uses more Hash Matches.

    In the slow plan there's a massive difference between the estimated and actual row counts for many operators. That's why the engine is choosing nested loops instead of hash matches. It seems likely that the stats are out of date. Update the stats, purge the cached plan and run it again. Where the engine joins two big data sets it should now use a hash match rather than a nested loop.
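
    A minimal sketch of those two steps; the table names are taken from the queries posted later in this thread, so substitute the tables from your own plan:

    -- Refresh statistics on the tables involved in the problem joins
    UPDATE STATISTICS dbo.GetDrug2DrugInteractions WITH FULLSCAN;
    UPDATE STATISTICS dbo.trade_drug WITH FULLSCAN;

    -- Clear cached plans so the next run compiles against the fresh stats
    -- (on a shared server, prefer evicting just the one plan by plan_handle)
    DBCC FREEPROCCACHE;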

    Cheers

    Alex

  • Also, just to point out the elephant in the room, the slow plan is using the new cardinality estimator, while the fast one is using the old cardinality estimator. 

    Try the slow query while forcing the use of the old cardinality estimator to see if the behaviors then match.

    Cheers!

  • alex.palmer - Wednesday, July 5, 2017 8:28 AM

    Hi

    Looking at the plans, the slow plan is mainly using Nested Loops while the fast plan uses more Hash Matches.

    In the slow plan there's a massive difference between the estimated and actual row counts for many operators. That's why the engine is choosing nested loops instead of hash matches. It seems likely that the stats are out of date. Update the stats, purge the cached plan and run it again. Where the engine joins two big data sets it should now use a hash match rather than a nested loop.

    Cheers

    Alex

    I already tried that as a first attempt to fix the problem, but it didn't help.
    What I did instead was change the SQL to:
    WITH DI AS (
        SELECT DISTINCT GetDrug2DrugInteractions.DRUGINTERACTIONID, GetDrug2DrugInteractions.SEVERITYID,
            GetDrug2DrugInteractions.IMPLICITROUTEID, GetDrug2DrugInteractions.DSMROUTEID, INTERACTINGCPNUM
        FROM GetDrug2DrugInteractions
        WHERE GetDrug2DrugInteractions.CP_NUM IN (106)
          AND GetDrug2DrugInteractions.INTERACTINGCPNUM IN (4,27,44,60,95,221,270,340,350,392,452,538,565,640,672,798)
    )
    SELECT DI.*, trade_drug.trade_name, trade_drug.trade_code
    FROM TradeName_GenericName
    INNER JOIN GenericName ON TradeName_GenericName.GenericName_ID = GenericName.GenericName_ID
    INNER JOIN trade_drug ON TradeName_GenericName.TradeName_ID = trade_drug.trade_code
    INNER JOIN DI ON GenericName.CPNUM = DI.INTERACTINGCPNUM
    INNER JOIN trade_drugParameters ON trade_drug.trade_code = trade_drugParameters.trade_code
    WHERE trade_drug.trade_code IN (161,193,223,227,327,415,860,1058,1170,1200,1214,1749,1986,2039,2041,2473)
      AND GenericName.CPNUM > 0
      AND ISNULL(trade_drug.inactive, 0) = 0
      AND trade_drugParameters.hospitalid = 22
    ORDER BY DI.SEVERITYID
    instead of
    SELECT DISTINCT GetDrug2DrugInteractions.DRUGINTERACTIONID, trade_drug.trade_name,
        GetDrug2DrugInteractions.SEVERITYID, trade_drug.trade_code,
        GetDrug2DrugInteractions.IMPLICITROUTEID, GetDrug2DrugInteractions.DSMROUTEID
    FROM TradeName_GenericName
    INNER JOIN GenericName ON TradeName_GenericName.GenericName_ID = GenericName.GenericName_ID
    INNER JOIN trade_drug ON TradeName_GenericName.TradeName_ID = trade_drug.trade_code
    INNER JOIN GetDrug2DrugInteractions ON GenericName.CPNUM = GetDrug2DrugInteractions.INTERACTINGCPNUM
    INNER JOIN trade_drugParameters ON trade_drugParameters.trade_code = trade_drug.trade_code
    WHERE GetDrug2DrugInteractions.CP_NUM IN (106)
      AND GetDrug2DrugInteractions.INTERACTINGCPNUM IN (4,27,44,60,95,221,270,340,350,392,452,538,565,640,672,798)
      AND trade_drug.trade_code IN (161,193,223,227,327,415,860,1058,1170,1200,1214,1749,1986,2039,2041,2473)
      AND GenericName.CPNUM > 0
      AND ISNULL(trade_drugParameters.inactive, 0) = 0
      AND trade_drugParameters.hospitalid = 22
    ORDER BY GetDrug2DrugInteractions.SEVERITYID

    And that fixed the problem significantly. I got there by trying many tweaks to the SQL, but my question is: why is SQL Server handling the new SQL better than the old one, although in theory they should both be the same?

    Thanks for your reply.

  • Jacob Wilkins - Wednesday, July 5, 2017 8:44 AM

    Also, just to point out the elephant in the room, the slow plan is using the new cardinality estimator, while the fast one is using the old cardinality estimator. 

    Try the slow query while forcing the use of the old cardinality estimator to see if the behaviors then match.

    Cheers!

    Thank you for your reply, but could you please elaborate? I'm not sure I quite understand what you mean.

    Regards
    Nader

  • Hi Nader

    It's always worth remembering that it's execution plans that do the work in SQL Server, not the SQL code itself. The same code can run in completely different ways depending on the decisions the engine makes.

    What jumped out at me when I looked at the plans you originally posted was that the slow plan was using a lot more nested loops for joins. These are only efficient when at least one of the data sets is small. In the slow plan, large data sets were being joined using nested loops, and that will always be slow.
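
    As a quick diagnostic (not a fix), you can restrict the optimizer to hash joins and compare the runtime against the nested-loop plan. A sketch on two of the tables from the queries in this thread:

    -- Diagnostic only: force hash joins for the whole statement,
    -- then compare duration and the actual plan with the original run
    SELECT d.DRUGINTERACTIONID, d.SEVERITYID
    FROM GenericName AS g
    INNER JOIN GetDrug2DrugInteractions AS d
        ON g.CPNUM = d.INTERACTINGCPNUM
    OPTION (HASH JOIN);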

    The question then becomes: how do I get SQL Server to produce a plan with appropriate joins? The first thing you should do is remove the offending plan from the cache with DBCC FREEPROCCACHE(plan_handle).
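
    For example, to evict just the one plan rather than the whole cache (the LIKE filter is just one way to locate it):

    -- Find the handle of the cached plan by matching the query text
    SELECT cp.plan_handle, st.text
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE st.text LIKE '%GetDrug2DrugInteractions%';

    -- Then evict only that plan; paste the plan_handle value returned above
    -- DBCC FREEPROCCACHE (<plan_handle>);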

    Stale stats are one possible issue; it could also be parameter sniffing (optimise for unknown may help here) or the cardinality estimator, as Jacob pointed out.

    You have an obvious issue here with the problematic joins. You just need to find a solution now.

    Cheers

    Alex

  • alex.palmer - Thursday, July 6, 2017 1:54 AM

    Hi Nader

    It's always worth remembering that it's execution plans that do the work in SQL Server, not the SQL code itself. The same code can run in completely different ways depending on the decisions the engine makes.

    What jumped out at me when I looked at the plans you originally posted was that the slow plan was using a lot more nested loops for joins. These are only efficient when at least one of the data sets is small. In the slow plan, large data sets were being joined using nested loops, and that will always be slow.

    The question then becomes: how do I get SQL Server to produce a plan with appropriate joins? The first thing you should do is remove the offending plan from the cache with DBCC FREEPROCCACHE(plan_handle).

    Stale stats are one possible issue; it could also be parameter sniffing (optimise for unknown may help here) or the cardinality estimator, as Jacob pointed out.

    You have an obvious issue here with the problematic joins. You just need to find a solution now.

    Cheers

    Alex

    Can you please explain what you mean by this: (optimise for unknown may help here)?
    Regards
    Nader

    The optimise for unknown query hint means that when the SQL engine compiles an execution plan, it ignores the parameter values in the code and creates a plan from average values. This creates stable plans. It doesn't always create the type of plan you want, but it's worth giving it a try and seeing the sort of plan you get.
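
    A minimal sketch, assuming the query runs as a parameterized statement or stored procedure; @HospitalId is a hypothetical parameter name standing in for whatever your application actually passes:

    -- @HospitalId is a hypothetical parameter, e.g. a stored procedure parameter
    SELECT td.trade_name, td.trade_code
    FROM trade_drug AS td
    INNER JOIN trade_drugParameters AS tp
        ON tp.trade_code = td.trade_code
    WHERE tp.hospitalid = @HospitalId
    OPTION (OPTIMIZE FOR UNKNOWN);  -- compile from average statistics, not the sniffed value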

    Here's an article:

    https://www.brentozar.com/archive/2013/06/optimize-for-unknown-sql-server-parameter-sniffing/

  • alex.palmer - Thursday, July 6, 2017 4:19 AM

    The optimise for unknown query hint means that when the SQL engine compiles an execution plan, it ignores the parameter values in the code and creates a plan from average values. This creates stable plans. It doesn't always create the type of plan you want, but it's worth giving it a try and seeing the sort of plan you get.

    Here's an article:

    https://www.brentozar.com/archive/2013/06/optimize-for-unknown-sql-server-parameter-sniffing/

    Thank you alex

  • nadersam - Thursday, July 6, 2017 1:37 AM

    Jacob Wilkins - Wednesday, July 5, 2017 8:44 AM

    Also, just to point out the elephant in the room, the slow plan is using the new cardinality estimator, while the fast one is using the old cardinality estimator. 

    Try the slow query while forcing the use of the old cardinality estimator to see if the behaviors then match.

    Cheers!

    Thank you for your reply, but could you please elaborate? I'm not sure I quite understand what you mean.

    Regards
    Nader

    Starting in SQL Server 2014, there were major changes to the cardinality estimator, which estimates the number of rows processed by each part of a query plan; those estimates are in turn used to cost the candidate query plans.

    While the new one is an improvement for most queries, there are queries that regress under the new cardinality estimator.

    Your two query plans show that the faster plan is using the old one (CardinalityEstimationModelVersion="70" in the plan), while the slower plan is using the new one (CardinalityEstimationModelVersion="130" in the plan).
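
    If you'd rather not open the plan XML by hand, a query along these lines reads the attribute straight from the plan cache (a sketch; the LIKE filter is just one way to find the plan):

    -- Pull the CE model version out of the cached plan XML
    SELECT st.text,
           qp.query_plan.value('(//@CardinalityEstimationModelVersion)[1]', 'int')
               AS ce_model_version
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    WHERE st.text LIKE '%GetDrug2DrugInteractions%';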

    When you see a regression in performance for a query and the two plans differ in cardinality estimator version, that's something you'll want to confirm or rule out as a cause early in the troubleshooting, because it's a hugely relevant difference.

    On the newer server, you have some options.

    For just testing to see if the new estimator is the main difference, you can run the query with OPTION (QUERYTRACEON 9481), which will force the use of the old cardinality estimator for just that run of the query.
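
    For example, appended to the very end of the slow query exactly as posted earlier (the query body is elided here):

    -- ...slow query text as posted above...
    ORDER BY GetDrug2DrugInteractions.SEVERITYID
    OPTION (QUERYTRACEON 9481);  -- legacy (pre-2014) cardinality estimator for this run only

    Note that QUERYTRACEON generally requires elevated (sysadmin-level) permissions, so test accordingly.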

    The compatibility level of the database can also be used to control this when moving to new instances, to minimize surprises. If you keep the compatibility level below 120, queries won't use the new cardinality estimator by default (that behavior can be overridden with trace flag 2312).
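
    A sketch of both options (the database name is a placeholder):

    -- Keep the database on a pre-2014 level (110 = SQL Server 2012) so queries
    -- default to the legacy cardinality estimator
    ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 110;

    -- Under a compatibility level below 120, an individual query can still
    -- opt back in to the new estimator with the trace flag mentioned above:
    -- OPTION (QUERYTRACEON 2312)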

    In your case, I'd definitely test the query on the new server with OPTION (QUERYTRACEON 9481) to see if you get the old behavior again, since the difference in cardinality estimator is a huge difference between the posted plans.

    Cheers!

    Apart from the statistics, I would also advise checking the collation settings of both these servers.
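
    A quick way to compare them on the two servers:

    -- Server-level collation
    SELECT SERVERPROPERTY('Collation') AS server_collation;

    -- Database-level collations
    SELECT name, collation_name FROM sys.databases;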

  • Arsh - Wednesday, August 16, 2017 3:43 AM

    Apart from the statistics, I would also advise checking the collation settings of both these servers.

    Thanks for your reply. I will check that, but could you please tell me how that could affect query performance?

    In the slow query plan there's a huge difference between the estimated and actual rows, unlike in the fast query plan you attached. Hence stale stats look like a definite issue here.
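
    One way to check how stale they are (table name taken from the queries posted above):

    -- When were the statistics on this table last updated?
    SELECT s.name AS stats_name,
           STATS_DATE(s.object_id, s.stats_id) AS last_updated
    FROM sys.stats AS s
    WHERE s.object_id = OBJECT_ID('dbo.GetDrug2DrugInteractions');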

  • nadersam - Wednesday, August 16, 2017 3:48 AM

    Arsh - Wednesday, August 16, 2017 3:43 AM

    Apart from the statistics, I would also advise checking the collation settings of both these servers.

    Thanks for your reply. I will check that, but could you please tell me how that could affect query performance?

    Not sure, but it helped in one situation where everything else was fine and queries lost throughput only on the new system.
