January 31, 2017 at 11:46 am
I would appreciate review and opinions of below issue. I believe issue is data related, but confused on how a good plan is generated every time using RECOMPILE hint and not after being flushed from cache.
VERSION: SQL 2012 SP3
OS: Windows Server 2012 R2
ISSUE: Bad query plan is produced by default on production server, whereas our Staging server produces an adequate plan. Both plans are almost identical with three braches. The good plan performs a Merge Inner Join with a Hash Match (Right Outer) Join (see attached img). The bad plan performs a Lazy Spool into a Nested Loops Inner Join with a Merge (Right Outer) Join. With the good plan, the query executes in under 20 seconds, but takes 1.5hrs with bad plan. Largest table, AgreementMaster has 28.55m records in Production, 28.67m records in Staging. The other two tables data differences is minor and they're not very large < 500k records. Query below is being executed as is against both servers in all testing. Comparison reveals zero differences in schema, indexing, instance version/patch, instance options, or database options. ANSI options are same since I'm executing query directly via SMSS. During testing, only 288 rows had changed in largest table since last stats update.
ANALYSIS
Bad plan is reproduced after flushing it from cache
Using hints RECOMPILE and OPTIMIZE FOR produces good plan
UPDATE STATS WITH FULLSCAN produces good plan
Staging db continues to produce good plan on another server. It's not environmental
NEXT STEP: Restore production db to another server. Drop statistics on all tables referenced by query. Allow SQL Server to recreate automatically. Test query.
QUERYDECLARE @SourceSystemID INT = 1044;
WITH prd AS ( SELECT a.PROD_GRP_CD ,
a.PlanCode ,
a.ProductKey ,
a.PROD_CD
FROM ( SELECT PlanCode ,
PROD_GRP_CD ,
ProductKey ,
PROD_CD ,
ROW_NUMBER() OVER ( PARTITION BY PlanCode ORDER BY PROD_CD ASC ) AS PC
FROM EDW.Product(NOLOCK)
WHERE SourceSystemKey = @SourceSystemID
AND IsActive = 1
) a
WHERE a.PC = 1)
SELECT
AM.AgreementMasterKey AS AgreementMasterKey
,PC.ProductKey AS ProductKey
,@SourceSystemID AS SourceSystemKey
,AM.SourceSystemIdentifier AS SourceSystemIdentifier
,CAST(ST.policy_number AS NVARCHAR(50)) AS AgreementNumber
,PC.PROD_CD AS ProductCode
,ST.BASE_COVERAGE_AMT AS Face_Value
,0 AS IndicatorCode
,CAST('Base' AS NVARCHAR(50)) AS IndicatorCodeDescription
,GetDate() AS LoadDate
,GetDate() AS ModifyDate
,1 AS IsActive
FROM stage.GENEXT_TOPS_POLICY_Persist ST WITH(NOLOCK)
INNER JOIN EDW.AgreementMaster AM WITH(NOLOCK) ON LTRIM(RTRIM(AM.AgreementNUmber)) = LTRIM(RTRIM(ST.policy_number))
AND AM.Sourcesystemkey = @SourceSystemID
AND AM.Isactive = 1
AND AM.ProductTypeCode = 1
LEFT JOIN prd PC WITH(NOLOCK) ON LTRIM(RTRIM(ST.PLAN_CODE)) = LTRIM(RTRIM(PC.PlanCode))
January 31, 2017 at 12:16 pm
Instead of posting pictures of the execution plans, save then as files and attach them. Also, that last outer join has no chance of being SARGable as both columns must first be converted by the formulas.
See the second link under "Helpful Links" in my signature line below for what we need to help troubleshoot performance problems.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 31, 2017 at 2:09 pm
In addition to Jeff's comments about functions around columns in WHERE clause (DEVASTATINGLY BAD, BTW, and possibly not even needed here depending on the actual data - I have seen that type of thing more than a few times where it wasn't actually needed but was KILLING the system):
1) This is not a sproc, it is ad hoc code. So I encourage you to look up how the optimizer uses variables in ad hoc code. Look at your estimated rows with the variable, with the code put in a sproc and the variable made a parameter instead, both of those with OPTION (RECOMPILE) and lastly with dynamic SQL. You will learn a LOT about how things work and why choices are made with this exercise.
2) Speaking of dynamic SQL, OPTION (RECOMPILE) can be "mostly" thought of as just that. Do the dynamic SQL and OPTION (RECOMPILE) estiamtes and query match up?
3) I have only ONCE in TWENTY YEARS of consulting come across a client that had a stage/test/qa server that was EXACTLY THE SAME as production (and that was a hedge fund with a pair of STUNNINGLY-powerful machines). So you are almost certainly comparing bananas and pomegranates (not even apples and oranges). Probably worse still because your databases aren't in sync either.
4) You REALLY need to get rid of those functions around columns in the WHERE clause. That is the single worst performance thing I have seen in 20+ years of consulting in aggregate. And believe me when I tell you that is really saying something!!! 😀
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
January 31, 2017 at 2:21 pm
Absolutely get rid of the LTRIM and RTRIM. Also, review all indexes on the tables, particularly making sure that you have the best clustered index on every (significant) table.
If these columns:
AM.AgreementNUmber
ST.policy_number
ST.PLAN_CODE
PC.PlanCode
really do need TRIMmed, do it once when the row is INSERTed; an AFTER INSERT trigger is an excellent method of doing that. If necessary, you can also do if those columns gets UPDATEd (but presumably those particular would never get UPDATEd anyway?!).
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply