Long running query on only one server

  • I have a query that ran in 3 seconds on a 2008 server, then we moved the database and everything to a different 2008R2 server (backup, restore) and now the query takes 2 hours and 15 minutes to run. I'm trying to figure out what could cause the change in performance. The servers have similar configurations (4 processors, 32GB RAM, etc.). The only difference is that the original server was physical and the new server is VM. The new VM server has been running quite a while and has no other issues. Any suggestions on what to look for or what could be happening are very welcome.

    Owner & Principal SQL Server Consultant
    Im Your DBA, Inc.
    https://www.imyourdba.com/

  • Please add more info about the particular query.

  • SQLJocky (3/8/2012)


    I have a query that ran in 3 seconds on a 2008 server, then we moved the database and everything to a different 2008R2 server (backup, restore) and now the query takes 2 hours and 15 minutes to run. I'm trying to figure out what could cause the change in performance. The servers have similar configurations (4 processors, 32GB RAM, etc.). The only difference is that the original server was physical and the new server is VM. The new VM server has been running quite a while and has no other issues. Any suggestions on what to look for or what could be happening are very welcome.

    I'm assuming you've done the standard, check the execution plans, update statistics, etc?

    If so, and you're relatively sure your problem is VM particular and not that the database just needs some upkeep, make sure your data is being stored in RAW data partitions. I've seen that seriously hose up a VM system. Otherwise, you need to get in there with perfmon and find your bottleneck.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Aneesh Soman (3/8/2012)


    Please add more info about the particular query.

    SELECT mbm1.BMWHS

    ,mbm1.BPROD AS BPROD_L1

    ,iimx.IPROD AS BCHLD

    ,iimx.IDESC

    ,iimx.IITYP

    ,iimx.IUMS

    ,(cmf.Ingredients+cmf.Packaging) AS COST

    ,((ISNULL(mbm1.BQREQ,1))/ISNULL(cic1.ICBTCH,1))

    *((ISNULL(mbm2.BQREQ,1))/ISNULL(cic2.ICBTCH,1))

    *((ISNULL(mbm3.BQREQ,1))/ISNULL(cic3.ICBTCH,1))

    *((ISNULL(mbm4.BQREQ,1))/ISNULL(cic4.ICBTCH,1))

    *((ISNULL(mbm5.BQREQ,1))/ISNULL(cic5.ICBTCH,1))

    *((ISNULL(mbm6.BQREQ,1))/ISNULL(cic6.ICBTCH,1))

    AS QTYREQ_NOSCRAP

    ,((ISNULL(mbm1.BQREQ,1)*ISNULL(mbm1.BMSCP,1))/ISNULL(cic1.ICBTCH,1))

    *((ISNULL(mbm2.BQREQ,1)*ISNULL(mbm2.BMSCP,1))/ISNULL(cic2.ICBTCH,1))

    *((ISNULL(mbm3.BQREQ,1)*ISNULL(mbm3.BMSCP,1))/ISNULL(cic3.ICBTCH,1))

    *((ISNULL(mbm4.BQREQ,1)*ISNULL(mbm4.BMSCP,1))/ISNULL(cic4.ICBTCH,1))

    *((ISNULL(mbm5.BQREQ,1)*ISNULL(mbm5.BMSCP,1))/ISNULL(cic5.ICBTCH,1))

    *((ISNULL(mbm6.BQREQ,1)*ISNULL(mbm6.BMSCP,1))/ISNULL(cic6.ICBTCH,1))

    AS QTYREQ

    ,((ISNULL(mbm1.BQREQ,1)*ISNULL(mbm1.BMSCP,1))/ISNULL(cic1.ICBTCH,1))

    *((ISNULL(mbm2.BQREQ,1)*ISNULL(mbm2.BMSCP,1))/ISNULL(cic2.ICBTCH,1))

    *((ISNULL(mbm3.BQREQ,1)*ISNULL(mbm3.BMSCP,1))/ISNULL(cic3.ICBTCH,1))

    *((ISNULL(mbm4.BQREQ,1)*ISNULL(mbm4.BMSCP,1))/ISNULL(cic4.ICBTCH,1))

    *((ISNULL(mbm5.BQREQ,1)*ISNULL(mbm5.BMSCP,1))/ISNULL(cic5.ICBTCH,1))

    *((ISNULL(mbm6.BQREQ,1)*ISNULL(mbm6.BMSCP,1))/ISNULL(cic6.ICBTCH,1))

    *(cmf.Ingredients+cmf.Packaging) AS EXTCOST

    FROM (SELECT * FROM stg.Table1 mbm WHERE mbm.BMBOMM=' ' AND mbm.BDDIS =99999999 AND mbm.BPROD = '14324' AND mbm.BMWHS ='5AF') mbm1

    LEFT JOIN (SELECT * FROM stg.Table1 mbm WHERE mbm.BMBOMM=' ' AND mbm.BDDIS =99999999) mbm2

    ON mbm1.BCHLD = mbm2.BPROD AND mbm1.BMWHS = mbm2.BMWHS

    LEFT JOIN (SELECT * FROM stg.Table1 mbm WHERE mbm.BMBOMM=' ' AND mbm.BDDIS =99999999) mbm3

    ON mbm2.BCHLD = mbm3.BPROD AND mbm2.BMWHS = mbm3.BMWHS

    LEFT JOIN (SELECT * FROM stg.Table1 mbm WHERE mbm.BMBOMM=' ' AND mbm.BDDIS =99999999) mbm4

    ON mbm3.BCHLD = mbm4.BPROD AND mbm3.BMWHS = mbm4.BMWHS

    LEFT JOIN (SELECT * FROM stg.Table1 mbm WHERE mbm.BMBOMM=' ' AND mbm.BDDIS =99999999) mbm5

    ON mbm4.BCHLD = mbm5.BPROD AND mbm4.BMWHS = mbm5.BMWHS

    LEFT JOIN (SELECT * FROM stg.Table1 mbm WHERE mbm.BMBOMM=' ' AND mbm.BDDIS =99999999) mbm6

    ON mbm5.BCHLD = mbm6.BPROD AND mbm5.BMWHS = mbm6.BMWHS

    LEFT JOIN stg.Table2 cic1

    ON mbm1.BMWHS=cic1.ICFAC AND mbm1.BPROD=cic1.ICPROD

    LEFT JOIN stg.Table2 cic2

    ON mbm2.BMWHS=cic2.ICFAC AND mbm2.BPROD=cic2.ICPROD

    LEFT JOIN stg.Table2 cic3

    ON mbm3.BMWHS=cic3.ICFAC AND mbm3.BPROD=cic3.ICPROD

    LEFT JOIN stg.Table2 cic4

    ON mbm4.BMWHS=cic4.ICFAC AND mbm4.BPROD=cic4.ICPROD

    LEFT JOIN stg.Table2 cic5

    ON mbm5.BMWHS=cic5.ICFAC AND mbm5.BPROD=cic5.ICPROD

    LEFT JOIN stg.Table2 cic6

    ON mbm6.BMWHS=cic6.ICFAC AND mbm6.BPROD=cic6.ICPROD

    JOIN stg.Table3 iimx

    ON COALESCE(mbm6.BCHLD,mbm5.BCHLD,mbm4.BCHLD,mbm3.BCHLD,mbm2.BCHLD,mbm1.BCHLD) = iimx.IPROD

    JOIN (SELECT cmf.CFPROD, iim.IDESC, iim.IITYP, iim.ICLAS, iic.ICDES, iim.IUMS, iim.IWGHT, cmf.CFFAC, ZMF.MFDESC, cmf.CFCSET

    ,SUM(CASE WHEN CFCBKT = 1 THEN CFTLVL + CFPLVL ELSE 0 END) AS Ingredients

    ,SUM(CASE WHEN CFCBKT = 2 THEN CFTLVL + CFPLVL ELSE 0 END) AS Packaging

    ,SUM(CASE WHEN CFCBKT = 3 THEN CFTLVL + CFPLVL ELSE 0 END) AS Resale

    ,SUM(CASE WHEN CFCBKT = 4 THEN CFTLVL + CFPLVL ELSE 0 END) AS OutsideProc

    ,SUM(CASE WHEN CFCBKT = 10 THEN CFTLVL + CFPLVL ELSE 0 END) AS DirectLabor

    ,SUM(CASE WHEN CFCBKT = 20 THEN CFTLVL + CFPLVL ELSE 0 END) AS IndirectLabor

    ,SUM(CASE WHEN CFCBKT = 21 THEN CFTLVL + CFPLVL ELSE 0 END) AS VariableOH

    ,SUM(CASE WHEN CFCBKT = 30 THEN CFTLVL + CFPLVL ELSE 0 END) AS FixedOH

    ,SUM(CASE WHEN CFCBKT = 31 THEN CFTLVL + CFPLVL ELSE 0 END) AS RMPlantWhseStorage

    ,SUM(CASE WHEN CFCBKT = 32 THEN CFTLVL + CFPLVL ELSE 0 END) AS InterplantFreight

    ,SUM(CASE WHEN CFCBKT NOT IN (0, 1, 2, 3, 4, 10, 20, 21, 30, 31, 32)

    THEN CFTLVL + CFPLVL ELSE 0 END) AS AllOther

    ,SUM(CASE WHEN CFCBKT = 0 THEN CFTLVL + CFPLVL ELSE 0 END) AS Total

    ,SUM(CASE WHEN CFCBKT = 1 THEN CFTLVL ELSE 0END) AS TL_Ingredients

    ,SUM(CASE WHEN CFCBKT = 2 THEN CFTLVL ELSE 0END) AS TL_Packaging

    ,SUM(CASE WHEN CFCBKT = 3 THEN CFTLVL ELSE 0END) AS TL_Resale

    ,SUM(CASE WHEN CFCBKT = 4 THEN CFTLVL ELSE 0END) AS TL_OutsideProc

    ,SUM(CASE WHEN CFCBKT = 10 THEN CFTLVL ELSE 0 END) AS TL_DirectLabor

    ,SUM(CASE WHEN CFCBKT = 20 THEN CFTLVL ELSE 0 END) AS TL_IndirectLabor

    ,SUM(CASE WHEN CFCBKT = 21 THEN CFTLVL ELSE 0 END) AS TL_VariableOH

    ,SUM(CASE WHEN CFCBKT = 30 THEN CFTLVL ELSE 0 END) AS TL_FixedOH

    ,SUM(CASE WHEN CFCBKT = 31 THEN CFTLVL ELSE 0 END) AS TL_RMPlantWhseStorage

    ,SUM(CASE WHEN CFCBKT = 32 THEN CFTLVL ELSE 0 END) AS TL_InterplantFreight

    ,SUM(CASE WHEN CFCBKT NOT IN (0, 1, 2, 3, 4, 10, 20, 21, 30, 31, 32)

    THEN CFTLVL ELSE 0END) AS TL_AllOther

    ,SUM(CASE WHEN CFCBKT = 0 THEN CFTLVL ELSE 0END) AS TL_Total

    FROM stg.Table4 AS cmf

    JOIN stg.Table3 AS iim ON cmf.CFPROD = iim.IPROD

    JOIN stg.Table5 AS iic ON iim.ICLAS = iic.ICLAS

    JOIN stg.Table6 zmf ON cmf.CFFAC = ZMF.MFFACL

    GROUP BY cmf.CFPROD, cmf.CFCSET, iim.IDESC, iim.IITYP, iim.ICLAS, iic.ICDES

    ,iim.IUMS, cmf.CFFAC, ZMF.MFDESC, iim.IWGHT) cmf

    ON mbm1.BMWHS = cmf.CFFAC

    AND COALESCE(mbm6.BCHLD,mbm5.BCHLD,mbm4.BCHLD,mbm3.BCHLD,mbm2.BCHLD,mbm1.BCHLD) = cmf.CFPROD

    AND CFCSET = 3

    The bottom two joins are the issue I've found. If you comment out either of them at once the query will run in seconds. If they are both in there then it runs for hours. I've updated stats and the indexes are fairly good.

    Owner & Principal SQL Server Consultant
    Im Your DBA, Inc.
    https://www.imyourdba.com/

  • Can you please check in another SQL window with Actual Execution Plan on, the time taken and plan for running a normal select * on the bottom 2 tables, join on each bottom table with the top table. Also find out whether the I/O or the CPU is the culprit for these from the Profiler. If the storage environment has got changed while migrating it may cause some issue. Also you can try RECOMPILE Hint in case it is using some invalid plan. I am not sure whether the backup and Restore can cause the Inner Hash Join to be used by optimizer, may be cleaning the cache may help a bit.

    Your query looks complex though.:)

  • Are you able to post Execution Plans?

  • Here is the current execution plan

    Owner & Principal SQL Server Consultant
    Im Your DBA, Inc.
    https://www.imyourdba.com/

  • I've seen it happen where indexes / exec plans got "stale" after backup/restore; SQL totally ignoring existance of indexes, even though they're there and "should" be used.

    I'd start with updating statistics, if this doesn't help ( honestly don't remember if it was enough in my case) - rebuild indexes will definitely do the trick.

  • rafael lenartowicz (3/11/2012)


    I've seen it happen where indexes / exec plans got "stale" after backup/restore; SQL totally ignoring existance of indexes, even though they're there and "should" be used.

    I'd start with updating statistics, if this doesn't help ( honestly don't remember if it was enough in my case) - rebuild indexes will definitely do the trick.

    I did update stats and rebuild the indexes...didn't help. Also I tried running it again last night and it filled up the tempdb.

    Owner & Principal SQL Server Consultant
    Im Your DBA, Inc.
    https://www.imyourdba.com/

  • First glance at the execution plan made me jumped on my chair. The estimated rows to be returned are 1,527,070,000,000 and the estimated rows size is 195,147,877 MB :w00t: Is it what you really want????? (You mentioned that statistics are uptodate) Are you sure the data is the same? What is the execution plan on the other server?

  • SQLJocky (3/8/2012)


    I have a query that ran in 3 seconds on a 2008 server, then we moved the database and everything to a different 2008R2 server (backup, restore) and now the query takes 2 hours and 15 minutes to run. I'm trying to figure out what could cause the change in performance. The servers have similar configurations (4 processors, 32GB RAM, etc.). The only difference is that the original server was physical and the new server is VM. The new VM server has been running quite a while and has no other issues. Any suggestions on what to look for or what could be happening are very welcome.

    1) You moved from SQL 2008 to SQL 2008R2. There are query optimizer differences that could be responsible for this regression. Is your 2008R2 server patched up to latest SP/CU?

    2) That is a VERY complex query. It is not at all surprising that the optimizer didn't get things right. Best would be to refactor it to have one or more intermediate sets to keep the optimizer from "going out to lunch".

    3) I see you updated stats with full scan and it didn't help. Bummer.

    4) did you do a wait stats and IO stall analysis during this execution? VMs can be HORRIBLE performers when improperly provisioned/setup - which is the vast majority of the time at clients I visit. :w00t:

    I think the complexity of this query goes well beyond what you can reasonably expect to address on a forum staffed by people who volunteer their time. Get a performance tuning professional on board to help you is my recommendation.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • SQLJocky,

    I know exactly what is wrong with your query. We had the similar issue before.

    If you look at the plan you'll see a nested loops without a JOIN predicate. In this particular case it doesn't mean that some JOINs are missing. What really happens is the optimizer decides to do CROSS JOIN instead of JOIN with conditions and apply a filter only a few operations later. That's why performance is so bad and you see these billions of estimated rows.

    Why is it doing so? If we take a closer look at Filter operator in the plan we'll see this horrible predicate:

    CONVERT_IMPLICIT(nvarchar(35),

    CASE WHEN [Expr1040] IS NOT NULL THEN [Expr1040] ELSE

    CASE WHEN [Expr1032] IS NOT NULL THEN [Expr1032] ELSE

    CASE WHEN [Expr1024] IS NOT NULL THEN [Expr1024] ELSE

    CASE WHEN [Expr1016] IS NOT NULL THEN [Expr1016] ELSE

    CASE WHEN [Expr1008] IS NOT NULL THEN [Expr1008] ELSE

    [BSNA_STG].[stg].[V83BPCSF_MBM].[BCHLD] as [mbm].[BCHLD]

    END

    END

    END

    END

    END,0)

    =

    [BSNA_STG].[stg].[V83BPCSF_CMF].[CFPROD] as [cmf].[CFPROD]

    AND

    CASE WHEN [Expr1040] IS NOT NULL THEN [Expr1040] ELSE

    CASE WHEN [Expr1032] IS NOT NULL THEN [Expr1032] ELSE

    CASE WHEN [Expr1024] IS NOT NULL THEN [Expr1024] ELSE

    CASE WHEN [Expr1016] IS NOT NULL THEN [Expr1016] ELSE

    CASE WHEN [Expr1008] IS NOT NULL THEN [Expr1008] ELSE

    [BSNA_STG].[stg].[V83BPCSF_MBM].[BCHLD] as [mbm].[BCHLD]

    END

    END

    END

    END

    END

    =

    [BSNA_STG].[stg].[V83BPCSF_IIM].[IPROD] as [iimx].[IPROD]

    it was generated based on the following join conditions from the query:

    COALESCE(mbm6.BCHLD,mbm5.BCHLD,mbm4.BCHLD,mbm3.BCHLD,mbm2.BCHLD,mbm1.BCHLD) = iimx.IPROD

    COALESCE(mbm6.BCHLD,mbm5.BCHLD,mbm4.BCHLD,mbm3.BCHLD,mbm2.BCHLD,mbm1.BCHLD) = cmf.CFPROD

    The optimizer simply cannot do a good estimation and proper join based on this stuff. So my suggestion would be to get rid of this join by re-writing the query/adding temp tables etc.

    Conclusion: never put functions on the columns which are used in JOIN conditions. Sometimes it even worse than functions on the columns in a WHERE clause.


    Alex Suprun

  • I figured out that it was actually being caused by indexing.

    On the old server, there were several of the tables that were heaps with non-clustered indexes. I went through and optimized for the execution plan, not realizing that it was actually detrimental to SQL Optimizer. After adding clustered indexes and optimizing, it was far far worse and almost seems like the optimizer was doing a cartesian join.

    I dropped the tables and recreated from the old server with the same indexes and it runs perfectly fine. I would love it if someone could point me to why that happened so I can be educated.

    Thansk!

    Owner & Principal SQL Server Consultant
    Im Your DBA, Inc.
    https://www.imyourdba.com/

  • How do you expect someone to explain it to you if we don't know what exactly have you changed and how the execution plan looks now? In general, by changing indexes you are changing the costs and since the optimizer is cost-based therefore the plan is changing. I have pointed out already what is the root cause of your issue but looks like you are happy with your temporary solution. Why temporary? Because it may be just the matter of time when the query will start to perform poorly again.


    Alex Suprun

  • Alexander Suprun (3/13/2012)


    How do you expect someone to explain it to you if we don't know what exactly have you changed and how the execution plan looks now? In general, by changing indexes you are changing the costs and since the optimizer is cost-based therefore the plan is changing. I have pointed out already what is the root cause of your issue but looks like you are happy with your temporary solution. Why temporary? Because it may be just the matter of time when the query will start to perform poorly again.

    Well my request for explanation is more about why it is sometimes detrimental to add a clustered index to a table. I was in a Microsoft Optimization and Tuning class where we briefly touched on it but didn't go into any detail. Fortunately, it runs fine now that I realized what caused it to break. Unfortunately, it is very difficult to determine how to rewrite this inherited query and still keep the intended logic and result set.

    Owner & Principal SQL Server Consultant
    Im Your DBA, Inc.
    https://www.imyourdba.com/

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply