Issue with a slow Query

  • Hi all,

    I have attached a script and its execution plan for your review.

    We are working for a long time on optimizing this script, and latest modification I did was splitting script into small blocks and using UNION ALL.

    This helped a lot but still it takes some 45 mins on prod environment. I'm not able to find anything more in this script to optimize.

    Kindly review and suggest something in this script. Any help will be highly appreciated.

    Regards.

  • T.Ashish (6/9/2014)


    Hi all,

    I have attached a script and its execution plan for your review.

    We are working for a long time on optimizing this script, and latest modification I did was splitting script into small blocks and using UNION ALL.

    This helped a lot but still it takes some 45 mins on prod environment. I'm not able to find anything more in this script to optimize.

    Kindly review and suggest something in this script. Any help will be highly appreciated.

    Regards.

    The optimizer is timing out attempting to find a good plan for this complicated query, see "Reason for Early Termination of Statement" in the property tab of the SELECT in the plan. What this means is that the plan isn't guaranteed to be "good enough", and there's usually no point in analysing it further. A typical response to this finding would be to break the statement up into two or more separate queries which can be properly analysed, and running intermediate results into #temp tables.

    Your statement consists of four queries UNIONed together using the ALL keyword (UNION ALL) except for the last query, where the ALL keyword is omitted. Are you sure you know the implications of this - even though the plan shows the cost of the deduplication as only 1% of the total?

    The four queries are variations on a theme, only the WHERE clause is different. I'd recommend you isolate the core query including any columns later used as join/filter predicates and run the results of this query into a #temp table. Then either query the temp table four times, or merge the four remaining WHERE clauses - you will probably find the first approach easier to implement than the second. If you do implement this approach, you're likely to get a good enough plan for the core query - which will be worthwhile investigating for possible optimisations (note the warnings in the existing plan - "No join predicate").

    Here are a few tips to help with splitting up the query:

    Ensure that the core query is exactly the same for all four of the unioned queries.

    Work with each query separately

    Use a CTE to represent the core query.

    Something like this:

    ;WITH CoreQuery AS (

    SELECT

    wck.work_item_check_id as object_id,

    pd.mode_id,

    p.personal_id,

    [wck_worker_id] = wck.worker_id,

    [p_personal_id] = p.personal_id,

    [jp_cord_id] = jp.cord_id,

    [wo_new_owner_id] = wo.new_owner_id,

    [w_worker_id] = w.worker_id

    FROM dbo.person p

    INNER JOIN dbo.work_item_check wck

    on wck.buyer_code = p.company_code

    INNER JOIN bsr1_offline.dbo.work_item_offline pd

    on pd.object_id = wck.work_item_check_id and pd.mode_id in (870, 1350) and pd.activity_id = 10179

    LEFT JOIN dbo.worker w

    on w.worker_id = wck.worker_id

    LEFT JOIN dbo.work_order wo

    on wck.work_order_id = wo.work_order_id

    LEFT join dbo.job_performance jp

    on wo.job_performance_id = jp.job_performance_id

    WHERE p.role_id !='5'

    AND (p.access_to_all_site != 'ALL' OR p.access_to_all_bu != 'ALL' OR p.access_to_all_checkpoint != 'ALL')

    AND p.access_to_flag = 1

    AND NOT EXISTS (

    SELECT 1 FROM bsr1_offline.dbo.offline_staging boos

    WHERE boos.object_id = wck.work_item_check_id and boos.mode_id = pd.mode_id and boos.personal_id = p.personal_id)

    )

    SELECT *

    FROM CoreQuery

    WHERE 1 = 1

    AND

    (wck_worker_id <> '' AND

    (

    p_personal_id IN (jp_cord_id, wo_new_owner_id)

    OR EXISTS (select 'x'

    from dbo.worker_checkpoint wcc

    where wcc.worker_id = w_worker_id

    and wcc.active_flag = 1

    and wcc.supervisor_id = p_personal_id)

    )

    )

    This part of the exercise helps to confirm that the core query has been correctly identified. It also helps you to identify columns in the WHERE clauses which will need to be in the output from the core query. Do this for each of the four queries until you've identified all of the columns required for each of the WHERE clauses. When you're done, run the results of the query into a #temp table and query it four times, something like this:

    SELECT

    object_id,

    mode_id,

    personal_id

    FROM #temp

    WHERE 1 = 1

    AND

    (wck_worker_id <> '' AND

    (

    p_personal_id IN (jp_cord_id, wo_new_owner_id)

    OR EXISTS (select 'x'

    from dbo.worker_checkpoint wcc

    where wcc.worker_id = w_worker_id

    and wcc.active_flag = 1

    and wcc.supervisor_id = p_personal_id)

    )

    )

    UNION ALL

    -- etc.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks Chris.

    Yes, I had already tried creating # table as you advised. This was showing lot of improvement in i/o stats as main part was executing only 1 time instead of 4 times. But It could not show any major time gain so I had to revert.

  • T.Ashish (6/9/2014)


    Thanks Chris.

    Yes, I had already tried creating # table as you advised. This was showing lot of improvement in i/o stats as main part was executing only 1 time instead of 4 times. But It could not show any major time gain so I had to revert.

    I'm not sure what you're saying here. You have to split up or otherwise simplify the original query, one way or another, because the optimizer is timing out.

    The simplest way would be to separate the four unioned queries and run their results into one or more #temp tables. That will give you four fairly simple plans to analyse.

    The method most likely to improve performance is to identify the common part and run it only once, as I described earlier. Even if you get no immediate performance gain, you are at least in a position where you can tweak using clues from the plans - which you can't do with any confidence with the query as it is.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Take one of the slowest queries and add top 1000 to it and run it without the unions and start picking away the pieces to which is the bottleneck. Change your "in" clauses to a (value1 or value2), always avoid "in" clauses. Next make sure that the cols in the where clauses are indexed properly

  • wall str33t (6/9/2014)


    Change your "in" clauses to a (value1 or value2), always avoid "in" clauses.

    Um, why?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • wall str33t (6/9/2014)


    ...always avoid "in" clauses

    Since they're frequently faster than regular Inner Joins, why would I want to do that?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • In clauses are generally slower because people write them instead of joins, so they don't scale well, but in your example they are prob faster than separate or statements.

  • wall str33t (6/9/2014)


    Take one of the slowest queries and add top 1000 to it ...

    Using TOP to choke the result set can dramatically change the execution plan whether or not the query has an ORDER BY clause, in which case you would end up, in effect, tuning a different query to the one you're interested in.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • wall str33t (6/9/2014)


    In clauses are generally slower because people write them instead of joins,

    Are you sure about that? Got some reproducible examples which show that 'ins are generally slower than joins'?

    Also, are you distinguishing between IN (val1, val2, val3, ...) and IN (SELECT ...), which are processed completely differently?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • You might want to give the query below a try, although it will output a given row only once, even if it would have matched multiple conditions in the original query.

    select

    wck.work_item_check_id as object_id, pd.mode_id, p.personal_id

    from

    dbo.person p

    inner join dbo.work_item_check wck on wck.buyer_code = p.company_code

    inner join bsr1_offline.dbo.work_item_offline pd on pd.object_id = wck.work_item_check_id and pd.mode_id in (870, 1350) and pd.activity_id = 10179

    LEFT JOIN dbo.worker w on w.worker_id = wck.worker_id

    LEFT JOIN dbo.work_order wo on wck.work_order_id = wo.work_order_id

    LEFT join dbo.job_performance jp on wo.job_performance_id = jp.job_performance_id

    where

    p.role_id !='5' and (p.access_to_all_site != 'ALL' OR p.access_to_all_bu != 'ALL' OR p.access_to_all_checkpoint != 'ALL') and p.access_to_flag = 1

    and not exists (select 1 from bsr1_offline.dbo.offline_staging boos where boos.object_id = wck.work_item_check_id and boos.mode_id = pd.mode_id and boos.personal_id = p.personal_id)

    and

    wck.worker_id != '' and

    1 = case

    when

    (

    p.personal_id = jp.cord_id or p.personal_id = wo.new_owner_id

    or exists (select 'x'

    from dbo.worker_checkpoint wcc

    where wcc.worker_id = w.worker_id and wcc.active_flag = 1 and wcc.supervisor_id = p.personal_id)

    )

    then 1

    when

    ((

    (p.access_to_all_site = 'ALL'

    OR exists (select 'x' from dbo.person_site ps where ps.site_id = wo.site_id and ps.personal_id = p.personal_id))

    and (p.access_to_all_bu = 'ALL'

    OR exists ( select 'x' from dbo.person_bu pb where pb.bu_id = wo.bu_id and pb.personal_id = p.personal_id))

    and exists ( select 'x' from dbo.worker_checkpoint wcc , dbo.person_checkpoint pcc

    where wcc.worker_id = w.worker_id and pcc.personal_id = p.personal_id and wcc.checkpoint_id = pcc.checkpoint_id)))

    then 1

    when

    (p.personal_id = jp.cord_id or p.personal_id = wo.new_owner_id

    or exists (select 'x' from dbo.worker_checkpoint wcc

    where wcc.worker_id = w.worker_id and wcc.active_flag = 1 and wcc.supervisor_id = p.personal_id))

    or

    ((p.access_to_all_site = 'ALL'

    OR exists (select 'x' from dbo.person_site ps where ps.site_id = wo.site_id and ps.personal_id = p.personal_id))

    and (p.access_to_all_bu = 'ALL'

    OR exists ( select 'x' from dbo.person_bu pb where pb.bu_id = wo.bu_id and pb.personal_id = p.personal_id))

    and (p.access_to_all_checkpoint = 'ALL'))

    then 1

    when

    ((p.personal_id = jp.cord_id or p.personal_id = wo.new_owner_id or p.personal_id = jp.owner_id or p.personal_id = wo.creator_id)

    or

    ((p.access_to_all_site = 'ALL'

    OR exists (select 'x' from dbo.person_site ps where ps.site_id = wo.site_id and ps.personal_id = p.personal_id)

    )

    and

    (p.access_to_all_bu = 'ALL'

    OR exists (select 'x' from dbo.person_bu pb where pb.bu_id = wo.bu_id and pb.personal_id = p.personal_id)

    )

    and

    ((p.access_to_all_checkpoint = 'ALL'

    OR exists ( select 'x' from dbo.work_order_checkpoint wocc , dbo.person_checkpoint pcc

    where wocc.work_order_id = wo.work_order_id and pcc.personal_id = p.personal_id and wocc.checkpoint_id = pcc.checkpoint_id)

    or exists ( select 'x' from dbo.worker_checkpoint wcc , dbo.person_checkpoint pcc

    where wcc.worker_id = w.worker_id and pcc.personal_id = p.personal_id and wcc.checkpoint_id = pcc.checkpoint_id)))))

    then 1

    else 0 end

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • GilaMonster (6/10/2014)


    wall str33t (6/9/2014)


    In clauses are generally slower because people write them instead of joins,

    Are you sure about that? Got some reproducible examples which show that 'ins are generally slower than joins'?

    Also, are you distinguishing between IN (val1, val2, val3, ...) and IN (SELECT ...), which are processed completely differently?

    Second that... Let's see some proof.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Also, are you distinguishing between IN (val1, val2, val3, ...) and IN (SELECT ...), which are processed completely differently?

    Yes, this is what I was referring to when I mentioned joins

  • Keep in mind that the slow query may extend passed your t-sql here, it could be a result of poorly placed indexes, not enough indexes, fragmentation, the list goes on and on about how the server is also managed

  • wall str33t (6/10/2014)


    Also, are you distinguishing between IN (val1, val2, val3, ...) and IN (SELECT ...), which are processed completely differently?

    Yes, this is what I was referring to when I mentioned joins

    If you were referring to the IN (SELECT ...) form, then why did you say to change IN (<comma-delimited list>), which is processed in a completely different way? Also, do you have that proof of IN being slower than a join?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply