June 9, 2014 at 12:06 am
Hi all,
I have attached a script and its execution plan for your review.
We are working for a long time on optimizing this script, and latest modification I did was splitting script into small blocks and using UNION ALL.
This helped a lot but still it takes some 45 mins on prod environment. I'm not able to find anything more in this script to optimize.
Kindly review and suggest something in this script. Any help will be highly appreciated.
Regards.
June 9, 2014 at 3:22 am
T.Ashish (6/9/2014)
Hi all,I have attached a script and its execution plan for your review.
We are working for a long time on optimizing this script, and latest modification I did was splitting script into small blocks and using UNION ALL.
This helped a lot but still it takes some 45 mins on prod environment. I'm not able to find anything more in this script to optimize.
Kindly review and suggest something in this script. Any help will be highly appreciated.
Regards.
The optimizer is timing out attempting to find a good plan for this complicated query, see "Reason for Early Termination of Statement" in the property tab of the SELECT in the plan. What this means is that the plan isn't guaranteed to be "good enough", and there's usually no point in analysing it further. A typical response to this finding would be to break the statement up into two or more separate queries which can be properly analysed, and running intermediate results into #temp tables.
Your statement consists of four queries UNIONed together using the ALL keyword (UNION ALL) except for the last query, where the ALL keyword is omitted. Are you sure you know the implications of this - even though the plan shows the cost of the deduplication as only 1% of the total?
The four queries are variations on a theme, only the WHERE clause is different. I'd recommend you isolate the core query including any columns later used as join/filter predicates and run the results of this query into a #temp table. Then either query the temp table four times, or merge the four remaining WHERE clauses - you will probably find the first approach easier to implement than the second. If you do implement this approach, you're likely to get a good enough plan for the core query - which will be worthwhile investigating for possible optimisations (note the warnings in the existing plan - "No join predicate").
Here are a few tips to help with splitting up the query:
Ensure that the core query is exactly the same for all four of the unioned queries.
Work with each query separately
Use a CTE to represent the core query.
Something like this:
;WITH CoreQuery AS (
SELECT
wck.work_item_check_id as object_id,
pd.mode_id,
p.personal_id,
[wck_worker_id] = wck.worker_id,
[p_personal_id] = p.personal_id,
[jp_cord_id] = jp.cord_id,
[wo_new_owner_id] = wo.new_owner_id,
[w_worker_id] = w.worker_id
FROM dbo.person p
INNER JOIN dbo.work_item_check wck
on wck.buyer_code = p.company_code
INNER JOIN bsr1_offline.dbo.work_item_offline pd
on pd.object_id = wck.work_item_check_id and pd.mode_id in (870, 1350) and pd.activity_id = 10179
LEFT JOIN dbo.worker w
on w.worker_id = wck.worker_id
LEFT JOIN dbo.work_order wo
on wck.work_order_id = wo.work_order_id
LEFT join dbo.job_performance jp
on wo.job_performance_id = jp.job_performance_id
WHERE p.role_id !='5'
AND (p.access_to_all_site != 'ALL' OR p.access_to_all_bu != 'ALL' OR p.access_to_all_checkpoint != 'ALL')
AND p.access_to_flag = 1
AND NOT EXISTS (
SELECT 1 FROM bsr1_offline.dbo.offline_staging boos
WHERE boos.object_id = wck.work_item_check_id and boos.mode_id = pd.mode_id and boos.personal_id = p.personal_id)
)
SELECT *
FROM CoreQuery
WHERE 1 = 1
AND
(wck_worker_id <> '' AND
(
p_personal_id IN (jp_cord_id, wo_new_owner_id)
OR EXISTS (select 'x'
from dbo.worker_checkpoint wcc
where wcc.worker_id = w_worker_id
and wcc.active_flag = 1
and wcc.supervisor_id = p_personal_id)
)
)
This part of the exercise helps to confirm that the core query has been correctly identified. It also helps you to identify columns in the WHERE clauses which will need to be in the output from the core query. Do this for each of the four queries until you've identified all of the columns required for each of the WHERE clauses. When you're done, run the results of the query into a #temp table and query it four times, something like this:
SELECT
object_id,
mode_id,
personal_id
FROM #temp
WHERE 1 = 1
AND
(wck_worker_id <> '' AND
(
p_personal_id IN (jp_cord_id, wo_new_owner_id)
OR EXISTS (select 'x'
from dbo.worker_checkpoint wcc
where wcc.worker_id = w_worker_id
and wcc.active_flag = 1
and wcc.supervisor_id = p_personal_id)
)
)
UNION ALL
-- etc.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 9, 2014 at 5:07 am
Thanks Chris.
Yes, I had already tried creating # table as you advised. This was showing lot of improvement in i/o stats as main part was executing only 1 time instead of 4 times. But It could not show any major time gain so I had to revert.
June 9, 2014 at 5:32 am
T.Ashish (6/9/2014)
Thanks Chris.Yes, I had already tried creating # table as you advised. This was showing lot of improvement in i/o stats as main part was executing only 1 time instead of 4 times. But It could not show any major time gain so I had to revert.
I'm not sure what you're saying here. You have to split up or otherwise simplify the original query, one way or another, because the optimizer is timing out.
The simplest way would be to separate the four unioned queries and run their results into one or more #temp tables. That will give you four fairly simple plans to analyse.
The method most likely to improve performance is to identify the common part and run it only once, as I described earlier. Even if you get no immediate performance gain, you are at least in a position where you can tweak using clues from the plans - which you can't do with any confidence with the query as it is.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 9, 2014 at 2:32 pm
Take one of the slowest queries and add top 1000 to it and run it without the unions and start picking away the pieces to which is the bottleneck. Change your "in" clauses to a (value1 or value2), always avoid "in" clauses. Next make sure that the cols in the where clauses are indexed properly
June 9, 2014 at 2:54 pm
wall str33t (6/9/2014)
Change your "in" clauses to a (value1 or value2), always avoid "in" clauses.
Um, why?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 9, 2014 at 2:58 pm
wall str33t (6/9/2014)
...always avoid "in" clauses
Since they're frequently faster than regular Inner Joins, why would I want to do that?
--Jeff Moden
Change is inevitable... Change for the better is not.
June 9, 2014 at 3:07 pm
In clauses are generally slower because people write them instead of joins, so they don't scale well, but in your example they are prob faster than separate or statements.
June 10, 2014 at 4:15 am
wall str33t (6/9/2014)
Take one of the slowest queries and add top 1000 to it ...
Using TOP to choke the result set can dramatically change the execution plan whether or not the query has an ORDER BY clause, in which case you would end up, in effect, tuning a different query to the one you're interested in.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 10, 2014 at 4:45 am
wall str33t (6/9/2014)
In clauses are generally slower because people write them instead of joins,
Are you sure about that? Got some reproducible examples which show that 'ins are generally slower than joins'?
Also, are you distinguishing between IN (val1, val2, val3, ...) and IN (SELECT ...), which are processed completely differently?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 10, 2014 at 10:14 am
You might want to give the query below a try, although it will output a given row only once, even if it would have matched multiple conditions in the original query.
select
wck.work_item_check_id as object_id, pd.mode_id, p.personal_id
from
dbo.person p
inner join dbo.work_item_check wck on wck.buyer_code = p.company_code
inner join bsr1_offline.dbo.work_item_offline pd on pd.object_id = wck.work_item_check_id and pd.mode_id in (870, 1350) and pd.activity_id = 10179
LEFT JOIN dbo.worker w on w.worker_id = wck.worker_id
LEFT JOIN dbo.work_order wo on wck.work_order_id = wo.work_order_id
LEFT join dbo.job_performance jp on wo.job_performance_id = jp.job_performance_id
where
p.role_id !='5' and (p.access_to_all_site != 'ALL' OR p.access_to_all_bu != 'ALL' OR p.access_to_all_checkpoint != 'ALL') and p.access_to_flag = 1
and not exists (select 1 from bsr1_offline.dbo.offline_staging boos where boos.object_id = wck.work_item_check_id and boos.mode_id = pd.mode_id and boos.personal_id = p.personal_id)
and
wck.worker_id != '' and
1 = case
when
(
p.personal_id = jp.cord_id or p.personal_id = wo.new_owner_id
or exists (select 'x'
from dbo.worker_checkpoint wcc
where wcc.worker_id = w.worker_id and wcc.active_flag = 1 and wcc.supervisor_id = p.personal_id)
)
then 1
when
((
(p.access_to_all_site = 'ALL'
OR exists (select 'x' from dbo.person_site ps where ps.site_id = wo.site_id and ps.personal_id = p.personal_id))
and (p.access_to_all_bu = 'ALL'
OR exists ( select 'x' from dbo.person_bu pb where pb.bu_id = wo.bu_id and pb.personal_id = p.personal_id))
and exists ( select 'x' from dbo.worker_checkpoint wcc , dbo.person_checkpoint pcc
where wcc.worker_id = w.worker_id and pcc.personal_id = p.personal_id and wcc.checkpoint_id = pcc.checkpoint_id)))
then 1
when
(p.personal_id = jp.cord_id or p.personal_id = wo.new_owner_id
or exists (select 'x' from dbo.worker_checkpoint wcc
where wcc.worker_id = w.worker_id and wcc.active_flag = 1 and wcc.supervisor_id = p.personal_id))
or
((p.access_to_all_site = 'ALL'
OR exists (select 'x' from dbo.person_site ps where ps.site_id = wo.site_id and ps.personal_id = p.personal_id))
and (p.access_to_all_bu = 'ALL'
OR exists ( select 'x' from dbo.person_bu pb where pb.bu_id = wo.bu_id and pb.personal_id = p.personal_id))
and (p.access_to_all_checkpoint = 'ALL'))
then 1
when
((p.personal_id = jp.cord_id or p.personal_id = wo.new_owner_id or p.personal_id = jp.owner_id or p.personal_id = wo.creator_id)
or
((p.access_to_all_site = 'ALL'
OR exists (select 'x' from dbo.person_site ps where ps.site_id = wo.site_id and ps.personal_id = p.personal_id)
)
and
(p.access_to_all_bu = 'ALL'
OR exists (select 'x' from dbo.person_bu pb where pb.bu_id = wo.bu_id and pb.personal_id = p.personal_id)
)
and
((p.access_to_all_checkpoint = 'ALL'
OR exists ( select 'x' from dbo.work_order_checkpoint wocc , dbo.person_checkpoint pcc
where wocc.work_order_id = wo.work_order_id and pcc.personal_id = p.personal_id and wocc.checkpoint_id = pcc.checkpoint_id)
or exists ( select 'x' from dbo.worker_checkpoint wcc , dbo.person_checkpoint pcc
where wcc.worker_id = w.worker_id and pcc.personal_id = p.personal_id and wcc.checkpoint_id = pcc.checkpoint_id)))))
then 1
else 0 end
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
June 10, 2014 at 10:17 am
GilaMonster (6/10/2014)
wall str33t (6/9/2014)
In clauses are generally slower because people write them instead of joins,Are you sure about that? Got some reproducible examples which show that 'ins are generally slower than joins'?
Also, are you distinguishing between IN (val1, val2, val3, ...) and IN (SELECT ...), which are processed completely differently?
Second that... Let's see some proof.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 10, 2014 at 10:26 am
Also, are you distinguishing between IN (val1, val2, val3, ...) and IN (SELECT ...), which are processed completely differently?
Yes, this is what I was referring to when I mentioned joins
June 10, 2014 at 10:30 am
Keep in mind that the slow query may extend passed your t-sql here, it could be a result of poorly placed indexes, not enough indexes, fragmentation, the list goes on and on about how the server is also managed
June 10, 2014 at 1:21 pm
wall str33t (6/10/2014)
Also, are you distinguishing between IN (val1, val2, val3, ...) and IN (SELECT ...), which are processed completely differently?
Yes, this is what I was referring to when I mentioned joins
If you were referring to the IN (SELECT ...) form, then why did you say to change IN (<comma-delimited list>), which is processed in a completely different way? Also, do you have that proof of IN being slower than a join?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply