July 7, 2011 at 12:44 pm
Thanks in advance to those with knowledge and willingness to help.
This problem has been baffling me for days.
I have a simple update like
UPDATE t
SET
col1 = t2.col,
col2 = t3.col,
col3 = t4.col,
col4 = t5.col,
col5 = t6.col,
col6 = t7.col
FROM
<table 1> t
INNER JOIN <table 2> t2 ON
...
LEFT JOIN <table 3> t3 ON
...
LEFT JOIN <table 4> t4 ON
...
LEFT JOIN <table 5> t5 ON
...
LEFT JOIN <table 6> t6 ON
...
LEFT JOIN <table 7> t7 ON
...
The table <table 1> has 5M rows.
The tables <table 2> through <table 7> have between 1M and 30M rows. Indexes are all created, and UPDATE STATISTICS has been run on all the tables.
The above update is VERY SLOW (it takes hours).
Just to compare: when I run the following update
UPDATE t
SET
col1 = 1,
col2 = 'xxx',
col3 = 1,
col4 = 'xxx',
col5 = NULL,
col6 = NULL
FROM
<table 1> t
INNER JOIN <table 2> t2 ON
...
LEFT JOIN <table 3> t3 ON
...
LEFT JOIN <table 4> t4 ON
...
LEFT JOIN <table 5> t5 ON
...
LEFT JOIN <table 6> t6 ON
...
LEFT JOIN <table 7> t7 ON
...
thus replacing the column references with scalar values, the update takes 30 seconds.
Notice: the only difference between the two updates is the SET clause; everything else, including the FROM and JOINs, is the same.
I observed that the execution plan for the first update uses many levels of parallelism, while the plan for the second uses only one.
All great ideas would be highly appreciated.
Alexander.
July 7, 2011 at 1:05 pm
It seems like the available indexes can't be used in the first statement since they are not covering indexes (e.g. an index for table2 should include not only the columns used as join predicates but also, preferably as an included column, the column t2.col). The same applies to the rest of the tables.
But without seeing the actual update statement, including the table and index definitions for all tables involved, it's hard to tell.
The parallelism you noticed is most probably caused by index scans. We would need the actual execution plan to analyze any further.
There are several options to improve the update (including the divide'n'conquer approach) but as mentioned we'd need more details.
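For example, something along these lines (all names here are made up, since the actual table and index definitions weren't posted):
-- Hypothetical names only: assuming <table 2> is dbo.Table2, the join
-- predicate is on JoinKey, and t2.col is the column copied into col1
CREATE NONCLUSTERED INDEX IX_Table2_JoinKey
ON dbo.Table2 (JoinKey)
INCLUDE (col); -- "covering": the update can read col straight from the index
               -- instead of doing a key lookup or scanning the base table
The same pattern would apply to tables 3 through 7.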
July 7, 2011 at 1:18 pm
Check your join criteria as well... any chance something could be invalid, like JOIN Table2 T2 ON t1.ID = t1.ID (when it should be t1.ID = t2.ID)?
Lowell
July 7, 2011 at 1:29 pm
Hi,
I think you should check your join conditions.
Are the columns you are joining on all indexed, or are you joining on non-indexed columns?
One more thing: are you going from the parent table to the child table while joining?
Thanks,
Roshan
July 7, 2011 at 2:15 pm
How fast does it return results when you replace the update with a SELECT?
Are there NULLs in the columns on which you're joining the tables?
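Something like this (same placeholders as in your original post) would show whether the reads alone are already slow, or only the write is:
-- Sketch only: same FROM/JOIN block as the slow update, but as a SELECT
SELECT t2.col, t3.col, t4.col, t5.col, t6.col, t7.col
FROM <table 1> t
INNER JOIN <table 2> t2 ON
...
-- remaining LEFT JOINs to <table 3> thru <table 7> as in the original
Unlike just counting rows, this forces the same column reads the slow update has to do.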
When I run COUNT(*) using the FROM condition it returns in 30 seconds.
July 7, 2011 at 2:41 pm
Obviously, COUNT(*) will return the result faster, as it just counts the number of rows.
When using COUNT(*) you are just joining the tables and counting rows.
While in the update you are reading values from each table and, depending on your conditions, updating those values.
Of course this will take more time than just counting rows.
July 7, 2011 at 3:04 pm
roshan.zanwar (7/7/2011)
Obviously, COUNT(*) will return the result faster, as it just counts the number of rows. When using COUNT(*) you are just joining the tables and counting rows.
While in the update you are reading values from each table and, depending on your conditions, updating those values.
Of course this will take more time than just counting rows.
Please note: the two updates I am referring to in my original post have the same FROM and JOIN conditions. The only difference is in the SET clause: the first one uses columns from the tables in the FROM clause, the other one (for time comparison) uses scalar values.
The first one runs in hours, the second in minutes.
Any great ideas?
Alexander.
July 7, 2011 at 3:22 pm
With LEFT JOINs, if they play no part in the query (i.e. none of their columns are used), the query processor will remove them from the query plan, which would explain the runtime difference between the first and second updates.
I would try doing the updates a column/table at a time to see whether one of the joins between the tables is causing the problem, or whether it's the overall complexity of the number of joins in the update.
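For example (same placeholders as the original post), timing each statement separately:
-- Sketch of the one-table-at-a-time test
UPDATE t
SET col1 = t2.col
FROM <table 1> t
INNER JOIN <table 2> t2 ON
...

UPDATE t
SET col2 = t3.col
FROM <table 1> t
INNER JOIN <table 2> t2 ON
...
LEFT JOIN <table 3> t3 ON
...

-- and so on, adding one table/column at a time up to <table 7>
Whichever step blows up the runtime points at the join (or missing index) causing the problem.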
Martin Cairns
July 7, 2011 at 8:23 pm
Hi,
It seems to be a missing index.
Solution: run your query through the Database Engine Tuning Advisor to get clearer recommendations. If you feel they are really worthwhile, test them on a dev server first, then implement in production.
Cheers...
July 8, 2011 at 11:02 am
To repeat what I stated earlier:
It seems like the available indexes can't be used in the first statement since they are not covering indexes (e.g. an index for table2 should include not only the columns used as join predicates but also, preferably as an included column, the column t2.col). The same applies to the rest of the tables.
But without seeing the actual update statement, including the table and index definitions for all tables involved, it's hard to tell.
The parallelism you noticed is most probably caused by index scans. We would need the actual execution plan to analyze any further.
There are several options to improve the update (including the divide'n'conquer approach) but as mentioned we'd need more details.
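Just to illustrate what I mean by divide'n'conquer, a rough sketch (assuming <table 1> has an integer key column called ID, which is a made-up name here):
DECLARE @BatchStart INT = 1;
DECLARE @BatchSize INT = 100000;
DECLARE @MaxID INT;
SELECT @MaxID = MAX(ID) FROM <table 1>;

WHILE @BatchStart <= @MaxID
BEGIN
    UPDATE t
    SET col1 = t2.col -- remaining SET clauses as in the original
    FROM <table 1> t
    INNER JOIN <table 2> t2 ON
    ...
    -- remaining LEFT JOINs as in the original
    WHERE t.ID BETWEEN @BatchStart AND @BatchStart + @BatchSize - 1;

    SET @BatchStart = @BatchStart + @BatchSize;
END;
Smaller batches keep the transaction log and locking under control and make it much easier to see where the time actually goes.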
Did you try to apply the index concept I mentioned? Or can you at least provide the actual execution plan as an .sqlplan file?
@G.Lakshmanan
I strongly vote against using the DTA. Most of the "suggestions" I got before I gave up on it were, to put it kindly, "misleading".
July 8, 2011 at 5:34 pm
LutzM (7/8/2011)
To repeat what I stated earlier: It seems like the available indexes can't be used in the first statement since they are not covering indexes (e.g. an index for table2 should include not only the columns used as join predicates but also, preferably as an included column, the column t2.col). The same applies to the rest of the tables.
But without seeing the actual update statement, including the table and index definitions for all tables involved, it's hard to tell.
The parallelism you noticed is most probably caused by index scans. We would need the actual execution plan to analyze any further.
There are several options to improve the update (including the divide'n'conquer approach) but as mentioned we'd need more details.
Did you try to apply the index concept I mentioned? Or can you at least provide the actual execution plan as an .sqlplan file?
@G.Lakshmanan
I strongly vote against using the DTA. Most of the "suggestions" I got before I gave up on it were, to put it kindly, "misleading".
Please explain: what are covering indexes? How does setting values to indexed columns (or not) affect the execution time?
Thanks, Alexander.