March 29, 2004 at 7:40 am
Hi!
I have a table of 200 million rows, it weights ~80GB. I do an update:
update INFO set INNsender = '00' + INNsender where datalength(rtrim(INNsender)) = 10
The process uses all six server processes (in parallel), but does not load them for more then 5%.
(Wonder how an update query can get parallel ) There is no serious load on disks as well.
There is a non-clustered index on INNsender, that is a varchar(12) data type.
The query:
select count(*) from INFO (nolock) where datalength(rtrim(INNsender)) = 10
finishers in two minutes. The former query did not finish after 20 hours.
As estimated plan shows:
Most of the time it does scanning of nonclustered index.
The process is shown as sleeping (most of the time), trully I did not see it runnable, but it does some number of updates, for about 150 000 rows an hour.
This query
select req_spid, count (*) Number from syslockinfo (nolock) group by req_spid order by 2 desc
returns:
req_spid Number
123 36345 -- my query spid
345 2
....
I fill like there are troubles with tempdb, because of too large transaction. How can I improve the performance of the query?
Thanks.
March 29, 2004 at 8:26 am
You could try using the MAXDOP 1 hint to see if parallel processing is the culprit. Also, I have read in these forums that breaking large updates into smaller batches can help. Perhaps you could use a sub-select on your length test that would return the id of your table, then use the top option to limit the number of records returned. Since your update affects the length, you could run it again and again until the update is complete.
JFW
March 29, 2004 at 8:55 am
The plan also shows a split operation, where update is divided into delete and insert. Can this be a problem if all indexes have 100% fill factor or should it go fine because delete and insert go on the same page?
March 30, 2004 at 9:10 am
Well, if you have a clustered index with a 100% fill factor on the same column you're updating you're actually forcing the Server to do a whole lot of page splitting.
Coz when you ad the two zeros in to the INNsender field and update it you're actually changing the index and information will have to be swapped around.
How ever it should not be a problem considering you mentioned having a nonclustered index, unless you have a clustered index on that field as well...
Maybe you could rebuild the index with a fill factor of 75% and retry your update...
March 30, 2004 at 10:50 am
But why updating on a clustered index would cause a lot of page splitting and updating on a non-clustered index does not? The field has a varchar data type and the data has to move to fit two additional characters.
March 30, 2004 at 5:14 pm
Not sure whether you would want to do this because of data storage considerations, but if this is a query which you execute regularly, could you add a new calculated field to the table (= datalength(rtrim(INNsender))) and then index that?
Then your update query should be able to run straight down this index.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
March 31, 2004 at 1:36 am
Previous poster's point is a good one, arising from the problem of searching on a derived value. Given that the data presumably has a high cardinality ( many different values), your'e likely to have to scan as far as the leaf pages to identify the rows you want to update. Schema constrainst permitting and data content, maybe you could remove the existing where clause, add an indexed tinyint column which you update at the same time as INNsender, and which you can add to the where clause of your update. That way, you can eliminate rows you have already updated. This assumes of course that the update needs to be done once only for each row.
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply