January 23, 2003 at 4:51 pm
I'm currently running a query that is updating one field on each record of a table. The table has 110 million records. The update has been running for 46 hours and I am trying to determine its progress. I have used SQL Profiler to check the status of physical disk reads/sec and writes/sec but I don't really know how to relate that to how much time may be remaining.
The database and log files are on separate RAID arrays. I can see that the log file is growing, so I assume the query will complete eventually. There are about 8 indexes on the table and I did not drop them before starting the update. The machine has dual 2GHz Xeon processors and 4GB of RAM. I wouldn't have thought it would take this long, but that is another issue, I guess.
Is there any way of measuring progress of this query?
Thanks,
David
January 23, 2003 at 7:21 pm
46 hours seems long; it could have stalled. One thing that can give you a clue is whether your transaction log is set to grow and is actually doing so. Otherwise I suspect it isn't really running, or it is blocked and just not timing out.
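If you want a rough check on whether it is still doing anything, something along these lines works against the SQL Server 2000 system tables (the spid value of 55 is just a placeholder; use whatever sp_who2 shows for the connection running the update):
-- Is the updating spid blocked, and is it still accumulating I/O?
SELECT spid, blocked, waittime, cpu, physical_io
FROM master.dbo.sysprocesses
WHERE spid = 55
-- How full is the log? Run this a few times; if the percentage keeps
-- climbing, the update is still writing.
DBCC SQLPERF(LOGSPACE)
-- Shows the oldest active transaction in the current database
DBCC OPENTRAN
If physical_io keeps increasing between runs, the update is still working; if blocked is non-zero, something else is holding it up.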
January 23, 2003 at 8:38 pm
The query needs to complete. Assuming it is hosed now, how can I go about estimating an appropriate amount of time for this query? This is what is running now:
begin transaction

update detail
set nameid = (select nameid
              from header
              where recordid = detail.recordid)

commit transaction
Is there a better way to write this? There is a relationship between the header and detail table on recordid. Any suggestions would be greatly appreciated.
January 24, 2003 at 4:02 am
The way you wrote it may not work correctly; in fact it may be setting the value from every row in header, or setting it to itself. We have found quirks with subqueries that don't use the table.col reference, where the column was resolved against the outer table instead.
Try this instead:
BEGIN TRANSACTION

UPDATE detail
SET nameid = header.nameid
FROM detail
INNER JOIN header
    ON header.recordid = detail.recordid

COMMIT TRANSACTION
Not sure how long the other version could take, but I would think a few hours at most, provided there is an index on recordid in both tables, which will speed it up.
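If recordid is not indexed yet, adding plain indexes first should make a big difference (the index names here are just examples; skip this if recordid is already the primary key or otherwise indexed):
-- Only needed if recordid is not already indexed on these tables
CREATE INDEX IX_header_recordid ON header (recordid)
CREATE INDEX IX_detail_recordid ON detail (recordid)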
Edited by - antares686 on 01/24/2003 04:04:29 AM
January 30, 2003 at 10:55 am
Thanks for the suggestion. I tried the new update on a small range of data (using a where clause) to get execution times so I can estimate how long this will take.
After 3 runs I am getting an average of one updated record every 0.00324 seconds. To complete the 107 million record update, that suggests about 96 hours (107,000,000 × 0.00324 s ≈ 347,000 s). That seems long.
I feel like the join must be what is slowing things down. There is a foreign key defined on the detail table referencing the header table. However, when I created the relationship I did not choose the "check existing data on creation" option. Would that have made a difference?
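Before trying the full run again I'll double-check what indexes actually exist on recordid; as far as I know sp_helpindex lists them (table names as used above):
-- List existing indexes to confirm recordid is covered on each table
EXEC sp_helpindex 'header'
EXEC sp_helpindex 'detail'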
Thanks for all the help.
February 3, 2003 at 3:01 am
David,
I have found that with long running queries it is often better to break the query up into a number of smaller queries.
The main reason for this is that if, for whatever reason, your query stops before it completes, the server will then roll back the updates, which can itself take a long time. The server will also have to redo all of that work when you rerun the query.
If you split the query into blocks of 1 million rows, then if you have to stop the query you would only lose the updates for the current block of 1 million rows. All the other updates would already have been committed to the database. For example:
declare @count int
set @count = 0

while @count <= 110
begin
    /* Display counter info so you can see progress */
    print convert(varchar, getdate())
    print convert(varchar, @count)

    begin transaction

    /* Run the update for one block of 1 million recordids */
    UPDATE detail
    SET nameid = header.nameid
    FROM detail
    INNER JOIN header
        ON header.recordid = detail.recordid
    WHERE detail.recordid BETWEEN (@count * 1000000) AND (@count * 1000000 + 999999)

    commit transaction

    /* Increment counter */
    set @count = @count + 1
end
I haven't checked that this all works in Query Analyzer, but the principle is that it updates 1 million rows at a time. The first time round the loop it would update records 0 to 999,999; the second time round it would update records 1,000,000 to 1,999,999, and so on.
This has the advantage that it will print out a date and time and counter information each time around so you can easily see whether the query is still running. Also, if you need to restart the query then you can change the start value of @count.
You could also try taking a table lock by replacing UPDATE detail with UPDATE detail WITH (TABLOCK). This might prevent other users from accessing the table, but SQL Server might spend less time managing row locks and more time processing the update.
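For example, the block update above would become something like this (whether it actually helps depends on how much other activity needs the table while the update runs):
-- Same block update, with a table lock hint on the target.
-- TABLOCK trades concurrency for less lock-management overhead.
UPDATE detail WITH (TABLOCK)
SET nameid = header.nameid
FROM detail
INNER JOIN header
    ON header.recordid = detail.recordid
WHERE detail.recordid BETWEEN (@count * 1000000) AND (@count * 1000000 + 999999)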
February 3, 2003 at 4:40 am
Another way to do that is to issue SET ROWCOUNT 1000000, or whatever value suits. It definitely makes sense to break the work up into chunks; you could even run multiple spids doing the update, as long as you segment the where clause to avoid overlap.
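For instance, two Query Analyzer sessions could each take half of the recordid range (the 55,000,000 split point is just a guess based on the 110 million rows; adjust it to the real distribution of recordid):
-- Session 1: first half of the recordid range
UPDATE detail
SET nameid = header.nameid
FROM detail
INNER JOIN header
    ON header.recordid = detail.recordid
WHERE detail.recordid BETWEEN 1 AND 55000000

-- Session 2: second half, run from a separate connection
UPDATE detail
SET nameid = header.nameid
FROM detail
INNER JOIN header
    ON header.recordid = detail.recordid
WHERE detail.recordid BETWEEN 55000001 AND 110000000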
Andy
February 3, 2003 at 12:41 pm
Just one more thing: you are using a one-statement UPDATE, so the BEGIN TRANSACTION and COMMIT TRANSACTION statements are not necessary; in fact they are not recommended for single-statement transactions.
Noel
February 4, 2003 at 6:31 am
I frequently use a "batch template" with the SET ROWCOUNT statement to run updates like these. They are definitely the way to go. With a bit of extra code you could easily print out how many rows have been processed and estimate the time to completion.
--========================================================
-- Run query in batches of a certain row size
--========================================================
DECLARE @BatchSize int,
        @RowsAffected int

SELECT @BatchSize = 100000,
       @RowsAffected = @BatchSize

SET ROWCOUNT @BatchSize

WHILE @RowsAffected = @BatchSize
BEGIN
    --====== Add the update statement here

    SELECT @RowsAffected = @@ROWCOUNT
END

SET ROWCOUNT 0
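As a sketch of how the update in this thread might drop into that template (this assumes nameid in detail starts out NULL, so each pass only picks up rows that have not been updated yet; without a filter like that the loop would keep reprocessing the same rows):
DECLARE @BatchSize int,
        @RowsAffected int,
        @Total int,
        @StartTime datetime

SELECT @BatchSize = 100000,
       @RowsAffected = @BatchSize,
       @Total = 0,
       @StartTime = GETDATE()

SET ROWCOUNT @BatchSize

WHILE @RowsAffected = @BatchSize
BEGIN
    -- Assumes nameid is NULL until it has been updated;
    -- otherwise add a different filter so finished rows are skipped
    UPDATE detail
    SET nameid = header.nameid
    FROM detail
    INNER JOIN header
        ON header.recordid = detail.recordid
    WHERE detail.nameid IS NULL

    SELECT @RowsAffected = @@ROWCOUNT
    SELECT @Total = @Total + @RowsAffected

    -- Progress: rows done and elapsed time, for estimating completion
    PRINT 'Rows updated: ' + CONVERT(varchar, @Total)
        + '  Elapsed seconds: ' + CONVERT(varchar, DATEDIFF(second, @StartTime, GETDATE()))
END

SET ROWCOUNT 0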