December 2, 2008 at 9:26 pm
The following two statements are taking 1:13 hours to run. How can I speed it up?
UPDATE dbo.CLAIMS SET admission_date = NULL WHERE admission_date IS NOT NULL;
GO
UPDATE dbo.CLAIMS SET discharge_date = NULL WHERE discharge_date IS NOT NULL;
GO
The CLAIMS table has 33,290,917 records.
Any help is much appreciated.
-Doug
December 2, 2008 at 10:15 pm
Doug (12/2/2008)
The following two statements are taking 1:13 hours to run. How can I speed it up?
UPDATE dbo.CLAIMS SET admission_date = NULL WHERE admission_date IS NOT NULL;
GO
UPDATE dbo.CLAIMS SET discharge_date = NULL WHERE discharge_date IS NOT NULL;
GO
The CLAIMS table has 33,290,917 records.
Any help is much appreciated.
-Doug
Break it into smaller batches instead of two huge updates.
declare @BatchSize int;
set @BatchSize = 10000;
while exists(select 1 from dbo.Claims where admission_date is not null)
begin
UPDATE TOP (@BatchSize) dbo.CLAIMS SET
admission_date = NULL
WHERE
admission_date IS NOT NULL;
-- I'd put a transaction log backup here
end
while exists(select 1 from dbo.Claims where discharge_date is not null)
begin
UPDATE TOP (@BatchSize) dbo.CLAIMS SET
discharge_date = NULL
WHERE
discharge_date IS NOT NULL;
-- I'd put a transaction log backup here
end
The above is a SQL Server 2005 solution. If you are actually using SQL Server 2000, there will need to be some changes.
December 2, 2008 at 10:36 pm
To add to what Lynn stated.... there are two reasons you need to break up such a large update... the first is the LOG file. Even with the SIMPLE recovery mode, the LOG file will grow to hold everything that happens in a single Update so it can be rolled back if interrupted.
The second reason is, every system has a "tipping point". On most systems, such a simple Update will easily handle a million rows in a minute or less. For two million rows, just double the time to two minutes or less. It may even do 3 or 4 million rows in the expected 3 or 4 minutes. But, then the system loses its mind... it may take an hour to do just 5 million rows and it may take 24 or more hours to do 32 million rows.
The "tipping point" in each system varies depending on available memory, IO, LOG file size, etc, etc, and I know of no one who has a simple formula to predict that point. One query may easily do an Update on 4 million rows in 4 minutes, while another may not make it past 2 million rows. You can certainly test for the "tipping point" for each query, but it's usually easier to have a template to divide the work into million-row updates that continue until done.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 2, 2008 at 10:38 pm
P.S. You may want to stop that single Update you're running right now. It may take several hours to roll back or it might roll back quicker. But even such a simple update is going to take a huge amount of time on 32 million rows because of the "tipping point". It's worth stopping it and then restarting the Update in smaller, million-row chunks.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 2, 2008 at 10:42 pm
One more thing... you're updating the same columns that you have in your criteria. Sometimes, and it's again unpredictable, that will cause a recompile for every row updated. You should probably capture the PK column info in a separate working table for rows that meet your criteria and then use that information as a join to do the update. If you include an IDENTITY column in the working table, that'll also give you something to control the million-row cycles with.
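A minimal sketch of that working-table approach, shown for admission_date only and assuming the PK column is named claim_id (the column name, the working table name, and the million-row slice size are all assumptions):
-- Sketch only: claim_id and dbo.Claims_Work are assumed names.
select IDENTITY(int, 1, 1) as RowNum, claim_id
into dbo.Claims_Work
from dbo.CLAIMS
where admission_date is not null;
declare @Lo int, @Hi int, @MaxRow int;
select @MaxRow = max(RowNum) from dbo.Claims_Work;
set @Lo = 1;
while @Lo <= @MaxRow
begin
set @Hi = @Lo + 999999; -- one million-row slice per pass
UPDATE c SET
admission_date = NULL
FROM dbo.CLAIMS c
JOIN dbo.Claims_Work w ON w.claim_id = c.claim_id
WHERE w.RowNum BETWEEN @Lo AND @Hi;
set @Lo = @Hi + 1;
end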
--Jeff Moden
Change is inevitable... Change for the better is not.
December 3, 2008 at 2:06 pm
Thanks Lynn for your input, but it's not helping me much. I ran the script you posted and the 1st statement had 213 batches and took 49 minutes. I let it keep running until it reached 59 minutes and then went ahead and killed it. No one else is using the server, so almost all of its resources were available. Perhaps I should go smaller than 10000.
-Doug
December 3, 2008 at 2:13 pm
Lynn, Jeff had mentioned the log file and you had a comment about backing it up between statements; however, I am using simple recovery, and SS2005 doesn't let me run the BACKUP LOG statement for a database using the simple recovery model. It requires BACKUP DATABASE instead, which I did not do in the script you provided.
If the log file is the reason it still runs a long time, what can I do about that?
I will try Jeff's method of creating a pk tmp table.
-Doug
December 3, 2008 at 2:13 pm
You should take transaction log backups while this is running to avoid log autogrows which could slow you down even more.
* Noel
December 3, 2008 at 3:20 pm
"You should take transaction log backups while this is running to avoid log autogrows which could slow you down even more. " * Noel
This doesn't seem possible with a database in simple recovery mode.
December 3, 2008 at 3:33 pm
If your database is using the simple recovery model, t-log backups aren't needed. I put that comment in for two reasons: one, you didn't mention the recovery model the database is using; and two, I didn't know the name of the database and didn't feel like writing out the code needed for the t-log backup.
Even though you are using simple recovery model, you still need to batch the updates.
December 3, 2008 at 3:39 pm
Doug (12/3/2008)
Thanks Lynn for your input, but it's not helping me much. I ran the script you posted and the 1st statement had 213 batches and took 49 minutes. I let it keep running until it reached 59 minutes and then went ahead and killed it. No one else is using the server, so almost all of its resources were available. Perhaps I should go smaller than 10000.
-Doug
Not quite sure what you are telling me here. Were there 213 batches to process and after 49 minutes it still hadn't finished, or in 49 minutes had it done 213 batches? If the latter, it had already processed 21,300,000 records, which means you would have been close to finishing. Even though you killed it, you should know that all that would have rolled back when you killed it was the last batch of 10000 records you were updating. All previous batches would have been completed and written to disk.
December 3, 2008 at 3:52 pm
[font="Arial Black"]IGNORE THIS POST, but I still want people to see it.[/font]
My original code:
declare @BatchSize int;
set @BatchSize = 10000;
while exists(select 1 from dbo.Claims where admission_date is not null)
begin
UPDATE TOP (@BatchSize) dbo.CLAIMS SET
admission_date = NULL
WHERE
admission_date IS NOT NULL;
-- I'd put a transaction log backup here
end
while exists(select 1 from dbo.Claims where discharge_date is not null)
begin
UPDATE TOP (@BatchSize) dbo.CLAIMS SET
discharge_date = NULL
WHERE
discharge_date IS NOT NULL;
-- I'd put a transaction log backup here
end
As I look at this code (and yours), something jumps out at me. In this table (dbo.Claims), you're setting admission_date and discharge_date to null where they aren't null. This implies there are records already in the table where these fields are null. So, let's just do it in one pass:
declare @BatchSize int;
set @BatchSize = 10000;
while exists(select 1 from dbo.Claims where admission_date is not null)
begin
UPDATE TOP (@BatchSize) dbo.CLAIMS SET
admission_date = NULL,
discharge_date = NULL
end
Either way, when this code finishes, all records in the table should have admission_date and discharge_date equal to null.
The new code eliminates the test for null, and will use a table/clustered index scan. Hopefully you have a clustered index declared on the table.
Doing it this way, you may actually be able to increase your batch size to 100000 or even as Jeff suggested 1000000.
December 3, 2008 at 3:56 pm
Never mind my post above, it won't work. I am having a brain dead day here at work and it is seeping over to here. You'd still need to test for not nulls.
December 3, 2008 at 4:02 pm
Lynn Pettis (12/3/2008)
Never mind my post above, it won't work. I am having a brain dead day here at work and it is seeping over to here. You'd still need to test for not nulls.
Actually - I don't think you're brain dead at all. You're onto something: the end result from this query will be - "all rows will have NULLs in the admission_date and discharge_date columns"
Given that - I would look at doing the batching, but do it by forcing a "walk the clustered index" scenario, so that you're a. doing just one pass through the table, and b. just touching each data page one time.
You can always just tell it to ignore any rows where BOTH columns are already null (if there are a lot of those). If there wouldn't be a lot of those rows, then just batch them up by clustered index key, and skip checking for null at all (that way you get around the issue Jeff was mentioning).
Doug - what's the clustered index scenario? What's the column name/data type involved?
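If the clustered key turns out to be something like an int identity, a hedged sketch of batching by clustered index key might look like this (claim_id is just an assumed column name, and the ranges assume reasonably dense key values):
-- Sketch only: assumes a clustered index on an int column named claim_id.
declare @Lo int, @Hi int, @MaxId int, @BatchSize int;
select @Lo = min(claim_id), @MaxId = max(claim_id) from dbo.CLAIMS;
set @BatchSize = 1000000;
while @Lo <= @MaxId
begin
set @Hi = @Lo + @BatchSize - 1;
-- one clustered key range per pass; no IS NOT NULL test needed
UPDATE dbo.CLAIMS SET
admission_date = NULL,
discharge_date = NULL
WHERE claim_id BETWEEN @Lo AND @Hi;
set @Lo = @Hi + 1;
end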
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
December 3, 2008 at 4:12 pm
Lynn Pettis (12/3/2008)
Doug (12/3/2008)
Thanks Lynn for your input, but it's not helping me much. I ran the script you posted and the 1st statement had 213 batches and took 49 minutes. I let it keep running until it reached 59 minutes and then went ahead and killed it. No one else is using the server, so almost all of its resources were available. Perhaps I should go smaller than 10000.
-Doug
Not quite sure what you are telling me here. Were there 213 batches to process and after 49 minutes it still hadn't finished, or in 49 minutes had it done 213 batches? If the latter, it had already processed 21,300,000 records, which means you would have been close to finishing. Even though you killed it, you should know that all that would have rolled back when you killed it was the last batch of 10000 records you were updating. All previous batches would have been completed and written to disk.
213 batches completed the first statement in your script for the entire table:
UPDATE TOP (@BatchSize) dbo.CLAIMS SET
admission_date = NULL
WHERE
admission_date IS NOT NULL;
You are right that it processed all 21,300,000 records that had NOT NULL admission date (all in 49 mins). It started the 2nd statement for the discharge_date but I stopped it after 10 mins since it was taking too long. If I had let it continue, it most likely would have been another 49 mins since it had to update the same number of records for the NOT NULL discharge dates. I added a BEGIN TRANSACTION in front of your script so I could ROLLBACK afterwards, which I did.