March 25, 2008 at 9:22 pm
March 25, 2008 at 10:57 pm
Something less than a million rows at a time, in order by the clustered primary key, with a 10 second delay between million-row updates.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 27, 2008 at 3:33 am
Any solution should be based around Jeff's suggestion: do the update as a series of transactions (explicit or implicit) so you are not locking too many rows at once.
If your DB is in Full recovery, expect a lot of logging. This may mean running tran log backups more frequently (maybe after every million or 10 million rows updated). Alternatively, make sure you have enough disk space to cope with log growth - if your log disk fills up, all update work on your DB will be suspended until you resolve the log space issue.
Finally, you could put the DB into Simple recovery, do your updates, then switch back to Full recovery and take a full backup (switching to Simple breaks the log backup chain, which is why the full backup afterwards matters). If you prefer this approach, make sure you have appropriate authorisation from your management.
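To make the batching idea concrete, here is a minimal sketch of a chunked UPDATE loop. The table and column names (dbo.BigTable, SomeFlag) are placeholders, not from the original poster's schema, so adjust for your own tables; the commented BACKUP LOG line is where you would run the more frequent log backups if you stay in Full recovery.

```sql
--===== Hypothetical example: batch a large UPDATE so each chunk commits
     -- as its own transaction and the log can be reused between chunks.
     -- dbo.BigTable and SomeFlag are placeholder names.
DECLARE @BatchSize INT,
        @Rows      INT
 SELECT @BatchSize = 100000,
        @Rows      = 1

  WHILE @Rows > 0
  BEGIN
        UPDATE TOP (@BatchSize) dbo.BigTable
           SET SomeFlag = 1
         WHERE SomeFlag <> 1        --only rows not yet updated

        SET @Rows = @@ROWCOUNT      --capture before anything resets it

        --In Full recovery, a log backup here keeps the log from growing:
        --BACKUP LOG MyDb TO DISK = '...'

        IF @Rows > 0
           WAITFOR DELAY '00:00:10' --breathing room for other processes
  END
```

The WHERE clause must exclude already-updated rows, otherwise UPDATE TOP keeps touching the same chunk and the loop never finishes.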
Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.
When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara
March 27, 2008 at 8:27 am
March 27, 2008 at 6:36 pm
Rather than testing your code on 300 million rows, here's some code that generates a million rows of test data. This particular code also has what I call a "Delete Crawler" that operates very much like I suggested for your updates. The "meat" of the proc that does the deletes is near the very end... change that from a DELETE to an UPDATE and see what happens. If you do it in a test database, you should be able to play until it does just exactly what you want on the million rows (doesn't take long to make those and you can customize the column names, datatypes, and random data with no problem). Once you have it running on a million rows, bump it up to 5 or 10 million rows and see how it does.
As usual for me, most of the information about how the code runs and what it does is in the form of embedded comments in the code... lemme know how it works out for you...
--===== If the test table exists, drop it
IF OBJECT_ID('dbo.JBMTestDetail','U') IS NOT NULL
DROP TABLE dbo.JBMTestDetail
GO
--===== Create and populate a 1,000,000 row test table.
-- Column "ConnID" has a range of 1 to 100,000 non-unique numbers stored as VARCHAR(30)
-- Column "Key1" has a range of 1 to 100,000 non-unique numbers stored as VARCHAR(20)
-- Column "SomeMoney" has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "Time_Stamp" has a range of >=01/01/2005 and <01/01/2015 non-unique date/times
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
 SELECT TOP 1000000
        ConnID       = ISNULL(CAST(ABS(CHECKSUM(NEWID()))%100000+1 AS VARCHAR(30)),''), --(10 rows per connection)
        Key1         = ISNULL(CAST(ABS(CHECKSUM(NEWID()))%100000+1 AS VARCHAR(20)),''), --just to test index with
        SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
                     + CHAR(ABS(CHECKSUM(NEWID()))%26+65),
        SomeCSV      = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
        SomeMoney    = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
        Time_Stamp   = ISNULL(CAST(RAND(CHECKSUM(NEWID()))*3652.0+38351.0 AS DATETIME),0),
        SomeHex12    = RIGHT(NEWID(),12)
   INTO dbo.JBMTestDetail
   FROM Master.dbo.SysColumns t1,
        Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN
--===== Create indexes similar to Troy's
CREATE CLUSTERED INDEX IXC_JBMTestDetail_Time_Stamp_ConnID ON dbo.JBMTestDetail (Time_Stamp,ConnID)
CREATE NONCLUSTERED INDEX IX_JBMTestDetail_ConnID_Key1 ON dbo.JBMTestDetail (ConnID,Key1)
GO
--===== Setup to measure performance...
SET STATISTICS TIME ON
--===== Define the cutoff date with a time of "midnight" or, if you will,
     -- define the cutoff date with no time so we only delete whole days.
     -- (DATEADD/DATEDIFF against the base date of -94 yields "midnight,
     -- 94 days ago" with no time portion.)
DECLARE @CutoffDate DATETIME
 SELECT @CutoffDate = DATEADD(dd,DATEDIFF(dd,0,GETDATE()),-94)
--===== Limit all further queries, including deletes, to 25,000 rows
-- (about 1 second worth of deletes, like I said before)
SET ROWCOUNT 25000
--===== See if any rows qualify for deletion. If even just one exists,
-- then there's work to do and @@ROWCOUNT will be > 0.
-- DO NOT PUT ANY CODE BETWEEN THIS SELECT AND THE WHILE LOOP OR
-- YOU'LL MESS UP THE ROWCOUNT FOR THE WHILE LOOP
SELECT TOP 1 1 FROM dbo.JBMTestDetail WHERE Time_Stamp < @CutoffDate
--===== If the rowcount from the above is greater than 0,
-- then delete 25,000 rows at a time until there's nothing
-- left to delete
WHILE @@ROWCOUNT > 0
BEGIN
--===== Just a "marker" to separate the loop in the output
PRINT REPLICATE('=',78)
--===== This delay gives other processes breathing room
WAITFOR DELAY '00:00:10'
--===== Do the delete. Will be limited by the SET ROWCOUNT above.
-- DO NOT PUT ANY CODE BETWEEN THIS DELETE AND THE "END" OR
-- YOU'LL MESS UP THE ROWCOUNT FOR THE WHILE LOOP.
DELETE dbo.JBMTestDetail WITH (TABLOCKX)
WHERE Time_Stamp < @CutoffDate
END
--===== Restore the ability to process more than 25,000 rows
SET ROWCOUNT 0
--===== Show how many rows were deleted
 SELECT 1000000 - COUNT(*) FROM dbo.JBMTestDetail
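One side note: SET ROWCOUNT's effect on INSERT/UPDATE/DELETE is deprecated in SQL Server 2005 and later, so if you're on a newer version the same crawler can be written with DELETE TOP (n), which needs no SET/reset bracket. A sketch against the same test table:

```sql
--===== Variant of the delete crawler using DELETE TOP (n) instead of
     -- SET ROWCOUNT (whose effect on DML is deprecated in SQL 2005+).
DECLARE @CutoffDate DATETIME
 SELECT @CutoffDate = DATEADD(dd,DATEDIFF(dd,0,GETDATE()),-94)

  WHILE 1 = 1
  BEGIN
        DELETE TOP (25000) dbo.JBMTestDetail
         WHERE Time_Stamp < @CutoffDate

        IF @@ROWCOUNT = 0 BREAK     --nothing left to delete

        WAITFOR DELAY '00:00:10'    --breathing room for other processes
  END
```

Behavior is the same as the ROWCOUNT version; the TOP (n) clause just scopes the limit to the one DELETE statement instead of the whole session.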
--Jeff Moden
Change is inevitable... Change for the better is not.
March 27, 2008 at 6:40 pm
March 28, 2008 at 6:01 pm
You bet... and thanks for the great feedback!
--Jeff Moden
Change is inevitable... Change for the better is not.