March 25, 2008 at 9:22 pm
March 25, 2008 at 10:57 pm
Something less than a million rows at a time, in order by the clustered primary key, with a 10 second delay between million-row updates.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 27, 2008 at 3:33 am
Any solution should be based around Jeff's suggestion: do the update as a series of transactions (explicit or implicit) so you are not locking too many rows at once.
If your DB is in Full recovery, expect a lot of logging. This may mean running tran log backups more frequently (maybe after every million or 10 million rows updated). Alternatively, make sure you have enough disk space to cope with log growth - if your log disk fills up, all update work on your DB will be suspended until you resolve the log space issue.
Finally, you could put the DB into Simple recovery, do your updates, then switch back to Full recovery and take a full backup (switching to Simple breaks the log backup chain, which is why the full backup afterwards matters). If you prefer this approach, make sure you have appropriate authorisation from your management.
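To make the batching idea concrete, here is a minimal sketch of a chunked UPDATE loop. The table and column names (dbo.BigTable, SomeFlag) are placeholders, not from the original poster's schema, so adjust for your own tables; the commented BACKUP LOG line is where you would run the more frequent log backups if you stay in Full recovery.

```sql
--===== Hypothetical example: batch a large UPDATE so each chunk commits
     -- as its own transaction and the log can be reused between chunks.
     -- dbo.BigTable and SomeFlag are placeholder names.
DECLARE @BatchSize INT,
        @Rows      INT
 SELECT @BatchSize = 100000,
        @Rows      = 1

  WHILE @Rows > 0
  BEGIN
        UPDATE TOP (@BatchSize) dbo.BigTable
           SET SomeFlag = 1
         WHERE SomeFlag <> 1        --only rows not yet updated

        SET @Rows = @@ROWCOUNT      --capture before anything resets it

        --In Full recovery, a log backup here keeps the log from growing:
        --BACKUP LOG MyDb TO DISK = '...'

        IF @Rows > 0
           WAITFOR DELAY '00:00:10' --breathing room for other processes
  END
```

The WHERE clause must exclude already-updated rows, otherwise UPDATE TOP keeps touching the same chunk and the loop never finishes.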
Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.
When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara
March 27, 2008 at 8:27 am
March 27, 2008 at 6:36 pm
Rather than testing your code on 300 million rows, here's some code that generates a million rows of test data. This particular code also has what I call a "Delete Crawler" that operates very much like I suggested for your updates. The "meat" of the proc that does the deletes is near the very end... change that from a DELETE to an UPDATE and see what happens. If you do it in a test database, you should be able to play until it does just exactly what you want on the million rows (doesn't take long to make those and you can customize the column names, datatypes, and random data with no problem). Once you have it running on a million rows, bump it up to 5 or 10 million rows and see how it does.
As usual for me, most of the information about how the code runs and what it does is in the form of embedded comments in the code... lemme know how it works out for you...
--===== If the test table exists, drop it
IF OBJECT_ID('dbo.JBMTestDetail','U') IS NOT NULL
DROP TABLE dbo.JBMTestDetail
GO
--===== Create and populate a 1,000,000 row test table.
-- Column "ConnID" has a range of 1 to 100,000 non-unique numbers stored as VARCHAR(30)
-- Column "Key1" has a range of 1 to 100,000 non-unique numbers stored as VARCHAR(20)
-- Column "SomeMoney" has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "Time_Stamp" has a range of >=01/01/2005 and <01/01/2015 non-unique date/times
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
 SELECT TOP 1000000
        ConnID       = ISNULL(CAST(ABS(CHECKSUM(NEWID()))%100000+1 AS VARCHAR(30)),''), --(10 rows per connection)
        Key1         = ISNULL(CAST(ABS(CHECKSUM(NEWID()))%100000+1 AS VARCHAR(20)),''), --just to test index with
        SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
                     + CHAR(ABS(CHECKSUM(NEWID()))%26+65),
        SomeCSV      = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
        SomeMoney    = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
        Time_Stamp   = ISNULL(CAST(RAND(CHECKSUM(NEWID()))*3652.0+38351.0 AS DATETIME),0),
        SomeHex12    = RIGHT(NEWID(),12)
   INTO dbo.JBMTestDetail
   FROM Master.dbo.SysColumns t1,
        Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN
--===== Create indexes similar to Troy's
CREATE CLUSTERED INDEX IXC_JBMTestDetail_Time_Stamp_ConnID ON dbo.JBMTestDetail (Time_Stamp,ConnID)
CREATE NONCLUSTERED INDEX IX_JBMTestDetail_ConnID_Key1 ON dbo.JBMTestDetail (ConnID,Key1)
GO
--===== Setup to measure performance...
SET STATISTICS TIME ON
--===== Define the cutoff date with a time of "midnight" or, if you will,
     -- define the cutoff date with no time so we only delete whole days.
     -- (DATEADD/DATEDIFF against the base date of -94 yields "midnight,
     -- 94 days ago" with no time portion.)
DECLARE @CutoffDate DATETIME
 SELECT @CutoffDate = DATEADD(dd,DATEDIFF(dd,0,GETDATE()),-94)
--===== Limit all further queries, including deletes, to 25,000 rows
-- (about 1 second worth of deletes, like I said before)
SET ROWCOUNT 25000
--===== See if any rows qualify for deletion. If even just one exists,
-- then there's work to do and @@ROWCOUNT will be > 0.
-- DO NOT PUT ANY CODE BETWEEN THIS SELECT AND THE WHILE LOOP OR
-- YOU'LL MESS UP THE ROWCOUNT FOR THE WHILE LOOP
SELECT TOP 1 1 FROM dbo.JBMTestDetail WHERE Time_Stamp < @CutoffDate
--===== If the rowcount from the above is greater than 0,
-- then delete 25,000 rows at a time until there's nothing
-- left to delete
WHILE @@ROWCOUNT > 0
BEGIN
--===== Just a "marker" to separate the loop in the output
PRINT REPLICATE('=',78)
--===== This delay gives other processes breathing room
WAITFOR DELAY '00:00:10'
--===== Do the delete. Will be limited by the SET ROWCOUNT above.
-- DO NOT PUT ANY CODE BETWEEN THIS DELETE AND THE "END" OR
-- YOU'LL MESS UP THE ROWCOUNT FOR THE WHILE LOOP.
DELETE dbo.JBMTestDetail WITH (TABLOCKX)
WHERE Time_Stamp < @CutoffDate
END
--===== Restore the ability to process more than 25,000 rows
SET ROWCOUNT 0
--===== Show how many rows were deleted
 SELECT 1000000 - COUNT(*) FROM dbo.JBMTestDetail
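One side note: SET ROWCOUNT's effect on INSERT/UPDATE/DELETE is deprecated in SQL Server 2005 and later, so if you're on a newer version the same crawler can be written with DELETE TOP (n), which needs no SET/reset bracket. A sketch against the same test table:

```sql
--===== Variant of the delete crawler using DELETE TOP (n) instead of
     -- SET ROWCOUNT (whose effect on DML is deprecated in SQL 2005+).
DECLARE @CutoffDate DATETIME
 SELECT @CutoffDate = DATEADD(dd,DATEDIFF(dd,0,GETDATE()),-94)

  WHILE 1 = 1
  BEGIN
        DELETE TOP (25000) dbo.JBMTestDetail
         WHERE Time_Stamp < @CutoffDate

        IF @@ROWCOUNT = 0 BREAK     --nothing left to delete

        WAITFOR DELAY '00:00:10'    --breathing room for other processes
  END
```

Behavior is the same as the ROWCOUNT version; the TOP (n) clause just scopes the limit to the one DELETE statement instead of the whole session.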
--Jeff Moden
Change is inevitable... Change for the better is not.
March 27, 2008 at 6:40 pm
March 28, 2008 at 6:01 pm
You bet... and thanks for the great feedback!
--Jeff Moden
Change is inevitable... Change for the better is not.