August 1, 2008 at 9:12 am
Hi All
I am moving around 400K records from one live database to another on a daily basis and then deleting them from the source table. While the delete is running, SQL Server blocks the table. I tried deleting records in batches of 100 using a WHILE loop (I know it's not ideal), but I end up in a deadlock situation. Is there any way to delete these 400K records without preventing the live table from being read or updated by the application?
Thanks in advance
August 1, 2008 at 9:33 am
My wild guess is that you are moving some identifiable chunk of data, like a month or something like that. Please correct me if I'm wrong.
If this is the case, I would range-partition the table so that I only have to "move" a single partition a day, then...
1- Move the data
2- Truncate the partition (SWITCH and MERGE, in SS2K5 terminology)
Done!
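For illustration, the SWITCH-and-MERGE step could look something like the sketch below; the partition function and the table names are just placeholders, not anything from the original post.
--===== A minimal sketch, assuming the live table is range-partitioned on a date
     -- column by partition function pfDaily, and that the staging table
     -- dbo.ArchiveStage has an identical structure on the same filegroup.

--===== 1) Switch the day's partition out of the live table.
     -- This is a metadata-only operation, so the live table stays available.
ALTER TABLE dbo.LiveTable
      SWITCH PARTITION 1 TO dbo.ArchiveStage

--===== 2) Remove the now-empty boundary so the partition function doesn't keep growing.
ALTER PARTITION FUNCTION pfDaily()
      MERGE RANGE ('2008-07-31')

--===== 3) Copy dbo.ArchiveStage to the other database and truncate it; the live
     -- table is never locked for the bulk move.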
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
August 1, 2008 at 10:04 am
I haven't done table partitioning, but I guess it can't be done in this case, since records are moved according to a certain range flag, say 1 to 10. I can't keep rows for a given flag in a particular partition, since that would affect performance. Is there any way I can do this with partitioning without compromising performance?
August 1, 2008 at 10:30 am
I suppose the value of that Flag column changes over time, doesn't it?... heavy updating?
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
August 1, 2008 at 11:01 am
Exactly, it changes every second. It is heavily updated, and that's the problem I am facing.
August 1, 2008 at 2:19 pm
I may just be grasping at straws, but could you change the isolation level and allow dirty reads (and possibly updates/deletes)? Or could you add a secondary ID column to use for the deletes, so the deletes are based on that secondary column, which is not being updated?
August 1, 2008 at 3:33 pm
I don't know if this will work for your situation, but I have implemented something similar using TOP.
I created a job to move records slowly from one table to another. First, it does the insert into the new table, capturing the PK for the records using OUTPUT similar to this:
INSERT INTO X
OUTPUT INSERTED.PK INTO @temptable
SELECT TOP (number) Table.*
.....
Then I do a delete from the live table, joining back to the @temptable to make sure I get the right records. I didn't put all the code in, but enough to hopefully give you an idea of how it works.
I schedule it to run during light times with short breaks in between, so that it doesn't hold locks for too long and other processes can run alongside it. The TOP limits the number of records processed per run, but doesn't necessarily guarantee that the records are moved in any particular order. But I only have to set up the criteria once, let 'er go, and eventually it will get all the records that need to be moved.
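To sketch the whole pattern end to end (the table names, key column, batch size, and filter below are just placeholders, not the real code):
--===== A minimal sketch of the INSERT ... OUTPUT / DELETE pattern, assuming a live
     -- table dbo.LiveTable and an archive table dbo.ArchiveTable with identical
     -- columns (and no IDENTITY column), keyed on an INT column called PK.
DECLARE @temptable TABLE (PK INT NOT NULL PRIMARY KEY)

--===== Copy one batch into the archive and capture the keys of the copied rows.
 INSERT INTO dbo.ArchiveTable
 OUTPUT INSERTED.PK INTO @temptable (PK)
 SELECT TOP (10000) *
   FROM dbo.LiveTable
  WHERE Flag = 1 --whatever criteria marks rows as movable

--===== Delete only the rows that were just copied, joining back on the captured keys.
 DELETE lt
   FROM dbo.LiveTable lt
   JOIN @temptable t
     ON t.PK = lt.PK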
Hope this helps!
Chad
August 2, 2008 at 10:13 am
Sometimes desperate situations require desperate solutions.
Here is a crazy idea.
Imagine you have to archive and purge all rows where flag = 1, while keeping rows where flag has any value from 2 to 10.
1- Create two tables
Table TABLE_FLAG_1 and TABLE_FLAG_2_10
2- Put all your data in TABLE_FLAG_2_10
3- Create a view YOURVIEW that exposes TABLE_FLAG_1 union TABLE_FLAG_2_10
Set your application to point to YOURVIEW
4- When you update a row and the new flag value is still in the 2 to 10 range, do a normal update.
When you update a row and the new flag value is 1, do an insert into TABLE_FLAG_1 and delete the old row from TABLE_FLAG_2_10. This gives you a pre-archiving process while all rows remain visible to the application through YOURVIEW.
5- Once a day move the content of TABLE_FLAG_1 to whatever place you are moving it today, then truncate TABLE_FLAG_1
Voila!
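A minimal sketch of that layout, with made-up column names (and UNION ALL, since a given row only ever lives in one of the two tables), could be:
--===== Both tables share the same structure; the columns are placeholders.
CREATE TABLE dbo.TABLE_FLAG_1
       (ID INT NOT NULL PRIMARY KEY, Flag TINYINT NOT NULL, SomeData VARCHAR(100))
CREATE TABLE dbo.TABLE_FLAG_2_10
       (ID INT NOT NULL PRIMARY KEY, Flag TINYINT NOT NULL, SomeData VARCHAR(100))
GO
--===== The application reads and writes through the view, so the daily archive
     -- job only ever has to touch TABLE_FLAG_1.
CREATE VIEW dbo.YOURVIEW
AS
 SELECT ID, Flag, SomeData FROM dbo.TABLE_FLAG_1
  UNION ALL
 SELECT ID, Flag, SomeData FROM dbo.TABLE_FLAG_2_10
GO
--===== Step 5, once a day: copy TABLE_FLAG_1 to the archive database, then...
TRUNCATE TABLE dbo.TABLE_FLAG_1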
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
August 2, 2008 at 11:26 am
The loop idea isn't actually a bad one (I refer to it as a "DELETE CRAWLER"), but deletes, by themselves, run pretty quickly depending on the criteria. I'd limit the deletes to 25k rows (about a second of duration) at a time and leave 5 or 10 seconds between deletes for the system to do other things.
The key to avoiding deadlocks with some effectiveness IS, in fact, the isolation level being used... but it's not what most folks would expect... lock the whole table with TABLOCKX for the delete... the other processes will simply wait instead of deadlocking.
Here's an example of the delete crawler I use... it comes complete with a "test data generator" so that you can "play" to see what works best on your box...
--===== If the test table exists, drop it
IF OBJECT_ID('dbo.JBMTestDetail','U') IS NOT NULL
DROP TABLE dbo.JBMTestDetail
GO
--===== Create and populate a 1,000,000 row test table.
-- Column "ConnID" has a range of 1 to 100,000 non-unique numbers stored as VARCHAR(30)
-- Column "Key1" has a range of 1 to 100,000 non-unique numbers stored as VARCHAR(30)
-- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "Time_Stamp" has a range of >=01/01/2005 and <01/01/2015 non-unique date/times
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
SELECT TOP 1000000
ConnID = ISNULL(CAST(ABS(CHECKSUM(NEWID()))%100000+1 AS VARCHAR(30)),''), --(10 rows per connection)
Key1 = ISNULL(CAST(ABS(CHECKSUM(NEWID()))%100000+1 AS VARCHAR(20)),''), --just to test index with
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
Time_Stamp = ISNULL(CAST(RAND(CHECKSUM(NEWID()))*3652.0+38351.0 AS DATETIME),0),
SomeHex12 = RIGHT(NEWID(),12)
INTO dbo.JBMTestDetail
FROM Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN
--===== Create indexes similar to Troy's
CREATE CLUSTERED INDEX IXC_JBMTestDetail_Time_Stamp_ConnID ON dbo.JBMTestDetail (Time_Stamp,ConnID)
CREATE NONCLUSTERED INDEX IX_JBMTestDetail_ConnID_Key1 ON dbo.JBMTestDetail (ConnID,Key1)
GO
--===== Setup to measure performance...
SET STATISTICS TIME ON
--===== Define the cutoff date with a time of "midnight" or, if you will,
-- define the cutoff date with no time so we only delete whole days.
DECLARE @CutoffDate DATETIME
SELECT @CutoffDate = DATEADD(dd,DATEDIFF(dd,0,GETDATE()),-94)
--===== Limit all further queries, including deletes, to 25,000 rows
-- (about 1 second worth of deletes, like I said before)
SET ROWCOUNT 25000
--===== See if any rows qualify for deletion. If even just one exists,
-- then there's work to do and @@ROWCOUNT will be > 0.
-- DO NOT PUT ANY CODE BETWEEN THIS SELECT AND THE WHILE LOOP OR
-- YOU'LL MESS UP THE ROWCOUNT FOR THE WHILE LOOP
SELECT TOP 1 1 FROM dbo.JBMTestDetail WHERE Time_Stamp < @CutoffDate
--===== If the rowcount from the above is greater than 0,
-- then delete 25,000 rows at a time until there's nothing
-- left to delete
WHILE @@ROWCOUNT > 0
BEGIN
--===== Just a "marker" to separate the loop in the output
PRINT REPLICATE('=',78)
--===== This delay gives other processes breathing room
WAITFOR DELAY '00:00:10'
--===== Do the delete. Will be limited by the SET ROWCOUNT above.
-- DO NOT PUT ANY CODE BETWEEN THIS DELETE AND THE "END" OR
-- YOU'LL MESS UP THE ROWCOUNT FOR THE WHILE LOOP.
DELETE dbo.JBMTestDetail WITH (TABLOCKX)
WHERE Time_Stamp < @CutoffDate
END
--===== Restore the ability to process more than 25,000 rows
SET ROWCOUNT 0
SELECT 1000000 - COUNT(*) FROM dbo.JBMTestDetail
--Jeff Moden
Change is inevitable... Change for the better is not.
August 4, 2008 at 4:23 am
Hi Jeff
This is a nice way, but I can't block the table for that long, since the table is accessed by the application via many threads. Once the table is blocked or not available, the threads wait for a certain time, causing a delay in processing (the application is designed so that it waits for some time and retries again). I am thinking of moving a small amount of data every half hour into the other table and thus reducing the load on the database. I hope this works. Is there any other way????
Thanks for all your posts,:)
August 4, 2008 at 4:46 am
Why don't you use the table NOLOCK option?
It's not applicable to the delete itself, but you can use it in the application procs when you are selecting records (the select side).
I think that avoids the table lock.
August 4, 2008 at 5:33 am
CrazyMan (8/4/2008)
Hi Jeff
This is a nice way, but I can't block the table for that long, since the table is accessed by the application via many threads. Once the table is blocked or not available, the threads wait for a certain time, causing a delay in processing (the application is designed so that it waits for some time and retries again). I am thinking of moving a small amount of data every half hour into the other table and thus reducing the load on the database. I hope this works. Is there any other way????
Thanks for all your posts,:)
You can change the length of the deletes to a much smaller number of rows per delete so the lock doesn't take as long.
Of course, if you did it during a quiet time, it should only take 16-30 seconds to delete all 400k rows in a single lock.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 4, 2008 at 6:12 am
Thanks Jeff
It's supposed to take only a few seconds, but since the table is continuously accessed by applications, it is taking me 3 minutes to delete during a quiet time. I tried during a busy time and it took me 5 minutes to delete 24K rows, and the table was locked for around 2 minutes. I used the isolation level (READ UNCOMMITTED) as well and it still blocks, so I can't use this even to delete a smaller number of rows. I am about to try the loop as you recommended, after 30 minutes, with the same number of rows; the application gets busy in the late afternoon.
Cheers
🙂
August 4, 2008 at 6:14 am
Hi Jaykay
I am using NOLOCK to access the data, storing it in a temp table, indexing the temp table the same as the master table, and then deleting rows from the master table by joining to it.
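For reference, a rough sketch of that approach (table, key, and flag names are just placeholders) would be:
--===== A rough sketch, assuming dbo.MasterTable is keyed on an INT column PK
     -- and has a Flag column; all names here are placeholders.
 SELECT PK
   INTO #ToMove
   FROM dbo.MasterTable WITH (NOLOCK) --dirty read: rows may still change before the delete
  WHERE Flag = 1

--===== Index the temp table the same way the key is indexed on the master table.
CREATE CLUSTERED INDEX IXC_ToMove_PK ON #ToMove (PK)

--===== Delete from the master table by joining back to the captured keys.
 DELETE mt
   FROM dbo.MasterTable mt
   JOIN #ToMove t
     ON t.PK = mt.PK

DROP TABLE #ToMove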
🙂