August 4, 2008 at 6:00 pm
CrazyMan (8/4/2008)
Thanks Jeff. It's supposed to take only a few seconds since the table is continuously accessed by applications, but it's taking me 3 minutes to delete during a quiet time. I tried during a busy time and it took 5 minutes to delete 24K rows, and the table was locked for around 2 minutes. I used the READ UNCOMMITTED isolation level as well and it still blocks, so now I can't use this to delete smaller numbers of rows. I'm about to try the loop as you recommended in about 30 minutes, with the same number of rows; the application gets busy in the late afternoon.
Cheers
🙂
Wow! That's a lot of time. Are there a lot of foreign keys or, perhaps, a trigger that's involved?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 5, 2008 at 9:09 am
There are 10 indexes, of which one is clustered, and no trigger. Does this affect deleting???
August 5, 2008 at 6:16 pm
CrazyMan (8/5/2008)
There are 10 indexes, of which one is clustered, and no trigger. Does this affect deleting???
The clustered one is the easy one for deletes... the other 9 add a significant amount of time to deletes.
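If you want to double-check exactly which ones those are, something like this will list the nonclustered indexes the delete has to maintain (the table name here is just a placeholder, swap in your own):

SELECT i.name, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('dbo.YourBigTable')   -- placeholder table name
  AND i.type_desc = 'NONCLUSTERED';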
--Jeff Moden
Change is inevitable... Change for the better is not.
August 6, 2008 at 11:12 am
1. Why don't you write a SQL job that keeps deleting the rows in smaller chunks, and let that job keep running?
2. Watch the pattern of the rows based on their usage so you know which rows are OLD and which are NEW, then define your logic so that your SQL job deletes the OLD ones.
What we did was create another table that stored only the ID (PK) and a date (the latest update date) for those IDs. Then our job would delete the rows sorted by that date, oldest first.
ID Time
1 7:15 AM
100 9:00 AM
24 9:01 AM
23 9:02 AM
111 11:00 AM
999 11:01 AM
In the above table, if you look at Time, it's very clear that we can delete IDs 1, 100, 24, ... in that order...
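Roughly, the job step looked something like this. This is only a sketch of the idea; the table names, column names, and the batch size of 5000 are made up here, so adjust them to your own schema:

-- Tracking table: one row per ID with the latest update date for that row
CREATE TABLE dbo.BigTable_Purge
(
    ID             int      NOT NULL PRIMARY KEY,
    LastUpdateDate datetime NOT NULL
);

-- Job step: delete the oldest rows in small chunks so the locks stay short
DECLARE @Batch TABLE (ID int PRIMARY KEY);

WHILE 1 = 1
BEGIN
    DELETE FROM @Batch;

    -- Grab the next chunk of IDs, oldest first (as in the example above)
    INSERT INTO @Batch (ID)
    SELECT TOP (5000) ID
    FROM dbo.BigTable_Purge
    ORDER BY LastUpdateDate;

    IF @@ROWCOUNT = 0 BREAK;

    -- Delete those rows from the big table
    DELETE bt
    FROM dbo.BigTable AS bt
    JOIN @Batch AS b ON b.ID = bt.ID;

    -- Then clear them out of the tracking table
    DELETE p
    FROM dbo.BigTable_Purge AS p
    JOIN @Batch AS b ON b.ID = p.ID;
END;

Running that from a job every few minutes kept each individual delete small enough that it never blocked the applications for long.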
paul
August 8, 2008 at 10:31 am
Thanks Paul,
This seems like a nice idea. I tried running the same SQL frequently so it deletes the rows in small chunks.
Thanks for the posts:)
Cheers
August 8, 2008 at 10:57 am
Jeff has the right idea up there. Those non-clustered indexes are probably hurting your performance unless they are also used to find the records to delete in the first place. Following are a couple of things I generally watch out for when looking at performance of a query that processes a bunch of data.
Parallelism:
If you look at your query plan for the delete, I'm sure it's parallelized since you're joining a temp table to process the deletes. I'd look at adding an OPTION (MAXDOP 1) hint or increasing the server setting for 'cost threshold for parallelism' from its default of 5 to something more realistic. I've seen it mentioned that 20 or so seems to be a more realistic value for some.
For something like this, it often works out faster to have a single worker thread grinding straight through the operation than to have the work split across threads and then marshalled back together.
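For reference, these are roughly what the two options look like. The table name, the WHERE clause, and the threshold value of 20 are just placeholders for the idea:

-- Per-query: force a serial plan on the chunked delete
DELETE TOP (5000)
FROM dbo.YourBigTable
WHERE LastUpdateDate < '20080701'
OPTION (MAXDOP 1);

-- Server-wide: raise the bar so only genuinely expensive queries go parallel
-- (the default cost threshold is 5)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 20;
RECONFIGURE;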
tempdb performance
Whenever you're using tempdb for things like this, remember that SQL Server also uses tempdb for its own internal ("in-process") query processing. For example, SQL Server may decide it's more efficient to sort your data in a particular order before seeking the values in another index for a JOIN predicate. So while your temporary table is getting hammered on the same physical disks, the SQL Server internals are also hammering away on tempdb to process your query. On a write (and yes, a delete is a write operation), the more indexes you have on a particular table, the more the internals will bang away on those internal objects in tempdb. So when you split your query up into "chunks" like you describe, you're probably letting SQL Server keep those objects in RAM rather than spilling them to disk, which is why the smaller bits run faster.
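If you want to see whether that's actually happening, you can run something like this against the DMVs while the delete is executing; it just sums the reserved page counts across the tempdb files:

SELECT SUM(user_object_reserved_page_count)     * 8 AS user_objects_kb,
       SUM(internal_object_reserved_page_count) * 8 AS internal_objects_kb,
       SUM(version_store_reserved_page_count)   * 8 AS version_store_kb
FROM tempdb.sys.dm_db_file_space_usage;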
Statistics
Always make sure you have updated stats on the tables involved. Even if you have auto create and auto update stats turned on, depending on how many rows are in the table, the statistics may not contain samples that give SQL Server good clues as to how costly the query will be, or whether an index would help find the data in the first place. That's because the trigger to auto-update a statistic is based on a formula (roughly 500 changes plus 20% of the total rowcount), and honestly, 20% of the rowcount of a large table is a huge number. Better cost estimates mean better query plans, better use of memory and disk, ta-da...
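If you want to rule stats out, a manual refresh is cheap insurance. The table name is a placeholder, and FULLSCAN can be swapped for a sample on a really big table:

-- Refresh all statistics on the one table
UPDATE STATISTICS dbo.YourBigTable WITH FULLSCAN;

-- Or refresh everything in the database using the default sampling
EXEC sp_updatestats;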
That's about as non-technical a description as I can give you. I hope you get the idea.