February 8, 2010 at 7:09 am
I have a huge table in my database. I need to delete records from this table. There are no indexes on this table. If I create an index on one of the columns, the deletion process will run faster, but the index creation will take a long time.
I need to know the best method to delete rows from this table.
Thanks in advance!
February 8, 2010 at 7:27 am
If you have Enterprise Edition, maybe partitioning the table will work for you; otherwise you probably want to go with a crawler delete (see the sketch below) to avoid blowing up the log and to reduce contention.
If you are deleting the majority of the table, it could be faster to insert the data you want to keep into another table, drop the original, then rename the new table.
We really need to know the table schema, a bit of example data and the criteria you wish to delete by.
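For illustration, a minimal sketch of a crawler delete, assuming a hypothetical dbo.BigTable and a date-based criterion (the batch size and predicate are placeholders to tune for your case):

WHILE 1 = 1
BEGIN
    DELETE TOP (10000)
    FROM dbo.BigTable
    WHERE SomeDate < '19800101' -- your delete criteria here

    IF @@ROWCOUNT = 0 BREAK -- nothing left to delete
END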
---------------------------------------------------------------------
February 8, 2010 at 10:45 am
Sanz (2/8/2010)
I have a huge table in my database. I need to delete records from this table. There are no indexes on this table. If I create an index on one of the columns, the deletion process will run faster, but the index creation will take a long time.
I need to know the best method to delete rows from this table.
Thanks in advance!
Approximately how many records? How huge is the table you are talking about? Delete chunks of data.
February 8, 2010 at 10:54 am
The table is 135 GB in size. There are 5 columns, all of datatype int or varchar(15).
I would keep George's suggestion (inserting into a new table) as an option, as I will be deleting nearly 50% of the data.
Any better ideas?
February 8, 2010 at 12:27 pm
Copying to another table would require another 135 GB on disk... do we have that much space?
You could try BCP with queryout to output the data that you eventually want to keep to a file (smaller in size, and faster as well).
Then you truncate the table.
Then you BCP in the file from step 1.
Just some thoughts...
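To make that concrete, a rough sketch of the three steps (server, database, table, file names, and the keep-criteria are placeholders, not from the original post; -n exports in native format so datatypes survive the round trip, -T uses a trusted connection):
1. bcp "SELECT * FROM MyDb.dbo.BigTable WHERE SomeDate >= '19800101'" queryout D:\keep.dat -n -S MyServer -T
2. In SSMS: TRUNCATE TABLE dbo.BigTable
3. bcp MyDb.dbo.BigTable in D:\keep.dat -n -S MyServer -T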
-------------------------------------------------
-Amit
Give a man a fish and he'll ask for a lemon. Teach a man to fish and he won't get paged on weekends!! :w00t: - desperately trying to fish
February 8, 2010 at 4:18 pm
1. Assuming your database is in FULL recovery, perform a complete backup.
2. Use SET ROWCOUNT with a reasonable size (start low; watch your log file size and the time to process the batch, and display the estimated execution plan and review it before starting your first batch).
3. At the end of each batch of deletes, back up your log (see the sketch below).
4. Be sure to save the full backup and all log backups; you do not want to break the chain in case you need to perform a restore.
And test, test, and test again before using this on your production DB.
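A minimal sketch of that loop, assuming a hypothetical dbo.BigTable, a date-based delete criterion, and a placeholder backup path:

SET ROWCOUNT 10000 -- start low and tune

WHILE 1 = 1
BEGIN
    DELETE FROM dbo.BigTable
    WHERE SomeDate < '19800101' -- your delete criteria here

    IF @@ROWCOUNT = 0 BREAK -- nothing left to delete

    BACKUP LOG MyDb TO DISK = 'D:\Backups\MyDb_log.trn'
END

SET ROWCOUNT 0 -- restore the default behavior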
February 8, 2010 at 6:26 pm
I would create the index on the columns in your delete statement's where clause.
It won't take that long if the disk is fairly fast, and your delete will be far, far faster.
If you have Enterprise Edition, you can create the index with ONLINE = ON.
If you do go that route, make sure to calculate the space the index will consume when you add it.
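For example (hypothetical table and column names; ONLINE = ON requires Enterprise Edition):

CREATE NONCLUSTERED INDEX IX_BigTable_SomeDate
ON dbo.BigTable (SomeDate)
WITH (ONLINE = ON)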
Craig Outcalt
February 8, 2010 at 9:28 pm
Amit Singh (2/8/2010)
Copying to another table would require another 135 GB on disk... do we have that much space? You could try BCP with queryout to output the data that you eventually want to keep to a file (smaller in size, and faster as well).
Then you truncate the table.
Then you BCP in the file from step 1.
Just some thoughts...
That would actually take more disk space, because the data would materialize as characters instead of in their native datatypes.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 8, 2010 at 9:30 pm
Sanz (2/8/2010)
I have a huge table in my database. I need to delete records from this table. There are no indexes on this table. If I create an index on one of the columns, the deletion process will run faster, but the index creation will take a long time.
I need to know the best method to delete rows from this table.
Thanks in advance!
Do you have a primary key on this table? I'm asking because a lot of folks don't know that a Primary Key also makes an index.
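For example (hypothetical names), adding a primary key builds a unique index behind the scenes:

ALTER TABLE dbo.BigTable
ADD CONSTRAINT PK_BigTable PRIMARY KEY CLUSTERED (Id)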
--Jeff Moden
Change is inevitable... Change for the better is not.
February 10, 2010 at 3:47 am
Jeff, the funniest thing is this table does not have any indexes on it... nor does it have a primary key... Having no indexes makes the queries run slowly...
This is an archive database...
February 10, 2010 at 9:05 am
1) Create the smallest index necessary to cover your WHERE clause. If you have lots of fields in the WHERE clause, pick a small column or two that provide the greatest chance of isolating the rows that need to be deleted.
2) Do the delete in batches small enough to guarantee an index seek to get the records to delete. This can be a fairly big number for very large tables (50K, 100K, bigger?). This will do several helpful things:
a) avoid a table lock that blocks all access to the table;
b) avoid blowing up the tlog if you delete a lot of records;
c) avoid blowing up tempdb and flushing out RAM pages with a huge scan.
It also allows for error handling/rollbacks after each batch (see the sketch below), and interim tlog backups (if you aren't in SIMPLE mode, or you want to do BACKUP LOG WITH TRUNCATE_ONLY in FULL recovery mode, etc.).
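A rough sketch of one way to wire that up (placeholder names and batch size; the per-batch transaction gives you the error handling/rollback mentioned in point 2):

DECLARE @rows int
SET @rows = 1

WHILE @rows > 0
BEGIN
    BEGIN TRY
        BEGIN TRAN
        DELETE TOP (50000)
        FROM dbo.BigTable
        WHERE SomeDate < '19800101' -- supported by the index from point 1
        SET @rows = @@ROWCOUNT
        COMMIT
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK
        RAISERROR('Batch delete failed; stopping.', 16, 1)
        BREAK
    END CATCH
END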
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
February 11, 2010 at 12:45 pm
Here is what we do with our huge tables.
SELECT *
INTO Table2
FROM Table1
WHERE YourStatement (something like Account_Start_Date < '01/01/1980')

DROP TABLE Table1
EXEC sp_rename 'Table2', 'Table1'

Runs much faster than DELETE because there is no logging involved.
Jacob Milter
February 11, 2010 at 1:56 pm
mar1jxm (2/11/2010)
Here is what we do with our huge tables:
SELECT *
INTO Table2
FROM Table1
WHERE YourStatement (something like Account_Start_Date < '01/01/1980')
DROP TABLE Table1
EXEC sp_rename 'Table2', 'Table1'
Runs much faster than DELETE because there is no logging involved.
1) there are numerous requirements for minimal logging, and some of them are usually not available on production databases
2) many systems don't have enough space to do that for their largest tables
3) that select will block ALL DML activity to the table while it is running, which could be a long time.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
February 11, 2010 at 8:41 pm
TheSQLGuru (2/11/2010)
3) that select will block ALL DML activity to the table while it is running, which could be a long time.
Kevin... a couple of questions on the above...
1. Which SELECT are you talking about when you say "that select" in the above? Jacob Milter's current SELECT or the SELECT that would be necessary to minimize logging?
2. Which table are you talking about when you say "the table" in the above? The table in the FROM clause or the table in the INTO clause or both?
I just want to be clear on this because there are a lot of left-over myths from the way SELECT/INTO used to operate before SQL Server 6.5 SP1.
To answer my own questions from above...
If you're talking about Jacob Milter's current SELECT and it's a script run from SSMS and not a stored procedure, then only the destination table will be blocked from DML... the source table will not be blocked. If you're talking about a SELECT that uses the required TABLOCK hint to get minimal logging, then the source table will also be blocked from DML no matter what.
Then again, those observations may change a bit when you do the same thing in a stored procedure with some conditional logic. I'm trying to work up an example of what happens then.
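For reference, the two statement shapes under discussion (placeholder names; whether either qualifies for minimal logging depends on the recovery model and other conditions):

-- plain SELECT/INTO: creates the new destination table and holds it exclusively
SELECT * INTO dbo.Table2 FROM dbo.Table1 WHERE KeepMe = 1

-- INSERT ... SELECT with the TABLOCK hint on the target table
INSERT INTO dbo.Table2 WITH (TABLOCK)
SELECT * FROM dbo.Table1 WHERE KeepMe = 1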
--Jeff Moden
Change is inevitable... Change for the better is not.