May 6, 2010 at 6:11 pm
SQL Server Books Online (http://msdn.microsoft.com/en-us/library/ms188774.aspx) says:
Important:
Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in the next release of SQL Server. Do not use SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. Also, for DELETE, INSERT, and UPDATE statements that currently use SET ROWCOUNT, we recommend that you rewrite them to use the TOP syntax. For more information, see DELETE (Transact-SQL), INSERT (Transact-SQL), or UPDATE (Transact-SQL).
So I guess in future versions this trick won't work...
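The rewrite the documentation recommends would look something like this (the table and column names here are made up for illustration):
-- Deprecated pattern: ROWCOUNT limits the DELETE
SET ROWCOUNT 10000
DELETE FROM dbo.SomeBigTable WHERE SomeDate < '20100101'
SET ROWCOUNT 0
-- Recommended replacement: put TOP on the DELETE itself
DELETE TOP (10000)
FROM dbo.SomeBigTable
WHERE SomeDate < '20100101'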
May 6, 2010 at 6:12 pm
Leo.Miller (5/6/2010)
How big is the table you are testing against? Also, are the table layouts the same?
I'm testing against the real table.
May 6, 2010 at 6:23 pm
These are the stats from the first run of the script (yesterday):
24  24  2010-05-07 09:50:04.993
23  23  2010-05-07 07:59:53.770
22  22  2010-05-07 06:51:33.177
21  21  2010-05-07 06:01:14.253
20  20  2010-05-07 05:15:18.837
19  19  2010-05-07 04:32:22.620
18  18  2010-05-07 03:54:23.123
17  17  2010-05-07 03:11:45.967
16  16  2010-05-07 02:25:01.717
15  15  2010-05-07 01:47:50.197
14  14  2010-05-07 01:10:34.580
13  13  2010-05-07 00:31:06.697
12  12  2010-05-06 23:51:33.583
11  11  2010-05-06 23:15:31.950
10  10  2010-05-06 22:40:30.957
 9   9  2010-05-06 22:05:49.300
 8   8  2010-05-06 21:30:16.277
 7   7  2010-05-06 20:57:32.897
 6   6  2010-05-06 19:30:56.320
 5   5  2010-05-06 18:49:25.710
 4   4  2010-05-06 18:11:45.787
 3   3  2010-05-06 17:29:32.040
 2   2  2010-05-06 16:44:40.407
 1   1  2010-05-06 16:08:33.557
May 6, 2010 at 6:36 pm
And here's the first million with the new script:
1312  2010-05-07 12:32:35.697
1301  2010-05-07 12:01:39.113
If I'm deleting 1 mil every half hour, I'll need >70 more hours to finish the delete process =( Damn!
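For reference, the new script is essentially a batched TOP delete; a minimal sketch of that kind of loop (the table and column names are placeholders, not the real schema):
SET NOCOUNT ON
WHILE 1 = 1
BEGIN
    DELETE TOP (1000000)
    FROM dbo.Big_Table
    WHERE [DATE] < '1 Jan 2010'
    IF @@ROWCOUNT = 0 BREAK -- nothing left to delete
    PRINT CONVERT(VARCHAR(23), GETDATE(), 121) -- progress timestamp, as in the stats above
END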
May 6, 2010 at 7:44 pm
Leo.Miller (5/6/2010)
You asked: "Why are you saying that to find the TOP (1000000) SQL must check all the records?"
The only way to find the TOP record is to look at them all, then sort them in the required order, then take the top one off the top. It helps if you have a clustered index, but SQL is still doing a lot of extra reads.
Try running the queries below with the actual execution plan displayed and after setting
SET STATISTICS IO ON
SELECT TOP (1) * FROM Big_Table WHERE [DATE] < '1 Jan 2010'
vs
SET ROWCOUNT 1
SELECT * FROM Big_Table where [DATE] < '1 Jan 2010'
I'm just not sure why you think that... TOP stops looking as soon as its sufficiency has been suffonsified. But don't take my word for it... take your own code's word for it...
IF OBJECT_ID('dbo.JBMTest','U') IS NOT NULL
   DROP TABLE dbo.JBMTest
GO
--===== Create and populate a 1,000,000 row test table.
-- Column "RowNum" has a range of 1 to 100,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
SELECT TOP 1000000
SomeID = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SomeHex12 = RIGHT(NEWID(),12)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN
--===== A table is not properly formed unless a Primary Key has been assigned
-- Takes about 1 second to execute.
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (SomeID)
--===== Demo that TOP does NOT have to read through all the rows
-- any more than ROWCOUNT does
set statistics io,time on
SELECT TOP 10000 * FROM dbo.JBMTest where SomeDate < '1 Jan 2010'
SET ROWCOUNT 10000
SELECT * FROM dbo.JBMTest where SomeDate < '1 Jan 2010'
SET ROWCOUNT 0
set statistics io,time off
--Jeff Moden
Change is inevitable... Change for the better is not.
May 6, 2010 at 7:53 pm
brutsoft (5/6/2010)
And here's the first million with the new script:
1312  2010-05-07 12:32:35.697
1301  2010-05-07 12:01:39.113
If I'm deleting 1 mil every half hour, I'll need >70 more hours to finish the delete process =( Damn!
How many indexes do you have on this bad boy? Also, do you have any Foreign Keys or triggers associated with this table?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 6, 2010 at 7:53 pm
OK, I think I solved my problem. As happens frequently, the solution could not have been suggested on the forum, because it's not possible to present complete information here.
I do have a "master" copy of my database sitting on the same server. So what I did was truncate the table completely and run a loop inserting records million by million from the "master" copy. It's going at a rate of one million rows every two minutes, which means the copy will be complete in about an hour. After that I will have to shrink my database, which will take several more hours, so by tomorrow morning I should be all set.
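A minimal sketch of that copy loop, with entirely made-up database, table, and column names (the real keep-condition and key handling depend on the schema):
TRUNCATE TABLE dbo.Big_Table
DECLARE @LastID INT
SET @LastID = 0
WHILE 1 = 1
BEGIN
    -- if ID is an IDENTITY column, SET IDENTITY_INSERT dbo.Big_Table ON is needed first
    INSERT INTO dbo.Big_Table (ID, SomeDate, SomeData)
    SELECT TOP (1000000) ID, SomeDate, SomeData
    FROM MasterCopyDB.dbo.Big_Table
    WHERE ID > @LastID
    ORDER BY ID
    IF @@ROWCOUNT = 0 BREAK
    SELECT @LastID = MAX(ID) FROM dbo.Big_Table -- highest key copied so far
END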
May 6, 2010 at 8:01 pm
Jeff Moden (5/6/2010)
How many indexes do you have on this bad boy? Also, do you have any Foreign Keys or triggers associated with this table?
Out of the 180 GB occupied by the table, 60 GB is taken up by indexes. As you can see, the indexes are MASSIVE. There are 5 indexes in total, including the clustered index on the PK identity column.
There are no foreign keys INTO this table, but there are several foreign keys from this table to other tables.
May 6, 2010 at 8:45 pm
brutsoft (5/6/2010)
Jeff Moden (5/6/2010)
How many indexes do you have on this bad boy? Also, do you have any Foreign Keys or triggers associated with this table?
Out of the 180 GB occupied by the table, 60 GB is taken up by indexes. As you can see, the indexes are MASSIVE. There are 5 indexes in total, including the clustered index on the PK identity column.
There are no foreign keys INTO this table, but there are several foreign keys from this table to other tables.
That's a very big part of why deletes are taking so long. They affect every index and are affected by every FK.
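If you want to see exactly what each DELETE has to maintain, the catalog views will list it. A quick sketch (the table name is hypothetical):
-- Every index here must be updated by every DELETE
SELECT i.name, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('dbo.Big_Table')
  AND i.type > 0 -- skip the heap entry
-- FKs pointing INTO the table must be checked on DELETE;
-- FKs pointing out of it matter for INSERT/UPDATE instead
SELECT fk.name,
       OBJECT_NAME(fk.parent_object_id)     AS referencing_table,
       OBJECT_NAME(fk.referenced_object_id) AS referenced_table
FROM sys.foreign_keys AS fk
WHERE fk.parent_object_id     = OBJECT_ID('dbo.Big_Table')
   OR fk.referenced_object_id = OBJECT_ID('dbo.Big_Table')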
--Jeff Moden
Change is inevitable... Change for the better is not.
May 9, 2010 at 7:16 pm
I would like to confirm that I got my problem resolved. Copying the data from the master database took around 2 hours; shrinking the database took another 8. Then I was up and running. Thank you all.
May 9, 2010 at 9:42 pm
brutsoft (5/9/2010)
I would like to confirm that I got my problem resolved. Copying the data from the master database took around 2 hours; shrinking the database took another 8. Then I was up and running. Thank you all.
You are aware that shrinking the database causes massive fragmentation of all of your indexes? You should schedule a rebuild of all indexes as soon as possible - and make sure you have enough space available in the data file for the rebuilding to be done.
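A minimal sketch of checking the damage and rebuilding (the table name is hypothetical; ALTER INDEX ALL rebuilds every index on the table):
-- How fragmented did the shrink leave things?
SELECT i.name AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Big_Table'), NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON  i.object_id = ips.object_id
  AND i.index_id  = ips.index_id
-- Rebuild everything on the table in one shot
ALTER INDEX ALL ON dbo.Big_Table REBUILD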
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
May 9, 2010 at 9:53 pm
Very cool and very thoughtful... thanks for letting us know how it went.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 7, 2012 at 5:33 pm
Please refer to this: http://tsqltips.blogspot.com/
June 8, 2012 at 1:05 pm
adhiman (6/7/2012)
http://tsqltips.blogspot.com/
First of all, I don't see what your article has to do with deletes. You don't actually need any keys or indexes to do deletes.
Second, your blog article suggests using the SSN as a possible "unique" column. That's a really, really bad idea for many reasons, but you also need to be aware that people can and have changed their SSNs thanks to identity theft and the like. Of course, there's also the warning on every SSN card that says it "shall not be used for identification purposes".
Now that you don't have SSNs to rely on for a unique key, what would you use? If it's ANYTHING having to do with the person's name, then you fail, because people can easily change their names and names are, of course, not unique. Please don't say you'd add a sequence number to that.
For customer tables, you might just as well get used to the idea that the only proper way to identify a customer is by a customer number and that there's absolutely no reason why that customer number shouldn't be an IDENTITY column in T-SQL.
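A minimal sketch of what I mean (the column names are made up): the IDENTITY surrogate is the key, and anything like an SSN, if you store it at all, is just another attribute.
CREATE TABLE dbo.Customer
    (
    CustomerID INT IDENTITY(1,1) NOT NULL,
    FirstName  VARCHAR(50) NOT NULL,
    LastName   VARCHAR(50) NOT NULL,
    SSN        CHAR(11) NULL, -- attribute only; never the key
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerID)
    )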
--Jeff Moden
Change is inevitable... Change for the better is not.
June 8, 2012 at 2:27 pm
Hi Jeff:
I appreciate your response to my blog.
My first article talks about performing deletes in an efficient manner. Suppose you have a table that has millions of rows in it, and you want to delete from it using some condition. If you use this syntax:
begin tran;
delete from table where condition
commit;
What it will do is lock the table until the delete completes. Before doing the deletes, the SQL engine has to read the data pages for the rows it is going to delete, so the time spent reading those rows becomes part of the table locking, and that can block other simultaneous transactions that want to do any other DML on the table (inserts or updates, say).
To avoid this blocking, I am suggesting reading the row IDs into a temp table (or table variable) first. Then you can use the index to join those IDs to the table from which the data needs to be deleted. The benefit is that the query optimizer already knows where the data lives (because of the index), and the delete may not block other simultaneous inserts happening at the same time. This way you can prevent blocking to some extent. I have used this methodology on a table of 50 million records and it works like a champ. The main aim is to suggest a better way of doing the same thing.
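A minimal sketch of that pattern, using a temp table for the staged IDs (all names are made up):
-- Stage the keys first; this is only a read
SELECT ID
INTO #ToDelete
FROM dbo.Big_Table
WHERE SomeDate < '20100101'
CREATE CLUSTERED INDEX IX_ToDelete ON #ToDelete (ID)
-- Then delete by joining on the clustered key
DELETE t
FROM dbo.Big_Table AS t
JOIN #ToDelete AS d
  ON d.ID = t.ID
DROP TABLE #ToDelete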
Secondly, my other blog post talks about PKs and UKs. The example of adding an SSN was an idea (not a perfect suggestion). The main aim of that post is to clear up the basic misconceptions about PKs and UKs that people have and that I have read on the internet. You are absolutely correct that, in the example I gave, the best way to track uniqueness is by adding an ID column. I will probably come up with a better example. The idea is to make readers aware that they should think about the uniqueness of the rows, not just add an ID. It does help the query optimizer come up with an optimal plan for the query execution.
I really appreciate your thoughts.
Please let me know if you have questions.