May 6, 2010 at 6:11 pm
SQL Server Books Online (http://msdn.microsoft.com/en-us/library/ms188774.aspx) says:
Important:
Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in the next release of SQL Server. Do not use SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. Also, for DELETE, INSERT, and UPDATE statements that currently use SET ROWCOUNT, we recommend that you rewrite them to use the TOP syntax. For more information, see DELETE (Transact-SQL), INSERT (Transact-SQL), or UPDATE (Transact-SQL).
So I guess in future versions this trick won't work...
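The rewrite the documentation recommends would look something like this (the table and column names here are made up for illustration):
-- Deprecated pattern: ROWCOUNT limits the DELETE
SET ROWCOUNT 10000
DELETE FROM dbo.SomeBigTable WHERE SomeDate < '20100101'
SET ROWCOUNT 0
-- Recommended replacement: put TOP on the DELETE itself
DELETE TOP (10000)
FROM dbo.SomeBigTable
WHERE SomeDate < '20100101'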
May 6, 2010 at 6:12 pm
Leo.Miller (5/6/2010)
How big is the table you are testing against? Also, are the table layouts the same?
I'm testing against the real table.
May 6, 2010 at 6:23 pm
These are the stats from the first run of the script (yesterday):
24  24  2010-05-07 09:50:04.993
23  23  2010-05-07 07:59:53.770
22  22  2010-05-07 06:51:33.177
21  21  2010-05-07 06:01:14.253
20  20  2010-05-07 05:15:18.837
19  19  2010-05-07 04:32:22.620
18  18  2010-05-07 03:54:23.123
17  17  2010-05-07 03:11:45.967
16  16  2010-05-07 02:25:01.717
15  15  2010-05-07 01:47:50.197
14  14  2010-05-07 01:10:34.580
13  13  2010-05-07 00:31:06.697
12  12  2010-05-06 23:51:33.583
11  11  2010-05-06 23:15:31.950
10  10  2010-05-06 22:40:30.957
 9   9  2010-05-06 22:05:49.300
 8   8  2010-05-06 21:30:16.277
 7   7  2010-05-06 20:57:32.897
 6   6  2010-05-06 19:30:56.320
 5   5  2010-05-06 18:49:25.710
 4   4  2010-05-06 18:11:45.787
 3   3  2010-05-06 17:29:32.040
 2   2  2010-05-06 16:44:40.407
 1   1  2010-05-06 16:08:33.557
May 6, 2010 at 6:36 pm
And here's the first million with the new script:
1312  2010-05-07 12:32:35.697
1301  2010-05-07 12:01:39.113
If I'm deleting 1 mil every half hour, I'll need >70 more hours to finish the delete process =( Damn!
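For reference, the new script is essentially a batched TOP delete; a minimal sketch of that kind of loop (the table and column names are placeholders, not the real schema):
SET NOCOUNT ON
WHILE 1 = 1
BEGIN
    DELETE TOP (1000000)
    FROM dbo.Big_Table
    WHERE [DATE] < '1 Jan 2010'
    IF @@ROWCOUNT = 0 BREAK -- nothing left to delete
    PRINT CONVERT(VARCHAR(23), GETDATE(), 121) -- progress timestamp, as in the stats above
END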
May 6, 2010 at 7:44 pm
Leo.Miller (5/6/2010)
You asked: "Why are you saying that to find the TOP (1000000) SQL must check all the records?"
The only way to find the TOP record is to look at them all, then sort them in the required order, then take the top one off the top. It helps if you have a clustered index, but SQL is still doing a lot of extra reads.
Try running the queries below with the actual execution plan displayed and after setting
SET STATISTICS IO ON
SELECT TOP (1) * FROM Big_Table WHERE [DATE] < '1 Jan 2010'
vs
SET ROWCOUNT 1
SELECT * FROM Big_Table where [DATE] < '1 Jan 2010'
I'm just not sure why you think that... TOP stops looking as soon as its sufficiency has been suffonsified. But don't take my word for it... take your own code's word for it...
IF OBJECT_ID('dbo.JBMTest','U') IS NOT NULL
   DROP TABLE dbo.JBMTest
GO
--===== Create and populate a 1,000,000 row test table.
-- Column "RowNum" has a range of 1 to 100,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
SELECT TOP 1000000
SomeID = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SomeHex12 = RIGHT(NEWID(),12)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN
--===== A table is not properly formed unless a Primary Key has been assigned
-- Takes about 1 second to execute.
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (SomeID)
--===== Demo that TOP does NOT have to read through all the rows
-- any more than ROWCOUNT does
set statistics io,time on
SELECT TOP 10000 * FROM dbo.JBMTest where SomeDate < '1 Jan 2010'
SET ROWCOUNT 10000
SELECT * FROM dbo.JBMTest where SomeDate < '1 Jan 2010'
SET ROWCOUNT 0
set statistics io,time off
--Jeff Moden
Change is inevitable... Change for the better is not.
May 6, 2010 at 7:53 pm
brutsoft (5/6/2010)
And here's the first million with the new script:
1312  2010-05-07 12:32:35.697
1301  2010-05-07 12:01:39.113
If I'm deleting 1 mil every half hour, I'll need >70 more hours to finish the delete process =( Damn!
How many indexes do you have on this bad boy? Also, do you have any Foreign Keys or triggers associated with this table?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 6, 2010 at 7:53 pm
OK, I think I solved my problem. As happens frequently, the solution could not have been suggested on the forum, because it's not possible to present complete information here.
I do have a "master" copy of my database sitting on the same server. So what I did was truncate the table completely and run a loop inserting records million by million from the "master" copy. It's going at a rate of one million rows every two minutes, which means the copy will be complete in about an hour. After that I will have to shrink my database, which will take several more hours, so by tomorrow morning I should be all set.
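A minimal sketch of that copy loop, with entirely made-up database, table, and column names (the real keep-condition and key handling depend on the schema):
TRUNCATE TABLE dbo.Big_Table
DECLARE @LastID INT
SET @LastID = 0
WHILE 1 = 1
BEGIN
    -- if ID is an IDENTITY column, SET IDENTITY_INSERT dbo.Big_Table ON is needed first
    INSERT INTO dbo.Big_Table (ID, SomeDate, SomeData)
    SELECT TOP (1000000) ID, SomeDate, SomeData
    FROM MasterCopyDB.dbo.Big_Table
    WHERE ID > @LastID
    ORDER BY ID
    IF @@ROWCOUNT = 0 BREAK
    SELECT @LastID = MAX(ID) FROM dbo.Big_Table -- highest key copied so far
END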
May 6, 2010 at 8:01 pm
Jeff Moden (5/6/2010)
How many indexes do you have on this bad boy? Also, do you have any Foreign Keys or triggers associated with this table?
Out of the 180 GB occupied by the table, 60 GB is taken up by indexes. As you can see, the indexes are MASSIVE. There are 5 indexes in total, including the clustered index on the PK identity column.
There are no foreign keys INTO this table, but there are several foreign keys from this table to other tables.
May 6, 2010 at 8:45 pm
brutsoft (5/6/2010)
Jeff Moden (5/6/2010)
How many indexes do you have on this bad boy? Also, do you have any Foreign Keys or triggers associated with this table?
Out of the 180 GB occupied by the table, 60 GB is taken up by indexes. As you can see, the indexes are MASSIVE. There are 5 indexes in total, including the clustered index on the PK identity column.
There are no foreign keys INTO this table, but there are several foreign keys from this table to other tables.
That's a very big part of why deletes are taking so long. They affect every index and are affected by every FK.
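If you want to see exactly what each DELETE has to maintain, the catalog views will list it. A quick sketch (the table name is hypothetical):
-- Every index here must be updated by every DELETE
SELECT i.name, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('dbo.Big_Table')
  AND i.type > 0 -- skip the heap entry
-- FKs pointing INTO the table must be checked on DELETE;
-- FKs pointing out of it matter for INSERT/UPDATE instead
SELECT fk.name,
       OBJECT_NAME(fk.parent_object_id)     AS referencing_table,
       OBJECT_NAME(fk.referenced_object_id) AS referenced_table
FROM sys.foreign_keys AS fk
WHERE fk.parent_object_id     = OBJECT_ID('dbo.Big_Table')
   OR fk.referenced_object_id = OBJECT_ID('dbo.Big_Table')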
--Jeff Moden
Change is inevitable... Change for the better is not.
May 9, 2010 at 7:16 pm
I would like to confirm that I got my problem resolved. Copying the data from the master database took around 2 hours; shrinking the database took another 8. Then I was up and running. Thank you all.
May 9, 2010 at 9:42 pm
brutsoft (5/9/2010)
I would like to confirm that I got my problem resolved. Copying the data from the master database took around 2 hours; shrinking the database took another 8. Then I was up and running. Thank you all.
You are aware that shrinking the database causes massive fragmentation of all of your indexes? You should schedule a rebuild of all indexes as soon as possible - and make sure you have enough space available in the data file for the rebuilding to be done.
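A minimal sketch of checking the damage and rebuilding (the table name is hypothetical; ALTER INDEX ALL rebuilds every index on the table):
-- How fragmented did the shrink leave things?
SELECT i.name AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Big_Table'), NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON  i.object_id = ips.object_id
  AND i.index_id  = ips.index_id
-- Rebuild everything on the table in one shot
ALTER INDEX ALL ON dbo.Big_Table REBUILD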
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
May 9, 2010 at 9:53 pm
Very cool and very thoughtful... thanks for letting us know how it went.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 7, 2012 at 5:33 pm
Please refer to this: http://tsqltips.blogspot.com/
June 8, 2012 at 1:05 pm
adhiman (6/7/2012)
http://tsqltips.blogspot.com/
First of all, I don't see what your article has to do with deletes. You don't actually need any keys or indexes to do deletes.
Second, your blog article suggests using the SSN as a possible "unique" column. That's a really, really bad idea for many reasons, but you also need to be aware that people can and have changed their SSNs thanks to identity theft and the like. Of course, there's also the warning on every SSN card that says it "shall not be used for identification purposes".
Now that you don't have SSNs to rely on for a unique key, what would you use? If it's ANYTHING having to do with the person's name, then you fail, because people can easily change their names and names are, of course, not unique. Please don't say you'd add a sequence number to that.
For customer tables, you might just as well get used to the idea that the only proper way to identify a customer is by a customer number and that there's absolutely no reason why that customer number shouldn't be an IDENTITY column in T-SQL.
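A minimal sketch of what I mean (the column names are made up): the IDENTITY surrogate is the key, and anything like an SSN, if you store it at all, is just another attribute.
CREATE TABLE dbo.Customer
    (
    CustomerID INT IDENTITY(1,1) NOT NULL,
    FirstName  VARCHAR(50) NOT NULL,
    LastName   VARCHAR(50) NOT NULL,
    SSN        CHAR(11) NULL, -- attribute only; never the key
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerID)
    )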
--Jeff Moden
Change is inevitable... Change for the better is not.
June 8, 2012 at 2:27 pm
Hi Jeff:
I appreciate your response to my blog.
My first article talks about performing deletes in an efficient manner. Suppose you have a table that has millions of rows in it, and you want to delete from it using some condition. If you use this syntax:
begin tran;
delete from table where condition
commit;
What it will do is lock the table until the delete completes. Before doing the deletes, the SQL engine has to read the data pages for the rows it is going to delete, so the time spent reading those rows becomes part of the table locking, and that can block other simultaneous transactions that want to do any other DML on the table (inserts or updates, say).
To avoid this blocking, I am suggesting reading the row IDs into a temp table (or table variable) first. Then you can use the index to join those IDs to the table from which the data needs to be deleted. The benefit is that the query optimizer already knows where the data lives (because of the index), and the delete may not block other simultaneous inserts happening at the same time. This way you can prevent blocking to some extent. I have used this methodology on a table of 50 million records and it works like a champ. The main aim is to suggest a better way of doing the same thing.
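A minimal sketch of that pattern, using a temp table for the staged IDs (all names are made up):
-- Stage the keys first; this is only a read
SELECT ID
INTO #ToDelete
FROM dbo.Big_Table
WHERE SomeDate < '20100101'
CREATE CLUSTERED INDEX IX_ToDelete ON #ToDelete (ID)
-- Then delete by joining on the clustered key
DELETE t
FROM dbo.Big_Table AS t
JOIN #ToDelete AS d
  ON d.ID = t.ID
DROP TABLE #ToDelete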
Secondly, my other blog post talks about PKs and UKs. The example of adding an SSN was an idea (not a perfect suggestion). The main aim of that post is to clear up the basic misconceptions about PKs and UKs that people have and that I have read on the internet. You are absolutely correct that, in the example I gave, the best way to track uniqueness is by adding an ID column. I will probably come up with a better example. The idea is to make readers aware that they should think about the uniqueness of the rows, not just add an ID. It does help the query optimizer come up with an optimal plan for the query execution.
I really appreciate your thoughts.
Please let me know if you have questions.