June 12, 2008 at 7:20 am
I have an archive solution running on one of my larger databases...
Because the solution is written to allow multiple archiving processes to run in parallel, and because the source and staging locations can change, I'm using dynamic SQL.
The extracting and archiving of data runs pretty well. I extract the set to be archived into a new table in a staging database and recreate the primary key. A couple of other steps happen, but I've run into some odd behaviour when it comes to the deletes. In the delete statement, I delete TOP @Batch while joining on the primary key to the extract table in the staging database. The ArchiveSet is an exact copy of what got extracted. The main database is extremely busy, so I have to delete in batches to minimize locking. At the moment I've set the batch size to 2000.
DELETE TOP (@Batch) s
FROM SourceTable AS s
INNER JOIN ArchiveSet AS a
    ON s.[PRIMARY KEY] = a.[PRIMARY KEY]  -- join on the primary key column(s)
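For context, the extract step that feeds this delete is roughly the following. It's only a sketch: the staging database name, the ArchiveDate column and the @Cutoff criterion are placeholders, since the real statements are built with dynamic SQL.

SELECT *
INTO Staging.dbo.ArchiveSet            -- new extract table in the staging database (placeholder names)
FROM SourceTable
WHERE ArchiveDate < @Cutoff            -- hypothetical criterion; @Cutoff is set by the archive job

ALTER TABLE Staging.dbo.ArchiveSet
    ADD CONSTRAINT PK_ArchiveSet PRIMARY KEY ([PRIMARY KEY])  -- recreate the primary key on the extract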
The delete runs well as long as the batch size is smaller than the number of records to be deleted. The moment the batch size is bigger, the delete can take up to double the time...
Re-indexing and statistics updates run regularly on all the affected databases.
Anyone have any thoughts on this?
Thanks!
T.
June 13, 2008 at 4:34 am
June 13, 2008 at 5:46 am
Well, the table I'm deleting from currently contains around 150 million records... Every archive cycle extracts anything from 1 to 35,000 records at a time (2,000 per batch). The archive cycle hits the table every 15 to 17 minutes, depending on the load on the server.
If I take into account the number of indexes on the table (which I don't have rights to change) and the fact that the table grows by around 1,500 records every minute, the delete times are respectable...
Because this is critical data, the deletes all happen within a transaction. Dropping 35,000 records takes around 45 seconds. Changing the batch size doesn't really make a difference to the total time (the WHILE loop that contains the delete statement just does fewer iterations with a bigger batch).
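Roughly, that loop is shaped like this. Sketch only: the names are placeholders, and the transaction scope is shown per batch, which may not match the real job exactly.

DECLARE @Batch int, @Rows int
SET @Batch = 2000
SET @Rows = 1                              -- prime the loop

WHILE @Rows > 0
BEGIN
    BEGIN TRANSACTION

    DELETE TOP (@Batch) s
    FROM SourceTable AS s
    INNER JOIN Staging.dbo.ArchiveSet AS a
        ON s.[PRIMARY KEY] = a.[PRIMARY KEY]

    SET @Rows = @@ROWCOUNT                 -- drops to 0 once the last partial batch is gone

    COMMIT TRANSACTION
END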
But I've had occasions where deleting 900 records took over 2 minutes...??
June 13, 2008 at 7:14 am
Don't know if this will help, but there was a discussion not too long ago about TOP (@n) making some really poor decisions on execution plans. Something akin to parameter sniffing. The example found was a SELECT, but it might still apply here.
Something along the lines of:
DECLARE @size int
SET @size = 40000

SELECT TOP (@size) fieldlist FROM mytable   -- performance sucks
whereas
SELECT TOP 40000 fieldlist FROM mytable     -- performance flies
Jeff Moden was the one who ran into it. It turned out the optimizer was misestimating the cardinality and using LOOP joins instead of a MERGE join. The solution was either to "force" the MERGE join (with a join hint) or to tag the statement itself with a RECOMPILE hint.
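In other words, something like this. Just a sketch: mytable, othertable and the join columns are made-up names to show where the hints go.

-- Option 1: recompile the statement so the optimizer sees the actual value of @size
SELECT TOP (@size) fieldlist FROM mytable OPTION (RECOMPILE)

-- Option 2: where a join is involved, force the join type with a hint
SELECT TOP (@size) m.fieldlist
FROM mytable AS m
INNER MERGE JOIN othertable AS o
    ON m.id = o.id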
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
June 13, 2008 at 8:55 am
Yes, when I substitute the @Batch variable with a literal value, the delete flies. But hardcoding values also has its dark side: maintaining a code base that archives over 500 tables in the database, across 40 different environments, is a nightmare...
I did, however, add the RECOMPILE hint; it will run over the weekend, and I'll compare stats again on Monday.
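For reference, the statement-level version of that hint just goes on the end of the delete, along these lines (placeholder names again):

DELETE TOP (@Batch) s
FROM SourceTable AS s
INNER JOIN Staging.dbo.ArchiveSet AS a
    ON s.[PRIMARY KEY] = a.[PRIMARY KEY]
OPTION (RECOMPILE)   -- fresh plan per execution, using the actual @Batch value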
Thanks for all the input so far!
T.