February 10, 2010 at 1:47 am
Hello all,
I have a fairly large table into which some kind of log is inserted, with the following (simplified) structure:
id (PK), datetimecreated (Indexed), xml
It contains about 1 million rows per day, and data is constantly inserted. Nightly I'm going to extract data from the xml column and delete (!) old data.
I suppose this is a very common "ETL" task, also for data mining and the like (extracting data from an OLTP system and inserting it into an OLAP system).
To avoid locks on the table while extracting, I use the WITH (NOLOCK) hint. (As these are statistics, the solution is acceptable.) Data can be inserted and the system stays responsive.
The problem arises when I try to delete the old data (for example, everything older than 3 days): the operation takes up to 20 minutes and the system is unresponsive for some time (about a minute). As mentioned earlier, data is inserted constantly (though there are no updates at all).
What would you suggest in order to solve this problem?
I had the following ideas:
- split the data over 2 tables
- delete smaller amounts of data at a time (I don't think it will help much)
Thanks for any comments and suggestions!
February 10, 2010 at 3:47 am
Since the volume of data is enormous, try partitioning the table by date; old records can then be purged by switching or truncating whole partitions. The other option is to delete the records in batches, with a short delay between batches to release resources for other processes.
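To illustrate the batching option, here is a minimal T-SQL sketch. It assumes a hypothetical table name (EventLog) and the indexed datetimecreated column described above; batch size and delay are illustrative values you would tune for your workload:

```sql
-- Delete rows older than 3 days in small batches so each
-- DELETE holds locks only briefly and the log stays small.
DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (5000)
    FROM EventLog                           -- hypothetical table name
    WHERE datetimecreated < DATEADD(DAY, -3, GETDATE());

    SET @rows = @@ROWCOUNT;                 -- 0 when nothing is left to delete

    -- Pause briefly between batches so concurrent inserts are not starved
    WAITFOR DELAY '00:00:01';
END
```

Because the DELETE filters on the indexed datetimecreated column and touches only a few thousand rows per iteration, blocking is limited to short intervals instead of one long unresponsive stretch. With date-based partitioning, ALTER TABLE ... SWITCH PARTITION would avoid row-by-row deletes entirely, but it requires setting up a partition function and scheme first.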