January 11, 2010 at 8:04 am
I have a table that tracks all requests to a web service. Around 30,000 - 100,000 rows get created throughout the day (covering both requests and responses).
I also have a nightly batch job that first gathers statistics on the types of requests that came through (using GROUP BY) and writes them to a separate table, and then, as a second step, deletes ALL rows that are at least 7 days old.
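For context, here's a stripped-down sketch of what the job roughly does (table and column names are placeholders, not the real schema; whether the stats step looks at just the last day or the whole table isn't shown here, so the previous day is assumed):

DECLARE @today datetime, @cutoff datetime;
SET @today  = DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0); -- midnight today
SET @cutoff = DATEADD(DAY, -7, @today);

-- Part 1: aggregate the previous day's requests by type into a stats table.
INSERT INTO dbo.RequestStats (StatDate, RequestType, RequestCount)
SELECT DATEADD(DAY, DATEDIFF(DAY, 0, CreatedDate), 0), RequestType, COUNT(*)
FROM dbo.ServiceMessageLog
WHERE CreatedDate >= DATEADD(DAY, -1, @today)
  AND CreatedDate <  @today
GROUP BY DATEADD(DAY, DATEDIFF(DAY, 0, CreatedDate), 0), RequestType;

-- Part 2: purge everything at least 7 days old.
DELETE FROM dbo.ServiceMessageLog
WHERE CreatedDate < @cutoff;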
Here's my question: if I implement row locking on the delete step, that shouldn't cause any issues for the regular inserts into this table, right?
The only time this table is "queried" is during the day; the job runs at 4 AM, so the only activity at that point should be new rows coming in for requests and responses.
TIA
January 11, 2010 at 8:10 am
Why do you want row locking there?
Before going that step, have you checked indexes?
Is this Enterprise Edition? If so, have you considered table partitioning? With a partitioned table, you can drop an entire partition of rows as a metadata operation (fast).
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 11, 2010 at 8:16 am
GilaMonster (1/11/2010)
Why do you want row locking there? Before going that step, have you checked indexes?
Is this Enterprise Edition? If so, have you considered table partitioning? With a partitioned table, you can drop an entire partition of rows as a metadata operation (fast).
Never heard of the partition thing...I'll look into that.
Yes, this is 2005 enterprise edition, sp2 I believe.
Right now, the customer is receiving 5 9's of uptime and they are complaining that's not enough and they are seeing multiple timeouts while this batch job runs (verified by moving the batch job around and the timeouts follow the batch job).
As for indexes, there is only 1 index and it's clustered on the primary key (fewer is better for many inserts, right?) Or do we want to add more indexes on the fields that the archiving and deleting are using?
January 11, 2010 at 8:25 am
gregory.anderson (1/11/2010)
Never heard of the partition thing... I'll look into that. Yes, this is 2005 enterprise edition, sp2 I believe.
Right now, the customer is receiving 5 9's of uptime and they are complaining that's not enough and they are seeing multiple timeouts while this batch job runs (verified by moving the batch job around and the timeouts follow the batch job).
Then look at partitioning this table. You have a classic sliding window scenario here, easy to implement, major improvements in speed of discarding old records
As for indexes, there is only 1 index and it's clustered on the primary key (fewer is better for many inserts, right?) Or do we want to add more indexes on the fields that the archiving and deleting are using?
Well, if you have no indexes on the columns that the archiving and deletes are using, then SQL has to read every single row in the table to identify which ones need archiving/deleting. That means every single row will be locked at one point or another. I'm not in the slightest bit surprised you're having timeouts if that's the case.
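For example, a minimal supporting index on the date column the archive/delete filters on might look like this (table and column names assumed, not taken from the actual schema):

CREATE NONCLUSTERED INDEX IX_ServiceMessageLog_CreatedDate
    ON dbo.ServiceMessageLog (CreatedDate);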
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 11, 2010 at 8:45 am
GilaMonster (1/11/2010)
gregory.anderson (1/11/2010)
Never heard of the partition thing... I'll look into that. Yes, this is 2005 enterprise edition, sp2 I believe.
Right now, the customer is receiving 5 9's of uptime and they are complaining that's not enough and they are seeing multiple timeouts while this batch job runs (verified by moving the batch job around and the timeouts follow the batch job).
Then look at partitioning this table. You have a classic sliding window scenario here, easy to implement, major improvements in speed of discarding old records
As for indexes, there is only 1 index and it's clustered on the primary key (fewer is better for many inserts, right?) Or do we want to add more indexes on the fields that the archiving and deleting are using?
Well, if you have no indexes on the columns that the archiving and deletes are using, then SQL has to read every single row in the table to identify which ones need archiving/deleting. That means every single row will be locked at one point or another. I'm not in the slightest bit surprised you're having timeouts if that's the case.
Gotcha... the archiving and deleting parts only use a CreatedDate field, so my first try should be to add an index on that field. That could be non-clustered, right, since the rows should already be in date order anyway based on the primary key inserts?
Or, should I try and implement the partition delete anyway?
Thanks a lot for your help, Gail!
January 11, 2010 at 9:06 am
What uses the primary key to filter/select by? If nothing, I'd suggest a 3-phase approach here.
1) Make the primary key nonclustered
2) Add a clustered index on the CreateDate column
3) Partition the table on the CreateDate. Size of partitions depends on what you're doing, but from what you've said, I'd guess 7 days. Then, each week, you can add a new partition and drop the oldest one (or switch it to the archive table, if you're archiving everything). As I said, classic sliding window.
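Very roughly, a sketch of just the partitioning objects from step 3 (names, boundary dates, and the filegroup are placeholders; the PK change and index-alignment details are left out):

-- Weekly partition function; RANGE RIGHT puts each boundary date into the partition to its right.
CREATE PARTITION FUNCTION pfServiceLogWeekly (datetime)
AS RANGE RIGHT FOR VALUES ('20100104', '20100111', '20100118');

-- Map every partition to one filegroup for simplicity.
CREATE PARTITION SCHEME psServiceLogWeekly
AS PARTITION pfServiceLogWeekly ALL TO ([PRIMARY]);

-- Cluster the table on the date column, placed on the scheme (assumes the old
-- clustered PK has already been recreated as nonclustered, per step 1 above).
CREATE CLUSTERED INDEX CIX_ServiceMessageLog_CreateDate
    ON dbo.ServiceMessageLog (CreateDate)
    ON psServiceLogWeekly (CreateDate);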
As an additional step, consider 1, maybe 2 indexes to support the selects that you said are done during the day, if they're not on the PK or CreatedDate column.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 11, 2010 at 9:16 am
It is likely that the select queries are timing out because they are blocked by the deletes. You should consider setting the database to read committed snapshot to prevent the deletes from blocking queries.
Setting the database to read committed snapshot is easy to implement and should not require any code changes, so I would try that first.
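For reference, enabling it is a single statement (the database name is a placeholder); it needs exclusive access to the database for a moment and uses tempdb for the row versions:

ALTER DATABASE YourDatabase
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;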
Also, you should consider deleting the rows in smaller transactions to prevent holding locks for long times.
1. Insert the PKs of the rows to be deleted into a temp table.
2. Select a small number of rows from the temp table (1000 to 5000) into a second temp table.
3. Delete the rows from the prod table and first temp table that are in the second temp table.
4. Delete the rows from the second temp table.
5. Repeat steps 2 through 4 until done.
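A minimal sketch of those steps, with assumed table/column names and an assumed int key:

DECLARE @cutoff datetime;
SET @cutoff = DATEADD(DAY, -7, GETDATE());

-- 1. Capture the keys of the rows to purge.
SELECT ServiceMessageLogID
INTO #ToDelete
FROM dbo.ServiceMessageLog
WHERE CreatedDate < @cutoff;

CREATE TABLE #Batch (ServiceMessageLogID int PRIMARY KEY);

WHILE EXISTS (SELECT 1 FROM #ToDelete)
BEGIN
    -- 2. Take a small slice of keys.
    INSERT INTO #Batch (ServiceMessageLogID)
    SELECT TOP (5000) ServiceMessageLogID FROM #ToDelete;

    -- 3. Delete that slice from the prod table and from the driver table.
    DELETE t
    FROM dbo.ServiceMessageLog AS t
    JOIN #Batch AS b ON b.ServiceMessageLogID = t.ServiceMessageLogID;

    DELETE d
    FROM #ToDelete AS d
    JOIN #Batch AS b ON b.ServiceMessageLogID = d.ServiceMessageLogID;

    -- 4. Empty the slice for the next pass.
    TRUNCATE TABLE #Batch;
END

DROP TABLE #Batch;
DROP TABLE #ToDelete;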
January 13, 2010 at 11:55 am
GilaMonster (1/11/2010)
What uses the primary key to filter/select by? If nothing, I'd suggest a 3-phase approach here. 1) Make the primary key nonclustered
2) Add a clustered index on the CreateDate column
3) Partition the table on the CreateDate. Size of partitions depends on what you're doing, but from what you've said, I'd guess 7 days. Then, each week, you can add a new partition and drop the oldest one (or switch it to the archive table, if you're archiving everything). As I said, classic sliding window.
As an additional step, consider 1, maybe 2 indexes to support the selects that you said are done during the day, if they're not on the PK or CreatedDate column.
I found out a little more by looking at the batch job; does your suggestion change?
The archive step (which just copies the aggregate data to a separate table) does its select by the CreatedDate field (the field you suggested should get the clustered index).
The delete step goes by the primary key of the table. Would your suggestion of turning the primary key index into a non-clustered index still work in this scenario?
Thanks a lot, Gail, this has helped a lot!
January 13, 2010 at 12:47 pm
Hard (near impossible) to say.
I suggest you set up a test database somewhere and try out both options. See how fast the archive and delete are with the cluster on the date and a noncluster on the pk, then switch the indexes around (cluster on the pk, noncluster on the date) and test again. See which one produces the more acceptable results.
I would still, as a long-term solution, recommend you look at partitioning on the date. Until then, also try out Michael's advice on deleting in batches. It's often way better than deleting in a single chunk.
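One quick way to compare the two layouts in the test copy is to capture I/O and timing stats around the archive/delete statements, e.g. (a sketch only; the DELETE is just a stand-in and the table name is assumed):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

DELETE FROM dbo.ServiceMessageLog
WHERE CreatedDate < DATEADD(DAY, -7, GETDATE());

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;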
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 14, 2010 at 7:25 am
I don't believe partitioning is necessary here. Just do the deletes in batches small enough that you are guaranteed index seeks/bookmark lookups (if the clustered index isn't on the date). That will get you row/page locking that should be very quick with small batches, which should easily avoid the timeout problems. Oh, and probably put a WAITFOR DELAY '00:00:05' or something between each batch to open up some headroom for other stuff to hit the table. Obviously you must have proper transactioning around the deletes, both to keep it all from running under one implicit transaction and to allow for useful error handling.
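Something along these lines, purely as a sketch (batch size, delay, and object names are placeholders):

DECLARE @cutoff datetime, @rows int;
SET @cutoff = DATEADD(DAY, -7, GETDATE());
SET @rows = 1;

WHILE @rows > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        -- Small batch; with an index on CreatedDate this should be a quick range delete.
        DELETE TOP (5000)
        FROM dbo.ServiceMessageLog
        WHERE CreatedDate < @cutoff;

        SET @rows = @@ROWCOUNT;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        RAISERROR('Purge batch failed', 16, 1); -- a real job would log ERROR_MESSAGE() etc.
        BREAK;
    END CATCH;

    -- Breathing room so other activity can get at the table.
    WAITFOR DELAY '00:00:05';
END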
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
January 15, 2010 at 6:06 pm
To gregory.anderson-- this is a dumb question, but no cursors are involved anywhere in this process, right? You said
...first, gathers statistics based on the type of requests that come through (using group by) and writes them to a separate table...
So, is this processing ONLY the data for today/the last full day, or all the data currently in the table? Does the blocking happen during the "stats gathering" step, the "deleting" step, or both? How long does each step take? If you don't know, you could write a row to a log when each step begins and ends.
Gila, a question for you about
3) Partition the table on the CreateDate. Size of partitions depends on what you're doing, but from what you've said, I'd guess 7 days
The OP said they run the process daily, and delete all rows > 7 days old as the final step, so he's never got more than 8 days of data. I've never used partitioning, but wouldn't that mean they ought to size the partitions at 1 day? Or would 7 partitions start to cause a performance problem? When you partition a table, do you have to manually join the partitions to check against all data, or is that hidden "behind the curtains"? From his description, it looks like "stats gathering" is the only step that would be crossing all the partitions, and that only if that step hits all rows.
Wouldn't making the CreateDate the clustered index, and batching deletes go a long way toward eliminating their timeouts? (so they may not even need to go with partitions) When I first read the OP I assumed the stats he gathers are only for the most recent day; if that's true and CreateDate is the CIX, then the "stat gathering", "delete" and "continuing inserts" processes would likely not overlap any data pages, right? (except possibly a few at the border of "stat gathering" and "continuing inserts") And the only one that's going to hit a lot of pages should be the "stats gathering".
And to Michael Valentine Jones, would this be poor form to batch the deletes?
DeleteMore:
DELETE TOP (1000) FROM FOO WHERE CreateDate < @whatever -- hopefully CreateDate is CIX
IF @@ROWCOUNT = 1000 BEGIN -- or 5000 or 10000 or ???
    -- add time delay of a few seconds if deemed necessary
    GOTO DeleteMore
END
January 15, 2010 at 6:53 pm
I think you are missing some things from your example, such as transactioning and error handling. :hehe:
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
January 15, 2010 at 7:33 pm
SqlGuru... who, me? I was just illustrating the approach (I didn't declare/init @whatever, either). And yeah, use error trapping. I've used similar code in the past and it didn't appear to treat the repeating DELETEs as an implicit transaction-- then again in THAT case it was in query analyzer; maybe it'd happen in an SP? ...Important safety tip. Thanks, Egon.
January 16, 2010 at 1:01 am
mstjean (1/15/2010)
The OP said they run the process daily, and delete all rows > 7 days old as the final step, so he's never got more than 8 days of data. I've never used partitioning, but wouldn't that mean they ought to size the partitions at 1 day?
It's an option. So's 2 partitions each with 7 days.
When you partition a table, do you have to manually join the partitions to check against all data, or is that hidden "behind the curtains"?
I gather you didn't do any reading on partitioning.
Wouldn't making the CreateDate the clustered index, and batching deletes go a long way toward eliminating their timeouts? (so they may not even need to go with partitions)
Maybe. The reason for suggesting partitioning is that, if the table is partitioned correctly, deleting a lot of rows can be as simple as dropping a partition. That's a metadata operation, much like dropping a table, and is just as fast. Even with batching, deleting a lot of rows takes time.
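For reference, the weekly "drop the oldest partition" step usually looks something like this (partition number, boundary dates, and all names are placeholders; the staging table must be empty, match the main table's structure, and sit on the same filegroup):

-- Switch the oldest partition out to an empty staging table (metadata only, near-instant).
ALTER TABLE dbo.ServiceMessageLog
    SWITCH PARTITION 1 TO dbo.ServiceMessageLog_Stage;

-- Discard (or archive) the switched-out rows.
TRUNCATE TABLE dbo.ServiceMessageLog_Stage;

-- Remove the now-empty boundary and add a new one for the coming week.
ALTER PARTITION FUNCTION pfServiceLogWeekly() MERGE RANGE ('20100104');
ALTER PARTITION SCHEME psServiceLogWeekly NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pfServiceLogWeekly() SPLIT RANGE ('20100125');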
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 19, 2010 at 6:38 am
Had no idea my question would spark so much conversation....
I'll try to answer all of the questions that I've read and not responded to yet.
There are transactions in the stored proc (only 1 though), so it sounds like I should break it up into at least 2 for now, one for stats gathering, and one for the delete.
If you mean Try/Catch blocks by the error handling suggestion, no there isn't any.
I have already changed the indexes around in my test environment as Gail suggested, but after going through the index stats I'm noticing a lot more index scans than seeks. Here are the stats after one full day's worth of use:
CreatedDate (ClusteredIndex) 6 Seeks, 576 Scans, 5 Lookups, 915 updates, 17703 records
ServiceMessageLogID (PrimaryKey, NonClustered) 0 Seeks, 2 Scans, 0 Lookups, 40 Updates, 1029 Records.
All of that is after 1 day of certification by users (mostly testing other items) and 1 batch job run that night.
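(Those numbers presumably come from sys.dm_db_index_usage_stats; for anyone following along, roughly this query, with the table name assumed:)

SELECT  i.name,
        s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM    sys.dm_db_index_usage_stats AS s
JOIN    sys.indexes AS i
        ON i.[object_id] = s.[object_id]
       AND i.index_id   = s.index_id
WHERE   s.database_id = DB_ID()
  AND   s.[object_id] = OBJECT_ID('dbo.ServiceMessageLog');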