Best Practices needed on Massive Deletion in 24/7 production environment

  • I have inherited a project that uses a number of worst practices. I am a fairly knowledgeable part-time DBA, but I am treading in areas where I have not gone before, so I am looking for some help/pointers.

    I need to shrink the size of certain tables and databases and rebuild indexes in order to reduce the problems we believe are related to locking, table scans, etc.

    Current Server Config:

    • Windows 2k3
    • SS2k 8.00.2039, Build 3790, Service Pack 1 (and it's not going to get updated)
    • private network

    Environment:

    1. Factory assembly line, 200-250 connections from PCs in a 24/7 operation. About 60 PCs connected at full capacity. There are very few human connections. If one PC waits on a lock, it stops one section of the assembly line.
    2. Bad client design in VB and .NET. All machines connected all the time.
    3. Original design uses GUIDs everywhere. Total misuse of GUIDs - used all over where an INT should have been used, including every table PK. Some intersection tables have 3 GUIDs and a datetime column.
    4. The database has gone without a DBA for 1-2 years, and the system slows to a halt a few times a week. Architecture and design limitations are now becoming the bottleneck.
    5. No actual scheduled downtime... seriously... just insane.
    6. DBs have swelled to 40GB when they should be under 5GB with judicious archiving... could be smaller still if properly designed in the first place. I cannot change the design.

    Overview:

    1. I am deleting about 40-60% of the data in 8 bloated tables. I am creating a set of new DBs on a different server that will store this older data. These will be created on a new SQL2K5 clustered server.
    2. When deleting records I cannot lock the table. I use WITH (PAGLOCK) to prevent lock escalation.
    3. Client machines get locked out or disconnected when I run a mass deletion, so all my delete queries are limited to 1000 sequential records at a time in a table that had about 50M records to start. I reduced the table from 50M to 40M rows over the last few days. Each delete takes about 2-4 seconds.
    4. I use a WAITFOR DELAY of 10 seconds between batches so that I get in/get out quickly and play nicely with other connections (a sketch of the loop follows this list).
    5. My goal is to remove 20% of the unneeded data, reindex, and shrink the databases. Then observe performance. If this helps, I will remove another 20% of the unneeded data and reindex and shrink again. You get the idea.
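
    For the curious, the delete loop is roughly the following. This is a sketch only: the table, column, and cutoff date are placeholders, and SET ROWCOUNT is the SQL 2000 idiom for capping a DELETE.

    -- Minimal sketch of the batched delete described above (SQL 2000 style).
    -- dbo.BigTable, CreatedDate, and the cutoff are hypothetical names.
    SET ROWCOUNT 1000                    -- cap each DELETE at 1000 rows

    WHILE 1 = 1
    BEGIN
        DELETE FROM dbo.BigTable WITH (PAGLOCK)
        WHERE CreatedDate < '20070101'   -- archive cutoff

        IF @@ROWCOUNT = 0 BREAK          -- nothing left to delete

        WAITFOR DELAY '00:00:10'         -- play nicely with other connections
    END

    SET ROWCOUNT 0                       -- always reset when done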

    Questions: (If the answers are elsewhere, just point me in the right direction.)

    1. Are there some best-practice documents on how to do massive deletes?
    2. I have read best practices for when not to use a GUID, such as for a PK. Does having a GUID PK create wacky indexing due to the random nature of GUIDs?
    3. I want to know why a table lock happens for such a small percentage of deletes. Is that due to the GUID PK? What is the best way to see why this happens?
    4. I need to rebuild or defragment each table's indexes. What is the best way to do this without affecting active users or getting table locks? I think that dropping and rebuilding the index is the way to go, but what about a clustered PK that has only one GUID column?
    5. Can I build new indexes without locking tables?
    6. I need to shrink multiple databases. I know how to do it, but not without being in single-user mode or getting table locks. I have done all the experimenting I need in my development environment, but production is unpredictable, and like I said, there is no scheduled maintenance window.

      Any pearls of wisdom on this subject?

    Regards,

    --FrankX

  • I can answer a couple of those:

    I have read best practices for when not to use a GUID, such as for a PK. Does having a GUID PK create wacky indexing due to the random nature of GUIDs?

    There's no problem having a PK on a GUID, provided the clustered index isn't on it as well. A clustered index on a GUID = massive fragmentation, very quickly.

    Can I build new indexes without locking tables?

    In SQL 2005, yes. In SQL 2000, no. In 2000, rebuilding the nonclustered indexes takes a shared table lock; rebuilding a clustered index takes an exclusive one.

    You may be able to get by with an index defrag if the fragmentation isn't too bad. Defrag only reorders the leaf levels of the index; reindex rebuilds the entire thing. Index defrag is an online operation.
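
    For example, on SQL 2000 the two options look like this (database, table, and index names here are placeholders):

    -- Online: reorders the leaf pages in place, can be stopped at any time,
    -- and doesn't hold long table locks.
    DBCC INDEXDEFRAG (MyDB, 'dbo.BigTable', 'IX_BigTable_SomeColumn')

    -- Offline on SQL 2000: full rebuild, compacts all levels, but locks the
    -- table (shared for nonclustered, exclusive for the clustered index).
    DBCC DBREINDEX ('dbo.BigTable', 'IX_BigTable_SomeColumn', 90)  -- 90 = fillfactor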


    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Just throwing this out there... I don't have any kind of relevant experience with such a massive amount of data.


    Is there any way you could buy an entirely new server, transfer all the data to that machine, and do all the index rebuilds/maintenance you need there, all the while doing defrags on the current system? Then once the new system is ready (with better indexing and less data), switch to this new, more powerful server?

  • One other thing that I forgot earlier.

    If you shrink a data file, SQL puts pages anywhere that there's space. A database shrink can completely undo a reindex.

    Your order should be more like: delete 20% of the data, shrink the database to about 10% free, then reindex the tables. If you do the reindex and the shrink in the other order, you might as well not have done the reindex at all.
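
    In SQL 2000 terms, a sketch of that order (file, table, and size values are placeholders; pick the target from sp_spaceused after the deletes):

    -- 1. Batched deletes (see the loop earlier in the thread).

    -- 2. Shrink the data file, leaving roughly 10% free space.
    DBCC SHRINKFILE (MyDB_Data, 36000)   -- target size in MB, placeholder

    -- 3. Only now rebuild the indexes, so the shrink can't scramble them.
    DBCC DBREINDEX ('dbo.BigTable')      -- rebuilds all indexes on the table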

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster,

    Thanks for the posts. I know I have a lot of questions. I just wrote a response and the system did what I thought was a POST, and then it did nothing... ugh. Not sure if it's this WYSIWYG editor or what, but that is annoying... maybe it does not post ASAP... I dunno. It happened again on my second post attempt. Maybe session related... maybe IE7... double ugh.

    Anyway, what I was thinking was to add an alternate key using an identity int, and then, when I can schedule some downtime, drop the clustered PK GUID index and change the identity int column to be the new PK. I will have to do this for 6-7 tables, one at a time. I am in no rush; doing a little each day is the best I can hope for.
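
    A sketch of that first step (table and column names hypothetical; the PK swap itself waits for the downtime window):

    -- Adding the identity column numbers every existing row, so on a big
    -- table this is a size-of-data operation - do it in a quiet period.
    ALTER TABLE dbo.BigTable ADD RowID INT IDENTITY(1,1) NOT NULL

    -- Optional: enforce uniqueness ahead of the eventual PK/cluster swap.
    ALTER TABLE dbo.BigTable ADD CONSTRAINT UQ_BigTable_RowID UNIQUE (RowID)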

    Some more issues/questions, as related to locks and the limited maintenance window:

    1. Does DBCC SHOWCONTIG (table, index) lock tables, and if so, is there some workaround?
    2. In general, can I add a nonclustered index without affecting concurrency (blocking other users)?
    3. I will follow your advice on delete, shrink DB, reindex, but can I do this without locking tables for long periods of time? I can probably live with locking a single table for 3-5 seconds per minute.
    4. What is the best way to truncate the log while doing mass deletes? While I am deleting, more rows get added. I am backing up the log after each delete now, but that does not seem optimal.

    --FrankX

  • Ninja's_RGR'us,

    We do have a new server that we will migrate to. There is a planned migration, but the clients are PCs (about 50) attached to some assembly-type machinery. It's not so easy to point them to a new server. In addition, the new server is SQL2005 and the current one is SQL2000, so I have lots of hesitation about migrating without lots of planning and testing. Please keep in mind that this environment is 24/7 manufacturing with managers unwilling to give maintenance windows. It's a real pain.

    That said, I am trying to fix blocking issues on the current server. When the databases were small, there were few problems. Now that they have grown, we believe the size of the DBs, the use of GUIDs (used as the PK on all tables - clustered index... thanks GilaMonster), and the increase in the number of attached users have caused many problems. As of right now, we believe that slow deletion of rows is the crutch we'll limp along on till we can schedule a migration to the "new" server with management approval.

    Thanks,

    --FrankX

  • Frank,

    I think you have done the deletes the best way you can, and it's what I've done in similar situations.

    A few years ago at PASS we had a debate about int vs. GUID, and the MS architecture guys showed that aside from the larger size, GUIDs are very efficient inside the server. There are some optimizations, so as long as they're not the clustered index (as mentioned above), you should be OK.

    You have to schedule downtime. It's just not possible to run without it. And the longer you wait, the more likely it is that the system will come to a halt and you'll be down for days trying to get back up. You cannot emphasize this enough to managers; have them contact consultants or any DB professional and they'll get confirmation.

    That being said, I used to work in a nuclear plant and our downtime was Saturday midnight to 4am Sunday, once a quarter. Sucked, but that was what we worked with. I'd ask for something similar; as much as that weekend stinks for you, you'll get some benefits back from a better-running system. They certainly can't be more critical than the nuke plant!

    Delete your data, then once it's gone, do the shrink. And move the index as soon as possible. At least make the GUID nonclustered. You can pick some other field, or post the schema and we'll help find a good candidate for a clustered index.

    I believe SHOWCONTIG does grab a shared lock because it has to read through all the extents. Be careful running it, or run it on a particular table and see if issues arise. Maybe run it in between your deletes. However, until you remove the data, it doesn't make much sense to run it, since you can't do anything about the fragmentation yet.

    Log backups maintain recoverability. Alternatively, if you can tolerate the risk, maybe run a BACKUP LOG WITH TRUNCATE_ONLY in between a few deletes and then run a full backup every couple of hours (or a diff).
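
    A rough sketch of that cadence, assuming a database named MyDB (paths are placeholders; note that TRUNCATE_ONLY breaks the log backup chain until the next full or differential):

    -- Between delete batches: discard the inactive log without backing it up.
    BACKUP LOG MyDB WITH TRUNCATE_ONLY

    -- Every couple of hours: re-establish a recovery point.
    BACKUP DATABASE MyDB TO DISK = 'E:\Backups\MyDB_full.bak' WITH INIT
    -- or a differential:
    BACKUP DATABASE MyDB TO DISK = 'E:\Backups\MyDB_diff.bak' WITH DIFFERENTIAL, INIT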

    Oh, and apologies about the posting issue. The forum sometimes flakes a little. I Ctrl-A, Ctrl-C before hitting post.

  • No need to move the PK; just get the clustered index onto another column. By preference (mine): static, ever-increasing, narrow (especially if you have lots of NC indexes), unique.

    You can do a SHOWCONTIG with the FAST option. It only returns some of the info that a full SHOWCONTIG does, but it is usually faster.
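
    For example (table name is a placeholder):

    -- WITH FAST skips the page-density statistics but still reports
    -- logical fragmentation, and finishes much sooner on big tables.
    DBCC SHOWCONTIG ('dbo.BigTable') WITH FAST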

    I just wrote a response and the system did what I thought was a POST, and then it did nothing... ugh.

    It's the session timeout. Happens to me often, especially if I've been writing sample code. I usually copy the message into Notepad before hitting post.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for all of your helpful hints. They have pointed me in the right direction and kept me from wasting time where it's not needed. That has made my job more bearable. Now I just have to convince management that scheduled downtime is like sleep for humans: you can decrease sleep, but you will reduce productivity to almost zero if you try hard enough.

    Also, please forgive any rambling or misspelling... I am just dog tired...

    1. Here is my worst offending table from ~30 days ago... it's worse now. I am not an expert, so don't wail on me for not knowing if low is bad or low is good, blah blah. For fragmentation, my understanding is lower = better. I don't have time to post the DDL for the table, but I will later if requested. I am just looking for feedback on whether these numbers are just crazy bad or not so bad. The bold ones seem pretty bad.

    DBCC SHOWCONTIG scanning table...

    Table: 'XXXs' (1172511556); index ID: 1, database ID: 7

    TABLE level scan performed.

    - Pages Scanned................................: 1031098

    - Extents Scanned..............................: 129383

    - Extent Switches..............................: 1031097

    - Avg. Pages per Extent........................: 8.0

    - Scan Density [Best Count:Actual Count].......: 12.50% [128888:1031098]

    - Logical Scan Fragmentation ..................: 51.36%

    - Extent Scan Fragmentation ...................: 73.64%

    - Avg. Bytes Free per Page.....................: 2770.8

    - Avg. Page Density (full).....................: 65.77%

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.
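
    (If I'm reading it right, Scan Density is Best Count over Actual Count: 128888 / 1031098 ≈ 12.5%, i.e. the scan made roughly eight times as many extent switches as a perfectly contiguous index would need.)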

    2. In single-user mode in my test environment, which is a snapshot of production, I have a plan: the next step is to drop all indexes/constraints, add an identity INT as the new clustered PK, and add the old indexes back on. Rerun DBCC SHOWCONTIG and see what's improved. I will then delete about 50% of the data in the table, shrink the DB files, then re-run DBCC SHOWCONTIG and drop/recreate all indexes again. I might not have all the steps 100% correct in my words, but I think I am on the right track, and I will re-read the top of this thread again to double-check my plan.

    3. What is the best set of steps for dropping the PK_XXX constraint and its underlying index on a 30M-row table?

    I am using ALTER TABLE [dbo].[XXX] DROP CONSTRAINT [PK_XXX], which I believe is the correct way. Does the fact that the PK is a clustered GUID add a lot of overhead?

    --FrankX


  • Logical fragmentation of 50% is bad, but it's not too bad, especially if it's a GUID cluster. I've had one up at 99.98%. The extent fragmentation is worse. I don't think it's panic stations yet, but it will be causing problems (slow queries, high I/O).

    The page density should be fairly high. It's a measure of how full each page is. I like numbers around 80% for that; it means I'm not wasting too much space in my DB.

    The ALTER TABLE is the proper way to drop a PK. Since it's also the clustered index, make sure that you drop all the nonclustered indexes first. Rebuilding (or dropping) a clustered index makes SQL rebuild every single nonclustered index on the table.

    It's not necessary to change your PK to an int. That would have some nasty effects on related rows/foreign keys. Is there a better candidate in the table for the cluster? If you're not sure, post the DDL here and we can take a look and offer suggestions.

    If there's nothing else, add the identity, put the cluster on it, and leave the GUID as the primary key, but nonclustered.
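
    Putting that together, a downtime script for one table might look roughly like this (all names hypothetical; note the nonclustered indexes come off first, as mentioned above):

    -- 1. Drop the nonclustered indexes so they aren't rebuilt twice.
    DROP INDEX dbo.XXX.IX_XXX_SomeColumn

    -- 2. Drop the clustered GUID primary key. (If foreign keys reference
    --    PK_XXX, drop and re-add them around this step.)
    ALTER TABLE dbo.XXX DROP CONSTRAINT PK_XXX

    -- 3. Cluster on the narrow, ever-increasing identity column.
    CREATE UNIQUE CLUSTERED INDEX CIX_XXX_RowID ON dbo.XXX (RowID)

    -- 4. Keep the GUID as the primary key, now nonclustered, so existing
    --    foreign keys and client code are untouched.
    ALTER TABLE dbo.XXX ADD CONSTRAINT PK_XXX PRIMARY KEY NONCLUSTERED (GuidColumn)

    -- 5. Recreate the remaining nonclustered indexes.
    CREATE INDEX IX_XXX_SomeColumn ON dbo.XXX (SomeColumn)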

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
