Indexing question

  • We are running a job that deletes millions of rows from several tables in a large SQL 2000 database. However, it runs very slowly and we are trying to find ways to speed it up. It makes sense to me to perform the deletes without updating the indexes and then regenerate the indexes when the job is done.

    What is the best way to do this?  Personally, I'd like to avoid recreating the indexes if possible.

  • How many rows would be left in the table once you delete all those rows?  The reason I ask is that it may be faster to copy out the data you don't want to delete, truncate the tables, and then insert the data back into those tables. Just a thought.
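
    Roughly something like this (the table name and the keep condition are just made up for illustration):

        -- Copy the rows you want to keep into a holding table
        SELECT *
        INTO   OrdersKeep
        FROM   Orders
        WHERE  OrderDate >= '20030101'    -- whatever identifies the rows to keep

        -- Empty the table, then put the kept rows back
        TRUNCATE TABLE Orders

        INSERT INTO Orders
        SELECT * FROM OrdersKeep          -- use SET IDENTITY_INSERT if there is an identity column

        DROP TABLE OrdersKeep

    Bear in mind that TRUNCATE TABLE won't run against a table that is referenced by a foreign key, so any such constraints would have to be dropped and recreated around it.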

  • Good idea, but we use a moderately complex SELECT to find the rows to delete, and the tables will still have millions of rows after the deletion is performed.

    Next?

  • A few thoughts:

    If you are deleting more rows than you are keeping, then the temp table option above and/or dropping and rebuilding indexes is probably the best bet. Rebuilding the clustered index will of course defragment the table, which can be useful after a large delete.
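
    For example, something along these lines (index and table names invented):

        -- Drop the non-clustered indexes before the big delete
        DROP INDEX Orders.IX_Orders_CustomerID
        DROP INDEX Orders.IX_Orders_OrderDate

        -- ... run the delete ...

        -- Recreate them afterwards
        CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID)
        CREATE INDEX IX_Orders_OrderDate ON Orders (OrderDate)

        -- Or simply rebuild everything, including the clustered index,
        -- which also defragments the table
        DBCC DBREINDEX ('Orders', '', 90)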

    A quick win can be gained by dropping any referential integrity constraints on the affected tables, as checking that the constraints are not being broken by the delete can be a big overhead. Check the query plan in Query Analyzer. Be careful with this option, though, and make sure you don't break the referential integrity rules.
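
    In T-SQL that is just something like (constraint and table names invented):

        -- Disable the foreign key check while the delete runs
        ALTER TABLE OrderDetails NOCHECK CONSTRAINT FK_OrderDetails_Orders

        -- ... run the delete ...

        -- Re-enable it, re-validating the existing data at the same time
        ALTER TABLE OrderDetails WITH CHECK CHECK CONSTRAINT FK_OrderDetails_Orders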

    Another option I have found useful is to use the 'complicated' select to create a temporary table of primary keys for the rows to delete, then add a clustered index to the temporary table before doing the delete by joining to it. This allows the rows to be deleted in the same order as they are stored in the table, which is more efficient when you are deleting rows that sit next to each other.
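
    As a rough sketch (table, key and criteria are only placeholders):

        -- Use the 'complicated' select to capture the keys of the rows to delete
        SELECT OrderID
        INTO   #ToDelete
        FROM   Orders
        WHERE  OrderDate < '19990101'    -- the complex criteria go here

        -- Cluster the work table so the delete walks the base table in key order
        CREATE CLUSTERED INDEX IX_ToDelete ON #ToDelete (OrderID)

        DELETE Orders
        FROM   Orders
        INNER JOIN #ToDelete d ON d.OrderID = Orders.OrderID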

    Also, why is the 'long delete' a problem? If it is because the system is unavailable for a long time, try doing the deletes in smaller chunks. There are several ways to do this. One is to use the work table to select chunks of a few hundred (or thousand) rows at a time. That way each delete is limited to a few seconds and the users only experience short bursts of poor performance rather than being locked out for hours at a time. If you add a short delay of a few seconds between each chunk, the effect is even better. It may take several days (or weeks) to delete everything, but at least the system stays usable.
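
    One way to do the chunking in SQL 2000, reusing a work table like the #ToDelete above (batch size and delay are only examples):

        SET ROWCOUNT 1000                    -- delete at most 1000 rows per pass

        WHILE 1 = 1
        BEGIN
            DELETE Orders
            FROM   Orders
            INNER JOIN #ToDelete d ON d.OrderID = Orders.OrderID

            IF @@ROWCOUNT = 0 BREAK          -- nothing left to delete

            WAITFOR DELAY '00:00:05'         -- give the users a breather
        END

        SET ROWCOUNT 0                       -- back to normal

    Each pass is its own short transaction, which also keeps individual transactions small and gives the log a chance to be backed up between chunks.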

    Finally, the 'old fashioned' way. Add a tinyint column to every table involved, update the column to 1 for every record to be deleted, modify your application to hide rows marked for deletion, and tell the users you have done the delete! Then remove the offending rows at your leisure over the next month or two. It won't help with disk space in the short term, of course :-)
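
    Something like this, with made-up names again:

        -- One-off schema change
        ALTER TABLE Orders ADD ToDelete tinyint NOT NULL DEFAULT 0

        -- Mark the rows instead of deleting them
        UPDATE Orders
        SET    ToDelete = 1
        WHERE  OrderDate < '19990101'        -- the delete criteria go here

        -- The application then filters them out, e.g.
        -- SELECT ... FROM Orders WHERE ToDelete = 0

        -- ...and later you remove them at your leisure
        DELETE FROM Orders WHERE ToDelete = 1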

    As you will have noticed, the ideas get a bit thin towards the end, although believe it or not, I have used all of them with some success at one time or another.

    Hope some of it helps.

  • Very helpful. We'd like to run this delete once or twice a year over a weekend.

    I'm thinking the best choice may be to script the indexes for all the tables, remove them, and run the delete program.

    It may be a good idea to keep the indexes used by the delete select statement.

    A final question:

    Once the deletes are complete, what is the best way to shrink the database? One option is the shrink utility included in EM. A second choice is the DBCC SHRINKDATABASE or SHRINKFILE commands.

  • EM just executes the DBCC SHRINKDATABASE or SHRINKFILE procedures, so it is just a matter of preference. Personally, I always use the scripts, i.e. DBCC, because then I can run them in a scheduled job. That way you don't have to worry about a broken connection causing the job to fail, and you can log off and go home 🙂 I expect the shrink will take a long time.
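
    For reference, the commands are just along the lines of (database and file names invented, target sizes are examples):

        -- Shrink the whole database, leaving 10 per cent free space
        DBCC SHRINKDATABASE (MyDatabase, 10)

        -- Or shrink the individual files to a target size in MB
        DBCC SHRINKFILE (MyDatabase_Data, 2000)
        DBCC SHRINKFILE (MyDatabase_Log, 500)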

    As for the indexes, the only index worth keeping is the clustered index, and then only if you are using it as part of the delete criteria. Deleting anything over about 20% of the table will usually cause a table scan, so the non-clustered indexes won't be used anyway.

    I hope you have plenty of space for the log file, as deleting all of this in one go will require lots of log space.
