Reducing, re-indexing, and backing up large tables

  • What is the best procedure/sequence to reduce tables containing a large number of rows on a SQL Server 2000 server?

    The idea is first to check which tables grow extremely fast (such as statistics, user, or log tables), then reduce each table according to the number of months of data the user wishes to keep in it.

    As a second step, back up the remaining rows of the table as txt files on the hard disk (using DTS), then run UPDATE STATISTICS and re-index the reduced table.

    Run the DTS package once a month (delete the oldest month and back up the newest month) and repeat the steps above to keep the tables at an adequate size.

    What is a fast way to reduce the number of rows of a large table? The following example produces a 'timeout expired' error on my ADO connection when executed:

    -- builds and runs: DELETE FROM <table> WHERE <column> < DATEADD(m, -<months>, GETDATE())
    -- (@KeepMonthsInDatabase must be a character type for the string concatenation)
    SET @STR = 'DELETE FROM ' + @ProcessTable + ' WHERE ' + @SelectedColumn
             + ' < DATEADD(m, -' + @KeepMonthsInDatabase + ', GETDATE())'

    EXEC (@STR)

    Adding ConnectionTimeout = 0 did not help, unfortunately.

    What is the best way to re-index the table just maintained?

    Thanks

    mipo

  • It might be faster if you drop the indexes first, then perform the delete, and then recreate the indexes. You will need to test whether removing the index on the @SelectedColumn date column helps or hurts. Could you possibly keep each month in a different table and use a view to pull all the tables together? That way you wouldn't have to do the delete at all. I have a limited understanding of partitioned views in SQL 2000, but since the tables would have the same structure and live in the same database, the view should perform reasonably quickly. Then you could back up the database monthly, drop the oldest month's table from the view, and then drop it from the database. Much quicker than deleting and reindexing.



    Michelle

  • I try to do all this type of stuff daily; it keeps the volume down. Setting the timeout higher is valid for this type of operation, but you also have to look at the impact of the change on the server. If you've got a delete query running for 30 minutes you'll probably block other processes, build up a huge number of entries in the log, and still run the chance that it times out or fails and has to do a long rollback as well.
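
    One way to keep such a delete from blocking everything and bloating the log is to work in small batches. A minimal sketch on SQL 2000, assuming a table dbo.StatsLog with a datetime column LogDate (both hypothetical names):

    SET ROWCOUNT 5000                -- cap each DELETE at 5000 rows
    WHILE 1 = 1
    BEGIN
        DELETE FROM dbo.StatsLog
        WHERE LogDate < DATEADD(m, -12, GETDATE())

        IF @@ROWCOUNT = 0 BREAK      -- nothing left to delete
    END
    SET ROWCOUNT 0                   -- restore the default (no cap)

    Each batch commits on its own, so locks are held only briefly and the log can be truncated between batches.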

    If I think I'm going to need it, it stays in SQL. Not the same table or db, but it stays. If I don't need it, I try to put it into a separate mdf, detach, and burn to CD - or at least do a backup, then burn that.
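
    As a sketch of that, assuming databases MainDB and ArchiveDB and a table StatsLog (all hypothetical names) - sp_detach_db and sp_attach_db are the standard SQL 2000 procedures for this:

    -- copy the old rows into the archive database
    SELECT *
    INTO ArchiveDB.dbo.StatsLog_2002
    FROM MainDB.dbo.StatsLog
    WHERE LogDate < '20030101'

    -- release ArchiveDB's .mdf/.ldf files so they can be burned to CD;
    -- reattach later with sp_attach_db if the data is ever needed again
    USE master
    EXEC sp_detach_db 'ArchiveDB'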

    Planning for archiving should be done as soon as the table is created!

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • Thanks for your help

    I still have three questions:

    How do I remove the index on the column referenced by @SelectedColumn in my dynamic SQL statement?

    I do not understand what you mean by "have each month in different tables and use a view to pull all the tables together - that way you wouldn't have to do the delete"?

    Why does my ADO connection (in my VBScript in the DTS package) still time out even though I added ConnectionTimeout = 0, which should mean it never times out?

    I appreciate your help

    Thanks

    mipo

  • Create a new table for each month's data, and have your users access a view instead of the individual tables. Then, on a regular archive schedule, back up the oldest table and remove it from the database. No deletes, no reindexing. See Books Online for ways to create a view.
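
    A minimal sketch, assuming identically structured monthly tables StatsLog_200301, StatsLog_200302, and so on (hypothetical names):

    CREATE VIEW dbo.StatsLog
    AS
    SELECT * FROM dbo.StatsLog_200301
    UNION ALL
    SELECT * FROM dbo.StatsLog_200302
    UNION ALL
    SELECT * FROM dbo.StatsLog_200303

    Archiving a month then means backing up that one table, removing it from the view with ALTER VIEW, and dropping it - no DELETE ever runs.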



    Michelle

  • You can speed up your view by putting a date_key column of datatype int in each table, then putting a CHECK constraint on the date_key column of each table, restricting its values to the month that table's data represents. For instance, the constraint for the month of January 2003 would be something like 'date_key BETWEEN 20030101 AND 20030131'. When a query is run against the view, SQL Server will then only query those tables that can satisfy the date criteria of the query: a query that needs info for the days from 20020315 through 20020402 would only look at the tables for March and April of 2002. Without the constraints, SQL Server will query all tables contained in the view. Search BOL for Constraints.
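
    A sketch of one such constraint, reusing the hypothetical monthly table names from the view above:

    -- constrain the January 2003 table to January 2003 keys
    ALTER TABLE dbo.StatsLog_200301
    ADD CONSTRAINT CK_StatsLog_200301_date_key
        CHECK (date_key BETWEEN 20030101 AND 20030131)

    -- with a matching constraint on every member table, this query
    -- only reads the March and April 2002 tables
    SELECT *
    FROM dbo.StatsLog
    WHERE date_key BETWEEN 20020315 AND 20020402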

    hth,

    Michael

    Michael Weiss

  • There are two settings, one for connection timeout and one for command timeout - have you set the command one? ConnectionTimeout only governs how long ADO waits while opening the connection; CommandTimeout governs how long an executing statement may run (it defaults to 30 seconds, and 0 means no limit).

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • I set ConnectionTimeout = 0, but the ActiveX task in the DTS package still times out after about 60 seconds. Running the same delete in Query Analyzer takes about 4 minutes.

    Thanks

    mipo

  • Giving you a good solid solution may be simple or complex, depending on what fits you best. How many months are kept in the DB at any one time, and how are inserts, updates, and normal deletes done now, code-wise (stored procedures, direct queries, etc.)? The solutions here all sound good, but you are doing this dynamically, and I am not sure what will best fit your needs.

  • Thanks for your interest in this topic.

    The problem is that we did not think about maintaining the databases when we delivered the systems with SQL 7 and SQL 2000. For our application the dbs contain tables which grow extremely fast, such as log tables, transaction tables, and statistics tables. Some systems have been running for 3 years, but in fact the customers do not need the data for such a long period. The customer should decide, by using this archive utility, how long he wants to keep data in the databases and how long he wants to keep the extracted data on the hard disk.

    The first time the archive utility is used, it will probably shrink the data in the tables from three years down to one year, and SQL Server will be quite busy doing this job. I am not sure, but as a side effect it probably needs some extra space on the hard disk to do this.

    Thanks

    mipo

  • You are right that the delete task uses HD space. In fact, for every delete a row deletion is marked in the transaction log along with the removed data, and the related entries have to be removed from the indexes too, which slows you down even further. Here are my thoughts on it.

    1) Add a column called Archived with a bit value (1/0): if 0 the row is live, if 1 it is archived. When users do the delete, you actually just flip those rows from 0 to 1.

    2) Overnight, or during an off-peak hour, export the rows with a 1 to a text file or to a DB backup (long-term saves are better as text files, since later server versions may not work with older backups, IMHO).

    3) After the export, delete all the rows where Archived = 1. It will still take a while, but if you have an off-peak window it will not hurt productivity as badly - or at all, if nothing runs off-peak.

    4) Re-index, update statistics, and update usage to clean the DB. Then, if you absolutely need the space back, truncate the log and shrink the DB. Otherwise you may still want to truncate the log, since DB growth has an overhead which you should try to avoid.
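
    A minimal sketch of steps 1 through 4, assuming a table dbo.StatsLog with a datetime column LogDate in a database MyDb (all hypothetical names):

    -- 1) add the flag and mark old rows instead of deleting them outright
    ALTER TABLE dbo.StatsLog ADD Archived bit NOT NULL DEFAULT 0

    UPDATE dbo.StatsLog
    SET Archived = 1
    WHERE LogDate < DATEADD(m, -12, GETDATE())

    -- 2) export the flagged rows off-peak (DTS or bcp), then 3) remove them
    DELETE FROM dbo.StatsLog WHERE Archived = 1

    -- 4) rebuild indexes, refresh statistics, and correct usage counters
    DBCC DBREINDEX ('dbo.StatsLog')
    UPDATE STATISTICS dbo.StatsLog
    DBCC UPDATEUSAGE (0)

    -- only if the space is really needed back
    BACKUP LOG MyDb WITH TRUNCATE_ONLY
    DBCC SHRINKDATABASE (MyDb)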

    I will post any other options I can think of with DB speed in mind.

  • The problem with your proposal is that I cannot (at least not yet!) add an additional column to each table to hold this status bit (data live or archived), because changes to the database schema no longer fit the VB application, unfortunately.

    Thanks

    mipo

  • OK, then how about a second table holding the PK value from the primary table when a row is archived? Call it Archived, of course, and when the user marks a value as archived, its PK value(s) are written to the Archived table. If the value is in there, the row is archived and ready for the archival process; if not, it is live and handled as such. This does mean additional reads on the database to check the table, but a clustered index should keep it fast.
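
    A minimal sketch, assuming the main table is dbo.StatsLog with an int primary key LogID (hypothetical names):

    -- one row per archived key; the clustered PK keeps the lookups fast
    CREATE TABLE dbo.StatsLogArchived (
        LogID int NOT NULL,
        CONSTRAINT PK_StatsLogArchived PRIMARY KEY CLUSTERED (LogID)
    )

    -- 'archiving' a row records its key; the schema of StatsLog is untouched
    INSERT INTO dbo.StatsLogArchived (LogID)
    SELECT LogID
    FROM dbo.StatsLog
    WHERE LogDate < DATEADD(m, -12, GETDATE())

    -- the off-peak job exports, then deletes by joining on the side table
    DELETE s
    FROM dbo.StatsLog s
    JOIN dbo.StatsLogArchived a ON a.LogID = s.LogID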

  • Thanks for that tip.

    Could you give me more details about how to do that in practice, or do you have a complete example available?

    mipo
