Archive older data

  • Hi all,

    I have a server(Sql server 2008 R2) with large amount of data (about 2TB). How can I divide my data to multiple servers ? and hold this state dynamically for example main server always have data from 6 month ago til now and older data move to archive server ?

    I would b glad to have your opinion or document resources about this solutions.

  • Hopefully the table and the Archive able are partitioned and have the same layouts, constraints, PKs, filegroupings etc. If this is the case, then you could swap out the oldest partition in the table and swap it into the archive table.

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Mike01 (5/11/2011)


    Hopefully the table and the Archive able are partitioned and have the same layouts, constraints, PKs, filegroupings etc. If this is the case, then you could swap out the oldest partition in the table and swap it into the archive table.

    tnx for your answer,

    Should i write query(use linked server) for transfer data between servers or Sql server has a proper tool to doing this tasks?

    in other words ?Any solution or tool for doing this exist?

    And I wanna know What solutions for the isolation and maintenance of new data and old in separated servers suggested in enterprise scale ? I would b glad to if u can introduce an article about this type of solutions.

    tnx

  • Have the business requirements for data retention and when data should be archived been clearly defined?

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • SQLRNNR (5/11/2011)


    Have the business requirements for data retention and when data should be archived been clearly defined?

    Yes, I have 2 servers, one for live data and other one for archived data.

    live data are those data that their creation date is after specified period of time(for example 6 month) and archived data are older (archived) that is the main property is that they are read-only (no delete or update needed).

    I want to separate live data from those live data that are being in archive period and transfer to archive server monthly.

    The aim of this strategy is to improve performance, because SQL server (in live server) deal with small amount of data.

  • No one comments?

  • No one comments?

  • It isn't that simple - I've been doing this type of stuff for over 12 years with SQL Server, since 7.0.

    Here's a doc I wrote about partitioning - it was part of a presentation on issues concerning archiving and such

    http://www.grumpyolddba.co.uk/gonein60ns/GoneIN60nsWeb.pdf

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • From reading the original post, I'm not seeing that the poster is using partioning.

    If that is the case, I would suggest the following for a purely home-grown solution. I by no means intend this to be considered "best practice", just something I would consider doing.

    If I were doing this, and I have thought about it often, I would create a column in the table to be cleaned, and the new table on the other server, with the name of something like CleanStatus, being an integer, and a default value of 0.

    1)First, make sure all records have a 0 in them (UPDATE tablename SET CleanStatus=0)

    2)Identify any needed records to clean, and place a 1 in the CleanStatus column

    UPDATE tablename

    SET CleanStatus=1

    WHERE somedate > #months old --pseudocode

    3)Then move the records to the new server/database/table.

    INSERT INTO newtable

    SELECT * FROM sourceserver.owner.sourcetable WHERE CleanStatus=1

    Would use a Linked Server to pull this over.

    4)Confirm that the number moved is the number with a 1 on the CleanStatus on the source and destination tables to confirm that you

    brought over the same number of records.

    5) If so, can reset the new table CleanStatus=0.

    6) Delete source table records with the CleanStatus=1, since you have confirmed they have been moved to the other database.

    This way you only mess with the ones you have positively identified.

    Just my suggestion.

    Bob

  • First off, a 2Tb table should be partitioned, otherwise any type of management, querying etc... will be a nightmare.

    Adding a column, then updating it, moving it to another table and then deleting is a lot of touches of the same data. If the table isn't partitioned, then the updating alone will take a long time and alot of transaction log. Plus any issues that existing processes might run into because of the new column.

    Let's assume that the table isn't partitioned. I think the first step would be to figure out what the retention date is (i.e. retain 6 months, so any records older than today - 6 months get archived). Then copy these records to the new table and delete them from the existing. This process should probably be run daily (weekly at the latest), to process the minimum amount of records.

    If the table is partitioned (and I'm assuming it would be by date), we could switch the partition out to a "staging table" for a new partition. Then copy the data from the "staging table" to the Archive table. As a last step, we can drop\Truncate the "staging table"

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • There are a few shortcomings with partitioning not least of which is the issue of the partitioning column having to be part of the clustered key. This means if you have a PK ( say an int ) you cannot enforce uniqueness except by using a trigger or indexed view, because all unique keys must inlcude the partitioning column. You also can't use full text with a partitioned table and you can't rebuild the indexes on-line.

    He did actually say 2TB database. The real problem with archiving isn't the data moving it's the removing old data - and this is where partitioning is useful.

    If you have a table gaining say 4 million rows a day trying to delete those is a nightmare.

    All I'm saying is that although you're quite right that partitioning is the way to go, and yes big tables that are partitioned offer lots of advantages it's not always quite so clear cut as that.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • tnx a lot for your attentions 😀

  • 1) Federated databases, where you have data spread across different physical servers, was NEVER a fully functional and useful thing in SQL Server. If you need to be able to report or access all data on all machines that is what you need. But it isn't good solution.

    2) 2TB of data is NOT something that you NEED to scale across multiple servers using modern hardware. Design things correctly and get sufficient hardware and 2TB of data on SQL Server be no problem at all.

    3) Hire a consultant to help you a) figure out what you REALLY need and b) help you get there. You have little hope of operating at that scale without experience and training on how to get there from where you are.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (5/17/2011)


    1) Federated databases, where you have data spread across different physical servers, was NEVER a fully functional and useful thing in SQL Server. If you need to be able to report or access all data on all machines that is what you need. But it isn't good solution.

    2) 2TB of data is NOT something that you NEED to scale across multiple servers using modern hardware. Design things correctly and get sufficient hardware and 2TB of data on SQL Server be no problem at all.

    3) Hire a consultant to help you a) figure out what you REALLY need and b) help you get there. You have little hope of operating at that scale without experience and training on how to get there from where you are.

    Tnx for your answer,

    1) You r right, deal with different physical servers is so hard but I'm system developer (programmer) and handle this problem in my application by using different connection strings that known by data creation date

    2)I have some limitations here that cant have my favorite servers and hardware and I have to use items that is provided. I recently get this system, unfortunately this system have a bad database structure and design that cause this slowness in this scale.

    3)Its a good idea but I'm not the decision here ! I would really glad to learn what r archiving data solutions in world that I can use !

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply