Managing Historical Static Data

  • I have a challenge and three possible solutions that I can think of. I would like to get feedback from the forum members on what would be a better choice.

    Here is the Intro:

    We have a central retail sales database at our corporate office that keeps our sales information for a certain period. The data is loaded daily from the point of sales to this central database. Then we have couple of other databases that keep the cost of goods sold, current promotions and other matching information for these sales. We are required to keep the data online for three years and delete after that. We have just reached our three year time period since establishing these data retention standards, and we have noticed that our data at about 600 GB is much larger than anticipated. There are only a handful (~10) of very large tables in 50 GB and above range.

    Now the problem is with the maintenance and backups of data. In my current setup, the indexing jobs would take days. My backups are mostly backing up the same old data (remember that the data is never changed once loaded from the sales terminals) that could just have been backed up once in its 3 year lifetime. Of course, there is newly loaded data that needs to be backed up too, but that is a fraction of the entire database. The archiving process is another mess. Every week there is a job ( that often gets locked up with user queries) that deletes data older than 3 years.

    To address the issues, I have 3 possible solutions:

    1.Partition each of the database tables by time period and use file groups to separate the tables and for backups.

    2.Problem with this approach is that we do not do log backups and hence file group backups do not work. So, I ended up dividing the DB into active and old databases and keep the current quarter in current DB and the rest in the OLD database. Then each quarter transfer the table to the archive database and backup the database, once a quarter. The active database is backed up regularly, but since this now contains only a quarter worth of data as opposed to 3 years, it is more manageable and the tables are smaller too, so indexing is a piece of cake. For archiving I just drop the corresponding table.

    3.The problem with the above approach is that I need to transfer the tables every quarter between databases. This is not a big deal, but since I have to deal with several large tables, every quarter I have hefty transfer jobs running. To avoid this, I have decided to break databases by the time period. So, Sales would be broken into Sales_2001Q1, Sales_200Q2 etc. This way data would reside in the corresponding database and I do not have to transfer data. I still have all the benefits of manageability, archiving and backups etc. The only downside is that I would have more databases, but as far as I know there are no issues with having more databases on a server. Or, are there any?

    Now, I would like to ask you guys, does my third approach make sense? Am I being overly cautious towards the file group based backups? I do not feel a whole lot comfy with the file group based backups, since even if we do log backups, we are still required to do full backups to address any missed log backups. The need to have “all tran log” backups available for the file group backups to work makes me nervous. As such I am opting for my 3rd solution. Are there better means to address this? In the past you folks have always raised the bar with your innovative solutions, and always came up with a better solution than I thought of.

    Thanks for your time.

  • I used approach #3 on a smaller scale by creating fiscal year reporting tables of a much larger table. It allowed our OLAP/DSS users to query/report against this data with relatively quick response/run times. Segregating the databases due to the backup/restore reasons you have provided would push me to vote for solution #3.

  • Definitly the solution #3.

    But if you want this to make transparent you could implement a partitioned view based on the year/quartal.

    So you can put the "archive" data into differents databases or even on different servers (distributed partitioned view). This is especially useful if you are not querying the archive data very frequently

    Bye

    Gabor



    Bye
    Gabor

  • I agree with Gabor (Nyulg). Becuase of your backup issues I would go with option #3 and build a distributed partitioned view across the databases, to allow users access to all three years worth of data.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply