What to consider when scaling up from 20 GB to 2 TB?

  • Hello,

    We are currently using MSSQL 2000 Enterprise Edition. Our current database size is ~ 20 GB. Our data is mostly news articles, which we keep in the database for x number of days [depending on the purge frequency per news site]. Due to purging, our data size stays pretty much static. Recently there has been a talk about scaling up - a lot - up to 100 times. This resulted in discussions of whether we should upgrade memory [currently at 7.5 GB], SQL Server [to 2008] or move to some NoSQL solution [e.g. MongoDB].

    As a start up, we cannot afford 2008 licenses and MongoDB [or any other NoSQL solution] is currently not a well known territory for us, so we decided to stick with MSSQL 2000 as long as we possibly can.

    This question goes to those who deal with large [1-2TB] databases on a daily basis. I am guessing a number of issues will crop up as the database size grows [I'm sure there are more, so please add]:

    - Query performance

    - Database jobs taking much much longer [backups, optimization, index defragmentation/index rebuild].

    For the query performance, we are using stored procedures only and they are optimized to use indexes only, so I am hoping they will not be affected too much, as our clients expect sub-second response time. As to other database jobs, I am guessing as data size grows, they will take much longer.

    Can someone dealing with large databases advise on what to look out for - what to change/prepare to be ready for 1-2 TB database size?

    Thanks a lot!

  • With increase in the database size, all the house-keeping jobs will run longer. You may want to schedule a differential backup on daily basis, since the FULL database backup will take longer time to complete. Similarly the Rebuild Index job and CHECKDB jobs will also run for longer durations.

    Pradeep Adiga
    Blog: sqldbadiaries.com
    Twitter: @pradeepadiga

  • I'd think your SQL caches will need more RAM to work with. At 20GB, you're probably getting and keeping most of your most active data pages in the SQL data cache. Going up to the TB level implies that your cluster of frequently used" data pages is bigger, needing more data cache available.

    I'm guessing that the DB is growing because the business is expanding in some way, and your query load would increase as well. This implies:

    1. More data means larger stats to keep current (along with index maintenance) -- a bad plan because of out-of-date stats obviously hurts more with more data.

    2. If you're gaining more diversity of queries, the SQL plan cache will benefit from more RAM to work with.

    3. Review your DMVs now -- look for missing indexes, look for tables that are getting scanned. Queries that used to work passably at the GB level will hurt you at the TB level. An acceptable table scan at 20GB becomes an upset user at 2TB.

    If you can get the RAM in advance now, great. If not, then watch your cache hit ratios and use them to prove to management that you need budget.

    If you're not doing DIF backups, start now and get your process figured out. Restoring 20GB with FULL and Tlog is one thing, trying that with2 TB of data is a much longer restore window, and DIFs can help you reduce that time window by taking some of the Tlog chain out of restore process.

  • Keep in mind that your dev/QA environments will need to change as well. Pulling a copy of production back to check things can be much more difficult.

    You'll also likely not want to do gross rebuilds of all indexes, statistics, etc. anymore. It's too cumbersome. You will want "smarter" maintenance operations.

  • If you don't already have it and can get the budget for it, consider a backup solution can do table-level restores. With a large database, it's valuable to not have to restore the entire DB to see or recover data from one table.

    (I've only used Litespeed and it's worked very well, I can't speak for any of the other backup options). If you want more info on your options, there's been many questions asked and answered about this, just query the forums.

  • Both Idera, and Red Gate (I work for Red Gate), have options to help with recovery with their Virtual Database and Virtual Restore (respectively) products.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply