Sizing a database can be one of the most arduous tasks a
DBA, analyst or developer must attend to. It's time consuming, and more hours
go into analyzing the database than into actually sizing it. This article focuses on
how to monitor the database's growth after it's deployed and contains some tips
on how to size it before deployment. We will also dive a little into how to
benchmark your database against a robust data load.
So how much equipment do you need to buy for your database?
How you answer can mean the difference between a SAN deployment and a few RAID
1 drives. Luckily for you, there are some tools out there to efficiently size
the database before deployment. For example, you can use ERwin (a product of CA)
to determine how large your database will be in x number of months. You can also
load a database with random data using products like Datatect 1.6. Those who are
feeling brave can try to do the math themselves, but beware of the dreaded
indexes. When most people size a database without the aid of a third-party
product, they miss the space needed by indexes, foreign keys and
constraints.
The easiest way of sizing databases is through the data
model program itself, like ERwin. ERwin (as most data modeling programs) has a
volumetric tab that allows you to specify a number of volume items when you’re
creating the table. In ERwin, you can specify:
- The number of initial records
- The number of new records added monthly
- The chance a column will be NULL
The combination of these three factors gives the product
the ability to generate accurate volumetric reports any number of months out.
These reports can be output to HTML and can even be viewed by object. Most
modeling tools account for indexes, foreign keys and constraints. The only thing
ERwin doesn’t account for is indexed views, which can take up quite a bit of
space if they’re against a large table.
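If you want a feel for the arithmetic behind those volumetric reports, the core projection is simple. Here is a minimal sketch in Python; the row counts, average row size, and index-overhead factor are all hypothetical assumptions for illustration, not values from any real schema:

```python
# Rough volumetric projection, in the spirit of a modeling tool's report.
# All numbers below are hypothetical assumptions, not measured values.

def projected_size_mb(initial_rows, new_rows_per_month, months,
                      avg_row_bytes, index_overhead=0.3):
    """Estimate table size after `months`, padding for index space."""
    rows = initial_rows + new_rows_per_month * months
    data_bytes = rows * avg_row_bytes
    # Pad for indexes, foreign keys and constraints -- the space most
    # hand estimates miss.
    total_bytes = data_bytes * (1 + index_overhead)
    return total_bytes / (1024 * 1024)

# Example: 100,000 initial rows, 50,000 new rows/month, 200-byte rows
size = projected_size_mb(100_000, 50_000, 6, 200)
print(f"Projected size after 6 months: {size:.1f} MB")
```

A real estimate would repeat this per table and also account for NULL frequency (a mostly-NULL variable-length column stores far less than its declared width) and for indexed views, which this simple overhead factor does not cover.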
Once you have a general idea of how large your database is
going to be, you can prove your theory with a product like Datatect
1.6. Datatect allows you to create meaningful random data to load into your
database. It also takes foreign keys into account as it loads data. Of all the
data generators I tried, Datatect 1.6 was the easiest way to load
valuable sample data into my database. The other data-loading
applications I found were unstable and would easily peg my CPU. I am finishing my
review of Datatect 1.6 next week and will have my full lab results complete
then.
So why would you want to load a sample database? A few
months ago I used ERwin to project a new product’s database size would grow to
about 50GB after 6 months of operation. The application developers were used to
developing against test databases with only a few hundred sample records in them
and couldn’t fathom how to develop against a database this size. After loading
the database using Datatect, they were able to find deficiencies in the
application that would’ve crashed the server within six months once the
application went to production.
With Datatect, I was able to load a 10 GB table in about 12
hours with meaningful test data. The same task in a similar product by a larger
vendor took over 8 days. Whenever you load the data, make sure you first
remove all clustered indexes; otherwise, your load time can increase
significantly. Once you have a sample database, you can use programs like
WinRunner, which can benchmark an application's performance, to see how it will
perform with so many records.
The final approach is to watch and learn. You can use this
when you want to size a server aggressively and then deploy it with your best
guess on the size. Once I deploy a server, I have a table on each server where I
log statistics about my databases. Each week, I capture a snapshot of how large
each database is and then I can pull it into a spreadsheet for later use. Once
in the spreadsheet, you can draw trend charts to project where your database
will be physically in one year.
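The spreadsheet trend chart is just a least-squares line fitted to the weekly snapshots and extended forward. A small sketch of that same projection, with hypothetical weekly size readings standing in for the logged statistics:

```python
# Project database growth from weekly size snapshots with a least-squares
# trend line -- the same thing the spreadsheet's trend chart does.
# The snapshot values below are hypothetical.

def linear_trend(sizes_mb):
    """Fit size = a + b * week over the observed weeks; return (a, b)."""
    n = len(sizes_mb)
    weeks = range(n)
    mean_x = sum(weeks) / n
    mean_y = sum(sizes_mb) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, sizes_mb)) \
        / sum((x - mean_x) ** 2 for x in weeks)
    a = mean_y - b * mean_x
    return a, b

snapshots = [1200, 1260, 1310, 1385, 1440]  # MB, one reading per week
a, b = linear_trend(snapshots)
one_year = a + b * 52  # extrapolate the fitted line out 52 weeks
print(f"Growing ~{b:.0f} MB/week; projected size in one year: {one_year:.0f} MB")
```

A straight line assumes steady growth; if your weekly log shows the growth rate itself climbing, fit against a longer window or use a curve instead before you commit to hardware.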
A sample stored procedure to get you started is available in the Resources section below. In a later article, we will dive into the more advanced topic
of how to size your database yourself without the assistance of a third-party
tool.