A baseline is a point of reference from which change can be measured. Baselines are often used in the medical world. Before a doctor prescribes new medication to a patient, the doctor will take the patient’s blood pressure and resting heart rate, capture their weight and perhaps prescribe some blood work. After a certain amount of time has passed, the doctor will re-capture the exact same data and check to see what metrics have changed, to fully understand the effects of the medication.
This same practice exists in the IT world, and can be used by DBAs to measure the effects of both planned and unplanned change. In a best case, this data can be used to quickly identify those unplanned changes which create performance problems. At the very least, this information can be used to understand problems in the current configuration and make plans for the future.
Baselines are a good idea. Every DBA that I’ve spoken with seems to understand the value: So why is it that so many companies do not have baseline data? I think there are two factors at play. The first is the startup cost, and by cost I really mean time and effort. If you’re going to roll your own method for capturing baselines, you have to put some thought into it, answering the questions:
- What data are you going to capture?
- When are you going to capture data?
- What methods will you use to get the data?
- Where are you going to store it?
- How are you going to report on it?
- How are you going to manage the data long term?
These questions require a fair bit of thought and strategy, and that can be overwhelming… to the point where one seems willing to live without the data, rather than go through the effort to get it.
The second factor is our propensity to be reactive. I hear it almost every time that I talk to a client. I will ask, “Are you capturing baselines?” and the person will say, sometimes with a sigh, “No…” The reasons vary. Some clients say they just have too much work, with too many deadlines and projects. Other clients say that they are always fighting fires and responding to emergencies whenever they come up. Both groups state that they cannot make time to implement something new; they simply don’t have the bandwidth to put baseline methods in place. Others just acknowledge that they are reactive, meaning that, at some level, they have decided that the amount of effort is more than they want to expend. These DBAs have decided (consciously or not) that they are willing to deal with the consequences of not having baseline data.
The Why
This is where it gets tricky. You recognize the value of baseline data; but your company is, sadly, not in a position to purchase a third-party tool to capture that information for you, so you’re going to have to do it manually. Where do you start? Do you have time to do this? How much is going to help you anyway? I mean, you’ve been able to solve problems before, right, without baseline data? Do you really need it? And, that’s the key: do you really need it? I cannot answer that, but I’m pretty sure that in every instance, that data does prove useful. Can you think of time where you did have baseline information, and had a problem, and the baseline data was of no value whatsoever? Conversely, think about problems you’ve had to solve where you had no data from which to work. How much time would you save if you had that information? How much value could that information provide to your company, to your team, to your colleagues, if it was available to use when problems came up?
The truth is that baseline data will make your life easier in the long run, and here’s why:
- You can find out what’s changed, before it becomes a problem
- Troubleshooting is easier
- You will be able to proactively tune the system for performance
- It allows you to understand trends that occur in your environment and your data
- You can provide better information for capacity planning
What to Capture
When I talk to clients about capturing baseline data, my first recommendation is to start small. There is so much data available. Many of us that work with SQL Server love data, but the amount that you can capture may leave you engulfed by numbers and without any real information. Therefore, you have to focus on what data is most important; the facts that can help you the most when a performance issue comes up. The question of what is most important could be debated for hours and ultimately it is up to each DBA to know his/her database servers and determine what is most relevant. Based on the systems I have seen, and the DBAs with whom I have worked, I suggest that you start with the following:
- Basic Information
- System Usage
- File and Database Sizing Information
- Wait Statistics
We start with basic information about a SQL Server, because those settings do get changed, particularly in IT shops with several DBAs, or where many individuals have access to the server environment. For example, I have seen problems caused by someone changing maximum memory allocated and max degree of parallelism, without the DBA knowing about it. Virtualisation can cause even more problems because changes in configuration can occur, unbeknownst to the DBA, if a VM is moved to another host.
If you know the baseline patterns for basic performance monitor counters (e.g. CPU, memory and disk latency), database connections and activity, then these items can be quickly checked when performance suddenly tanks. By being able to retrieve data quickly via a stored script, a DBA can immediately check current performance against “normal” performance from the baseline. Finally, for DBAs that manage numerous instances, this information can be useful when talking to management about trends in system usage, installation outliers and budgeting.
Once you know the basic settings and system usage, then you can move on to capturing baselines for File and Database growth. One of the main jobs of a DBA is to keep the system available, so it is important to avoid the quickest way to take down a database, which is by running out of space. This is best done by capturing current file and drive sizes, and calculating data-growth trends: by doing so, it also becomes easier to plan for requests for additional space or data migrations. If you track the growth in table sizes in your busiest databases, you can then relay this information back to business to help them so as to understand how applications are utilized.
Finally, in addition to understanding the normal values for CPU, I/O and memory, it’s important to look at a database server’s wait statistics. Every SQL Server instance will have waits, and while this information can be used for tuning, it’s essential to know the normal waits so they can be used for comparison when needed.
When to Capture
Once you have decided what data you want, you must decide how often you will take samples. Too much data and you might be asking your storage administrator for your own LUN on the SAN, too little and you won’t have a good understanding of what’s really going on with your databases. Frequency ultimately depends on the data itself: performance counters could be sampled every 15 seconds, but you wouldn’t capture transaction log file size at the same interval.
In addition, you need to establish whether you want to look at data that represents your busiest time period, normal business hours, or everything. You also have to determine how long you will keep the data. Is it necessary to keep data for six months, or one year? Perhaps three months of data is sufficient. The variations are limitless, and only you know what represents your business and what you can manage. Again, start simple. It is relatively straight-forward to capture data only during business hours and keep only the last three months' metrics for comparison; from there you can establish patterns and expand.
Where to Store the Data
In my experience, it is best to create a separate database in which to store all your baseline information, which is located on a production instance. You will need a share on the server to which data from all production servers can be copied and then stored in the baseline database. This database, and any scripts used to capture, migrate, store, and view data should be backed up regularly; the information can become essential to your daily job.
Your Next Step
This article in the first in a series of four that are related to capturing baseline data. The next three articles will step through the process for obtaining the sets of data defined above, and will include the scripts to do so. Your first task is to find an instance where you can host a baseline database and create the database. Your second task is to commit to capturing this data. It’s time to act, not react. Who’s with me?