Note: This is an in-depth article that exceeds 5,000 words and provides a case study of how a maintenance plan could be implemented on a single, plain-vanilla SQL Server instance.
As I have mentioned in previous parts of this article series (part 1, part 2), I act as a consultant DBA for the SQLServerCentral.com (SSC) and Simple-Talk.com (ST) backend databases, and I don’t perform any of the day-to-day tasks. But one of the things I was asked to do was to implement a database maintenance plan for the new, two-node, active/active cluster that has recently gone live. In this third part of this series, I am going to talk about how I created the maintenance plan, and why I made the choices I did.
To recap some of what I have said in the previous parts of this series, node 1 of the active/active cluster is dedicated to SSC, while the second node is dedicated to ST. Currently, only the SSC node is in production; the ST node has yet to go live. Because of this, the focus of this article is on the maintenance plan for the SSC node.
Before the new cluster was installed, the eight databases that are used to run SSC ran on two different servers, splitting the load between them. This of course meant that there were two SQL Server instances to perform database maintenance on. Backups were handled by Red Gate Software’s SQL Backup Pro, and other database maintenance was done using a Maintenance Plan created with the Maintenance Plan Wizard. Today, with the new server, all eight databases are running on a single node, which means only one database maintenance plan for the SSC databases is needed.
Like the older servers, the new SSC server is being backed up using SQL Backup Pro. This is handled by Red Gate’s IT department and I don’t get involved in it. I do know they make full backups every night and transaction log backups every 30 minutes. Since I don’t have to worry about backups as part of my maintenance plan, all I had to focus on was database integrity, index maintenance, and a few smaller maintenance tasks.
As I thought about how to best create this maintenance plan, here was what I considered:
- Although SSC is accessed by DBAs throughout the world 24 hours a day, the SSC databases have a specific pattern of activity. I saw this with the two older servers, but I wasn’t sure if this same pattern would still exist when all of the databases (and their various jobs) were located on a single server. This meant that I would need to reevaluate when the new server was at its lowest point of activity of the day, which would help me determine the optimal maintenance window so that the maintenance job affected as few users as possible when it ran.
- As a corollary of the above, I wanted to ensure that my maintenance job didn’t run at the same time as other jobs that run on the server. The SSC databases have a lot of scheduled jobs, some of which are quite resource intensive, and I didn’t want my maintenance job to interfere with any other jobs, causing any potential resource conflicts.
- On the old servers, which used a Maintenance Plan created with the Maintenance Plan Wizard, indexes were maintained by rebuilding every index, in every database, every day. This was very resource intensive, especially given the fact that many of the tables have little activity (such as the tables that contain articles and forum posts) and would not benefit from a nightly index rebuild. So my goal for the new maintenance plan was to use a T-SQL script that would review the fragmentation of each index, and then either rebuild or reorganize it, depending on its level of fragmentation.
- On the old servers, database integrity checks (DBCC CHECKDB) were only run once a week, because the servers were already bottlenecked, and trying to run integrity checks every day would not have been practical. On the new server, my goal, assuming that the new hardware was up to the task, was to run integrity checks every day. My personal philosophy is to run integrity checks daily, if at all possible, as the sooner you discover problems, the sooner you can fix them. Before I could make this decision, I would need more information on the hardware’s performance to see if it was even feasible to perform this task within the available maintenance window.
- Because database maintenance is such a common task, and the SSC databases didn’t really have any unique needs, I didn’t want to reinvent the wheel by creating a custom maintenance plan using T-SQL or PowerShell. So my goal was to use a publicly available maintenance script. There are many excellent maintenance scripts available on the Internet, and all I had to do was choose one and implement it according to the server’s needs.
- Since I am not the day-to-day production DBA of the SSC databases, I needed to keep the maintenance plan as simple as possible, and at the same time, document it as much as possible so that the IT staff at Red Gate could easily understand what I did.
The above list doesn’t include everything I considered before putting together the SSC database maintenance plan, but it does cover the key points.
Note: The SSC databases are running under SQL Server 2008 Standard Edition, SP1. If you decide to imitate my maintenance plan for your own SQL Server instances, keep in mind that all the choices I made are based on this version of SQL Server, and may or may not apply to other editions or versions of SQL Server.
Identifying Which Maintenance Tasks to Perform
After taking into consideration my general goals for this maintenance plan, my first step was to determine exactly what database maintenance I wanted to perform on the SSC databases. I ended up deciding to perform these maintenance tasks:
- Run integrity checks on all system and production databases on a daily basis using the DBCC CHECKDB command. While my goal was to perform this task daily, as I mentioned earlier, I wasn’t sure if this was practical, as this was a new server and the number of databases on it had increased. The only way to really find out was to try it. After some experimentation, I discovered that it was indeed practical to perform this task daily. With the new hardware, performing an integrity check now takes only about 10 minutes for all system and production databases, which is well within my maintenance window. In addition, the new server no longer has any CPU or I/O bottlenecks, so running the resource-intensive DBCC CHECKDB command has little or no effect on people who access SSC during the brief time this maintenance task is running.
- To rebuild or reorganize, on a daily basis, all indexes in all production databases that have become significantly fragmented. Index maintenance is a large topic, and one I don’t want to spend a lot of time on here. Based on my past experience with index maintenance, here’s what I decided to do: if an index has less than 5% fragmentation, or if the total size of the index is less than 100 pages, leave it alone. If index fragmentation is 5% or more, but less than 20%, then run ALTER INDEX REORGANIZE and UPDATE STATISTICS on it. If index fragmentation is greater than 20%, then run ALTER INDEX REBUILD on it (for a rough illustration of this decision logic, see the T-SQL sketch after this list). There isn’t any magic number to use when deciding when you should or should not rebuild or reorganize an index. BOL (Books Online) makes a slightly different recommendation than what I have chosen, but the recommendation in BOL is only a very general one, and you should choose options that work best for your databases. I experimented with these numbers, testing several different combinations until I found the one above, which seemed to be a good balance between index optimization and the amount of time and resources it takes to maintain the indexes. Given that users access the SSC databases 24/7, you might ask why I chose to use ALTER INDEX REBUILD at all, given that this command locks a table while it is working, preventing access until it is done. I made this choice because it is more effective than ALTER INDEX REORGANIZE on large, heavily fragmented tables, because only a few tables are affected each day, and because I only run the command during the least busy time of the day. It is possible that a few users might be negatively affected by this, but not by much. It would be better if we had the Enterprise Edition of SQL Server and could perform an online rebuild, but since we don’t, this is a compromise I am willing to make. I’ll watch the database’s performance over time and, depending on what I see, tweak some of these options if needed. This particular job only takes 2-3 minutes every day.
- Delete older job and backup history from the msdb database on a regular basis. Over time, the msdb database can grow very large if older job and backup history is not removed. Typically, I schedule a job to run once a week to remove entries older than 60 days using the sp_delete_backuphistory and sp_purge_jobhistory system stored procedures. If I did this on the SSC server, I would have to create two separate jobs: a daily job for database integrity checks and index optimization, and a weekly job to remove the old job and backup history. For this particular server, I made a different choice. Because other IT staff would be performing the day-to-day management of the databases, I decided to keep things very simple and create a single job that performs all of these tasks on a daily basis. While it isn’t necessary to run sp_delete_backuphistory and sp_purge_jobhistory daily, there really isn’t any downside to doing so on this server, as running both procedures takes less than one second each day.
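To make the index-maintenance thresholds above concrete, here is a minimal, hypothetical T-SQL sketch of how that decision logic might look. It only reports a suggested action for each index in the current database, and it is not the script actually used on the SSC server (that comes later in the article):

-- Report each index in the current database along with the action my thresholds would suggest.
SELECT OBJECT_NAME(ips.[object_id]) AS TableName,
       i.[name] AS IndexName,
       ips.avg_fragmentation_in_percent,
       ips.page_count,
       CASE
            WHEN ips.page_count < 100 OR ips.avg_fragmentation_in_percent < 5
                 THEN 'Leave alone'
            WHEN ips.avg_fragmentation_in_percent < 20
                 THEN 'ALTER INDEX ... REORGANIZE, then UPDATE STATISTICS'
            ELSE 'ALTER INDEX ... REBUILD'
       END AS SuggestedAction
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
INNER JOIN sys.indexes AS i
        ON ips.[object_id] = i.[object_id] AND ips.index_id = i.index_id
WHERE i.index_id > 0;  -- skip heaps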
Now that I had decided what I wanted to do, the next step was to implement it, and before I could do that, I needed to decide on what scripts to use to perform the database maintenance tasks.
Note: I could have easily added database and log backups to my maintenance plan, but I did not have to, because this was being handled separately by Red Gate’s IT staff using SQL Backup Pro.
Deciding Which Maintenance Scripts to Use
As I mentioned earlier, because the database maintenance I needed to perform on the SSC databases is very vanilla, I didn’t see much point in creating a custom script to perform it. Many DBAs have written scripts to perform the above tasks, and have shared them with the SQL Server community. Since I have reviewed many of these scripts over the years, I knew right away which one I wanted to use, and that is the free T-SQL database maintenance scripts written by Ola Hallengren, which are available from ola.hallengren.com.
Ola offers several different scripts that, when run, create stored procedures in the master database, which, when used with the appropriate parameters, can perform a wide variety of database maintenance.
Ola’s main database maintenance script (MaintenanceSolution.sql), when it is run, will automatically create all the necessary stored procedures (and one function), along with all the jobs you need to run them. All you have to do is to schedule the jobs, and you are off and running, assuming that you want to accept all of Ola’s default settings and jobs.
Personally, I don’t use Ola’s MaintenanceSolution.sql script. Instead, I use his separate scripts, which he provides if you only want to pick and choose what database maintenance stored procedures you want to use. These separate scripts are named:
- DatabaseBackup.sql
- DatabaseIntegrityCheck.sql
- IndexOptimize.sql
In this particular case, I decided to use Ola’s DatabaseIntegrityCheck.sql and IndexOptimize.sql scripts, as I was not concerned about backups in my maintenance plan. Another reason I used his individual scripts was that I like to make a few changes to them, which is easy to do, as Ola has made the scripts easy to modify. And the last reason I used these individual scripts is that I prefer to create my own jobs, instead of using the ones that are automatically created if you use Ola’s MaintenanceSolution.sql script. These are just personal choices, as you can run Ola’s scripts just as they are, and they will meet the needs of many DBAs without any changes.
One thing that Ola’s scripts do not perform is the removal of old job and backup history from the msdb database. In this case, I had to write my own, very simple T-SQL script to accomplish this task.
In the next section, I will explain in more detail how I created the SSC database maintenance plan using Ola’s two stored procedures, and my simple T-SQL script.
Creating the Maintenance Jobs
Given everything I have talked about so far, essentially, I needed to create a SQL Server Agent Job that had three steps, which include:
- Step #1: Run Ola’s DatabaseIntegrityCheck stored procedure
- Step #2: Run Ola’s IndexOptimize stored procedure
- Step #3: Run my T-SQL code to remove old job and backup data from the MSDB database
Let’s look at each of these, one step at a time.
Note: I’m not going to try to duplicate Ola’s documentation in this article. If you have specific questions about how Ola’s scripts work, please visit his website and read his documentation.
Job Step #1
My first step was to run Ola’s DatabaseIntegrityCheck.sql script. But before I did, I checked to see what default settings he used to perform the database integrity checks. The available options are shown below:
You can accept the default settings you see above, or you can make changes if you like. If you want to make any changes, you will need to make them to the original T-SQL script (DatabaseIntegrityCheck.sql) before you create the stored procedure. After reviewing the above options, I decided to accept Ola’s default settings, so I didn’t have to alter the script. I just ran it, and the DatabaseIntegrityCheck stored procedure was created in the master database.
My next step was to create a new SQL Server Agent Job, and add the above stored procedure as its first step, which you see below.
Notice that when this step runs as part of the larger, single job, it uses the CmdExec job type, and the stored procedure is actually run from SQLCMD. Since you can’t see all of the code above, here is what it looks like:
sqlcmd -E -S $(ESCAPE_SQUOTE(SRVR)) -d master -Q "EXECUTE [dbo].[DatabaseIntegrityCheck] @Databases = 'ALL_DATABASES'" -b
While I’m not going to discuss how to use SQLCMD, what I do want to point out is the parameter being passed to the DatabaseIntegrityCheck stored procedure above. In this case, it is ‘ALL_DATABASES’. All of Ola’s stored procedures allow you to specify which databases they are to run against, and I have chosen to run this stored procedure against all databases on the server (system and user). Other choices that are available include those listed below.
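For illustration, and with the caveat that Ola’s documentation is the authoritative source for the exact syntax, the @Databases parameter also accepts values along these lines (the specific database names shown are hypothetical):

EXECUTE [dbo].[DatabaseIntegrityCheck] @Databases = 'SYSTEM_DATABASES'  -- system databases only
EXECUTE [dbo].[DatabaseIntegrityCheck] @Databases = 'USER_DATABASES'    -- user databases only
EXECUTE [dbo].[DatabaseIntegrityCheck] @Databases = 'SSC_Forums, SSC_Articles'  -- an explicit list of databases (hypothetical names)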
Another key point about this job step is the logging, which I want to show you. If you look at the “Advanced” screen of the “Job Step Properties” screen, you see the following:
Note above the “Output file”, which looks like this in its entirety:
S:\Program Files\Microsoft SQL Server\MSSQL10.INS1\MSSQL\Log\DatabaseIntegrityCheck_$(ESCAPE_SQUOTE(JOBID))_$(ESCAPE_SQUOTE(STEPID))_$(ESCAPE_SQUOTE(STRTDT))_$(ESCAPE_SQUOTE(STRTTM)).txt
The above path and filename will automatically create a log file with a unique name, so that you can easily identify the job log, and open it up in order to see exactly what it does. For example, below is a partial example of what the job log looks like:
The above output is created by Ola’s stored procedure, and is great for seeing exactly what is going on after the job is executed.
Job Step #2
My second job step was to run Ola’s IndexOptimize stored procedure. While Ola’s DatabaseIntegrityCheck stored procedure has only a few options, the IndexOptimize stored procedure has a large number of options to consider. Since there are so many, I am going to first show you what the options are, then show you which options I selected, and talk a little about why I chose them. For a detailed explanation of what each option does, visit Ola’s documentation page.
The screen above will probably look a little confusing at first, as there is a lot of information to take in. The first column describes different levels of index fragmentation. For example, FragmentationHigh_LOB refers to indexes that have a high level of fragmentation and also include LOB objects. If you look at each of the options in the first column, it should become self-evident quite quickly what each option means. On the other hand, what exactly do High, Medium, and Low mean? I will save a discussion of what these mean for a little later. For now, all you need to know is that they refer to levels of index fragmentation that you can configure.
The second column is a brief description, and should be self-explanatory.
On the other hand, the last column is not as self-explanatory. Each box in the third column lists the possible actions that can occur for a given level of fragmentation. For example, the first row, which is for FragmentationHigh_LOB, has five different possible actions listed in the third column. This means that if the FragmentationHigh_LOB level of fragmentation is identified for a particular index, you can choose one of five potential actions to apply to it. The default action is “INDEX_REBUILD_OFFLINE”. If nothing is changed, then when an index is identified as having high fragmentation and it also includes LOB objects, the “INDEX_REBUILD_OFFLINE” action will take place on that index. As you can probably deduce, this action causes the index to be rebuilt using the ALTER INDEX REBUILD command (offline, as opposed to the online REBUILD option available with the Enterprise Edition of SQL Server). If you don’t want the “INDEX_REBUILD_OFFLINE” action to occur when high fragmentation is identified on an index with LOB objects, you can choose from the remaining four choices. See Ola’s documentation for more detailed information on what each action does.
Besides the above available actions, there are additional settings you can leave at their default value, or change to meet your database’s needs. See the chart below.
Except for the first two options above, most of the other options should be familiar to experienced DBAs. So let’s briefly discuss what the first two options do. Earlier, I mentioned that index fragmentation is divided into High, Medium, and Low in the IndexOptimize stored procedure. If you like, you can use the top two settings to determine what High, Medium, and Low mean. For example, the FragmentationLevel1 option above is used to set the Low level of fragmentation. By default, if the fragmentation of an index is below 5%, then it is considered to have a Low fragmentation level. The FragmentationLevel2 option is used to set the High level of fragmentation. By default, if the fragmentation of an index is greater than 30%, then it is considered to have a High fragmentation level. If the fragmentation falls between these two numbers (5% and 30%), then the fragmentation level is considered to be Medium. You can use the default values as established by Ola, or you can change the settings to meet your needs. His settings are based on the general recommendations published in BOL.
To review, the High, Medium, and Low levels refer to different levels of index fragmentation, and Ola’s script uses them to determine what action to take on an index, based on its level of fragmentation.
So what options did I choose? Below you will see a snippet of code from the IndexOptimize stored procedure. This is where you can modify the default settings, if you like.
Towards the top part of the code, I specify what action I want to occur at different levels of fragmentation. For example, if an index is identified by the code as “FragmentationHigh_LOB”, I want the “INDEX_REBUILD_OFFLINE” action to be performed, and so on.
After deciding what actions you want performed on the different levels of fragmentation, you have the opportunity to set the FragmentationLevel1 and FragmentationLevel2 values, which determine the boundaries of High, Medium, and Low fragmentation. For the SSC databases, I have changed FragmentationLevel2 from its default value of 30% to 20%, because I wanted to lower the point at which an ALTER INDEX REBUILD occurs instead of an ALTER INDEX REORGANIZE. Based on my settings, if an index has less than 5% fragmentation, nothing is done to it. If the index is between 5% and 20% fragmented, then the ALTER INDEX REORGANIZE and UPDATE STATISTICS commands are executed. And if index fragmentation is above 20%, then the ALTER INDEX REBUILD command is executed on the index.
Another interesting setting is “PageCountLevel”. If the number of pages used by an index is below the assigned value, the index will be ignored. This is done because removing fragmentation from smaller indexes is rarely beneficial. By not defragmenting smaller indexes, fewer indexes need defragmenting, which reduces the time it takes to run the job and helps to save server resources. The default value is 1000 pages, but I have changed this to 100 for my job. The default value of 1000 comes from an old Microsoft whitepaper and is a commonly accepted value. On the other hand, I feel that 100 is a better value, based on my own experience and from talking to other DBAs I trust. What value is best for you? You will have to make that decision yourself. By choosing the smaller value of 100 instead of the default of 1000, my maintenance script has to work a little harder. Based on some experimentation I did on the SSC databases using different PageCountLevel values, I believe this is the sweet spot for this server. It may or may not be for yours.
While there are a lot more options available, I only want to point out one other: the “SortInTempdb” setting. The default value is no, but I have set it to yes. Why? Because moving much of the work of rebuilding indexes into tempdb can help reduce the load on your server’s I/O system, provided that your tempdb database is located on its own dedicated spindles, which is true for the SSC server. If your tempdb is located with all of your other databases, then I would recommend that you not turn this option on, as it probably won’t provide any performance advantage.
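To pull the index-maintenance choices together, here is a hypothetical sketch of what those settings would look like if they were passed as parameters to the IndexOptimize stored procedure, rather than edited into the script’s defaults as I actually did. Check Ola’s documentation for the exact parameter names and accepted values in the version you download:

EXECUTE [dbo].[IndexOptimize]
  @Databases = 'USER_DATABASES',
  @FragmentationLevel1 = 5,    -- below 5% fragmentation, leave the index alone
  @FragmentationLevel2 = 20,   -- at 20% or more, rebuild rather than reorganize (Ola's default is 30)
  @PageCountLevel = 100,       -- ignore indexes smaller than 100 pages (Ola's default is 1000)
  @SortInTempdb = 'Y'          -- do the sort work in tempdb, which sits on dedicated spindles here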
Once I made all of my changes to the original script, I ran it, and the IndexOptimize stored procedure was created in the master database. I was then ready to add it as step #2 of my SQL Server Agent Job, which you can see below.
As with the previous stored procedure, this one will be executed using the SQLCMD command, like this:
sqlcmd -E -S $(ESCAPE_SQUOTE(SRVR)) -d master -Q "EXECUTE [dbo].[IndexOptimize] @Databases = 'USER_DATABASES'" -b
When I created step #1 of the job to check database integrity, I specified that “ALL_DATABASES” be checked. In step #2, where I am optimizing my indexes, I am only performing this step on user databases, not system databases, because system databases rarely benefit from index optimization. Because of this, the above command passes “USER_DATABASES” as the @Databases parameter of the IndexOptimize stored procedure.
Step #2’s “Advanced” tab of the “Job Step Properties” screen is almost identical to step #1’s. The only change is the output file, which has been slightly modified to indicate that this report is for the IndexOptimize step.
S:\Program Files\Microsoft SQL Server\MSSQL10.INS1\MSSQL\Log\IndexOptimize_$(ESCAPE_SQUOTE(JOBID))_$(ESCAPE_SQUOTE(STEPID))_$(ESCAPE_SQUOTE(STRTDT))_$(ESCAPE_SQUOTE(STRTTM)).txt
The above path and filename will automatically create a log file with a unique name, so that you can easily identify and view the job log. For example, below is a partial example of what the job log looks like:
The above output is created by Ola’s stored procedure, and is great for seeing exactly what is going on after the job is executed.
Job Step #3
The last step in my SQL Server Agent Job is to remove the older job and backup history from the msdb database. As I mentioned earlier, Ola’s scripts don’t include this feature, so I wrote a very simple script to perform this simple task. The screen for job step #3 is shown below.
Unlike the first two job steps, this step uses the T-SQL job type and runs several lines of code against the msdb database. All the script does is subtract 60 days from today’s date and then use that date as the cutoff for removing older job and backup history with the sp_delete_backuphistory and sp_purge_jobhistory system stored procedures.
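Here is a minimal sketch of that kind of cleanup code; the exact script running on the SSC server may differ slightly, but the logic is the same:

DECLARE @CutoffDate datetime
SET @CutoffDate = DATEADD(dd, -60, GETDATE())  -- keep only the most recent 60 days of history

EXECUTE msdb.dbo.sp_delete_backuphistory @oldest_date = @CutoffDate
EXECUTE msdb.dbo.sp_purge_jobhistory @oldest_date = @CutoffDate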
As with the previous two steps’ “Advanced” tab of the “Job Step Properties” screen, I have specified that the output of this step be written to a file. The only difference is that the file name indicates that this is for the DeleteHistory step.
S:\Program Files\Microsoft SQL Server\MSSQL10.INS1\MSSQL\Log\DeleteHistory_$(ESCAPE_SQUOTE(JOBID))_$(ESCAPE_SQUOTE(STEPID))_$(ESCAPE_SQUOTE(STRTDT))_$(ESCAPE_SQUOTE(STRTTM)).txt
The output shows how many history entries, if any, were deleted.
The SQL Server Agent Job is Done
Once all of the steps have been added to the job, it looks like this.
As with any job, I also specified the actions to be taken should each step fail or succeed, and set up notifications. Of course, after creating the job, I tested it thoroughly to ensure that everything was working as I expected. As I performed my testing, I also took note of how long the entire job ran, as I needed to know this for the next step.
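For anyone recreating a similar job in T-SQL rather than through the Management Studio dialogs, the failure handling and notification settings look roughly like this; the job and operator names are hypothetical, and the values are illustrative rather than the exact ones used on the SSC server:

-- Quit the job and report failure if a step fails, rather than carrying on to the next step.
EXECUTE msdb.dbo.sp_update_jobstep
  @job_name = N'SSC Database Maintenance',   -- hypothetical job name
  @step_id = 1,
  @on_fail_action = 2                        -- 2 = quit the job reporting failure

-- E-mail an operator whenever the job fails.
EXECUTE msdb.dbo.sp_update_job
  @job_name = N'SSC Database Maintenance',
  @notify_level_email = 2,                   -- 2 = notify on failure
  @notify_email_operator_name = N'DBA Team'  -- hypothetical operator name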
Deciding When to Run the Maintenance Jobs
Now that the job was done, I needed to schedule it. The first step was to find out how long the job ran. After running the job several times, I found that it typically took 10-15 minutes to run. This means I only needed a 15-minute daily maintenance window to run this job every day. Of course, I assumed that some days might take longer than 15 minutes, and took this into consideration when determining when to run the job.
As I mentioned earlier in this article, my goal was to identify the least busy time of the day to run the job, and at the same time, ensure that my job did not interfere with any other jobs. To do this, I used the SQL Server 2008 Performance Data Collector.
As many of you may know, the Performance Data Collector can collect performance data on SQL Server 2008 instances and store it in a database for later reporting. I used this feature to determine the optimal maintenance window for running the maintenance job. Of course, before I could collect any useful data, the SSC server had to be in production. On the other hand, I really needed to perform database maintenance on a daily basis, even before I had an opportunity to collect any historical performance data from the Performance Data Collector. So, once the new SSC SQL Server instance went into production, I had to make an educated guess as to when to run the maintenance plan, and then wait several weeks before I felt I had enough data to identify the lowest point of activity during the day, which would be the ideal time to run the maintenance plan with the least impact on users of the website.
After reviewing server activity on the SSC SQL Server instance for several weeks, I was able to determine that the least busy time of the day fell consistently between 5:00 PM and 7:00 PM Mountain Standard Time (MST). MST is used for the SSC server because Steve Jones, editor of SSC, lives in that time zone, although the physical server is located in London. So now, I only had to select a time period within this range where the approximately 15-minute job could run.
Note: I also compared the activity on the SSC server to the Google Analytics data, which confirmed that this time range had the lowest number of page views of the day, based on worldwide activity.
I ended up selecting 6:35 PM MST as the time to run the job, for several reasons. First, 6:35 PM MST is 5:35 PM PST, which means that most people in North America (SSC’s biggest audience) would most likely be home, and less likely to be accessing the SSC website. Second, there were no other jobs on the SSC server running between 6:31 PM MST and 6:59 PM MST, which means my job would not interfere with any other jobs. And third, this would give the typically 15-minute job a cushion of 10 minutes should it happen to run longer than usual, without interfering with a job that starts at 7:00 PM MST.
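For reference, scheduling the job in T-SQL instead of through the Management Studio dialogs would look something like this; the job name is hypothetical, and since SQL Server Agent schedules run on the server’s local clock, the start time would need adjusting to whatever local time corresponds to 6:35 PM MST:

EXECUTE msdb.dbo.sp_add_jobschedule
  @job_name = N'SSC Database Maintenance',  -- hypothetical job name
  @name = N'Daily maintenance window',
  @enabled = 1,
  @freq_type = 4,                -- 4 = daily
  @freq_interval = 1,            -- every day
  @active_start_time = 183500    -- 18:35:00 in HHMMSS format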
Checking How the Job Affects the Server’s Performance
Once I scheduled the job to begin running at 6:35 PM MST, I also spent a few days watching the performance of the job using the Performance Data Collector, just to see if it created any potential problems as it ran. After several days of running the maintenance job at this time, I saw a spike in CPU and disk I/O, as expected, but not enough to cause any bottlenecks. About the only time a user might experience any slowdown when accessing SSC would be during the 2-3 minutes it takes to optimize the indexes, and only if an index is being rebuilt rather than reorganized. Of course, I will be checking these jobs weekly to see if the plan needs any changes.
As you might well imagine, this is not all of the database maintenance the SSC databases need, but it does cover some of the key daily tasks that can easily be automated. In the next installment of my series on the new SSC server, I will talk about some of the key configuration settings I chose for the server, and why.