I am working on a project that is very successful in terms of its market
share: the product's user base has roughly doubled in the last two years and is
forecast to triple next year. As a result, the number of our production servers
has already doubled and may triple next year as well. However, our DBA team is
not expanding at that rate, and no doubt more servers and more business users
mean more "baby-sitting" from the DBA team for maintenance, troubleshooting,
and security. We can still cope, but I know that if the servers keep growing in
number while DBA resources stay fixed, at some point the team will burn out,
our work efficiency and quality will drop, and then all sorts of problems can
occur. This brings one question to my mind: "How can we increase database
administration scalability (DAS) without adding more DBA manpower?"
I define DAS as follows (borrowing from and modifying the Wikipedia entry at http://en.wikipedia.org/wiki/Scalability):
Database administration scalability is a property of a database administration
practice which indicates its capability to handle a growing amount of work
gracefully, without requiring significant additional DBA resources.
In DBA terms, DAS means that no matter how many servers / instances I take care
of (within a reasonable range, of course), the effort I need to devote (in the
context of the business requirements) should be flat or nearly flat, as shown
in the following table:
Server Number | Effort Units
            0 |            0
           10 |           10
           50 |           10
          100 |           11
          150 |           11
How to achieve DAS is another topic entirely, as it depends on business
requirements and environments and needs to take into account things like the
systems involved, auditing requirements, security, performance, and even budget
(for example, some SQL Server features do not exist in the free SQL Server
Express edition).
My favorite approach to DAS is to centralize my administration work in one
place, with all administration scripts kept in a single repository. When
necessary, selected scripts are distributed to other servers automatically and
executed locally there, but most of the time execution is initiated from the
central place where the repository database resides.
Let me take an example from my previous experience:
Our DBA team was required to run a customized standard trace on each server we
manage. There are many ways to do this; we ended up doing the following
(simplified, assuming each server has only one default instance; a T-SQL sketch
of all four steps follows the list):
1. On a central server, create a table, dbo.DBServer (ServerName varchar(100))
2. Populate the table with server names
3. Create a stored procedure, usp_OperateTrace, that takes two parameters:
@ServerName and @TraceOn (@TraceOn = 1 turns the trace on, 0 turns it off).
The stored proc mainly does two things: (1) it builds a dynamic trace-creation
script; (2) it creates a linked server pointing to @ServerName (if one does not
already exist), runs the dynamic script through the linked server, and then
drops the linked server (if it did not exist beforehand).
4. Run a cursor against the table dbo.DBServer, calling usp_OperateTrace for
each retrieved server name.
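Here is a minimal T-SQL sketch of the four steps, assuming SQL Server 2008 or
later. The names dbo.DBServer and usp_OperateTrace come from the steps above;
the server names, trace file path, and trace events are illustrative
placeholders, and a real customized trace would set many more events and
columns:

-- Step 1: on the central server, create the inventory table
CREATE TABLE dbo.DBServer (ServerName varchar(100) NOT NULL PRIMARY KEY);
GO

-- Step 2: register the servers we manage (names are placeholders)
INSERT INTO dbo.DBServer (ServerName) VALUES ('SQLPROD01'), ('SQLPROD02');
GO

-- Step 3: turn the customized trace on or off on one remote server
CREATE PROCEDURE dbo.usp_OperateTrace
    @ServerName varchar(100),
    @TraceOn    bit             -- 1 = turn the trace on, 0 = turn it off
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @sql nvarchar(max), @outer nvarchar(max), @existed bit = 1;

    -- Create a linked server pointing to @ServerName if it is not there;
    -- EXEC ... AT requires the "rpc out" option to be enabled
    IF NOT EXISTS (SELECT 1 FROM sys.servers WHERE name = @ServerName)
    BEGIN
        SET @existed = 0;
        EXEC sp_addlinkedserver @server = @ServerName, @srvproduct = N'SQL Server';
        EXEC sp_serveroption @server = @ServerName, @optname = 'rpc out', @optvalue = 'true';
    END

    -- Build the dynamic trace script (heavily simplified)
    IF @TraceOn = 1
        SET @sql = N'DECLARE @TraceID int, @maxsize bigint = 100;
            EXEC sp_trace_create @TraceID OUTPUT, 2, N''C:\Traces\StandardTrace'', @maxsize;
            EXEC sp_trace_setevent @TraceID, 10, 1, 1; -- RPC:Completed, TextData
            EXEC sp_trace_setevent @TraceID, 12, 1, 1; -- SQL:BatchCompleted, TextData
            EXEC sp_trace_setstatus @TraceID, 1;       -- start the trace';
    ELSE
        SET @sql = N'DECLARE @TraceID int;
            SELECT @TraceID = id FROM sys.traces WHERE path LIKE N''%StandardTrace%'';
            IF @TraceID IS NOT NULL
            BEGIN
                EXEC sp_trace_setstatus @TraceID, 0;   -- stop the trace
                EXEC sp_trace_setstatus @TraceID, 2;   -- close and delete it
            END';

    -- Run the script on the remote instance; EXEC ... AT needs the linked
    -- server name as an identifier, hence the second layer of dynamic SQL
    SET @outer = N'EXEC (@inner) AT ' + QUOTENAME(@ServerName) + N';';
    EXEC sp_executesql @outer, N'@inner nvarchar(max)', @inner = @sql;

    -- Drop the linked server if it did not exist before we started
    IF @existed = 0
        EXEC sp_dropserver @server = @ServerName, @droplogins = 'droplogins';
END
GO

-- Step 4: loop over the inventory and turn the trace on everywhere
DECLARE @Server varchar(100);
DECLARE ServerCur CURSOR LOCAL FAST_FORWARD FOR
    SELECT ServerName FROM dbo.DBServer;
OPEN ServerCur;
FETCH NEXT FROM ServerCur INTO @Server;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC dbo.usp_OperateTrace @ServerName = @Server, @TraceOn = 1;
    FETCH NEXT FROM ServerCur INTO @Server;
END
CLOSE ServerCur;
DEALLOCATE ServerCur;

Turning the traces off everywhere is just the same loop with @TraceOn = 0.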
This is a simple example, but by designing our approach with scalability in
mind, we make our lives easier. When all the servers get patched, the SQL
Server services usually restart, and to start all the traces on all the servers
again, I just run the job containing the script from step 4 above, and that's
it.
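For example, after a patching weekend, one call to the SQL Server Agent job
wrapping the step 4 script (the job name below is made up) brings the whole
fleet back to a traced state:

-- Start the (hypothetical) Agent job that runs the step 4 cursor script
EXEC msdb.dbo.sp_start_job @job_name = N'DBA - Restart standard traces';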
In my early years as a DBA, I seldom consciously considered scalability in my
administration practice / framework, but now I believe that making code /
designs "born to be scalable" should be considered a best practice by all
DBAs.