Hunting Down Illicit Users
One of the most frustrating jobs a DBA has to contend with is unpicking the problems present in legacy systems.
One of the thorny issues concerns security of the database. Imagine that you have legacy systems dependent on standard SQL security. By their nature the passwords to these standard logins are known to developers.
Ideally you would deprecate the use of these standard logins as soon as possible, and no developer would have access to LIVE systems, but the problem is that you have a large number of business-critical tools that depend on standard security. There are three problems you face:
- It takes time to identify the legacy tools
- It takes time to alter these tools even though the change is (should be) trivial
- The business cannot see the benefit in work that won't improve functionality or increase profitability
The business will agree to an edict banning developers from accessing LIVE sites provided data can be made available on a "nearly live" basis.
The Problem
You fulfill your side of the bargain. Virtually all LIVE data is replicated down from the LIVE DB servers to DB servers explicitly set up to allow ad-hoc reporting. You would prefer to deprecate the standard logins, but at least the business has agreed to support the principle of no LIVE access.
Being a good DBA you monitor your LIVE DB servers continuously. You notice that some of the LIVE servers are regularly struggling to the point where customer access (the bread and butter of your company) is threatened.
The servers seem to cope admirably during peak Internet hours, even with heavy batch jobs running on them, but seem strangely hot and bothered during office hours. You suspect that illicit use of the old legacy standard SQL logins is taking place, but how do you prove it?
The Solution
As stated earlier the obvious solution would be to deprecate the standard logins but this isn't an option available to you.
What you are going to have to do is track down who is using these logins and for what.
Using sp_who2
The first port of call is the sp_who2 system stored procedure. An example of its results can be seen below.
| SPID | Status | Login | HostName | BlkBy | DBName | Command | CPUTime | DiskIO | LastBatch | ProgramName | SPID | RequestID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | BACKGROUND | SA | . | . | NULL | RESOURCE MONITOR | 0 | 0 | 11/18 11:22:24 | | 1 | 0 |
| 2-57 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 58 | Sleeping | LegacyLogin | UserMachine | . | LiveDB | Awaiting Command | 0 | 2 | 11/18 11:22:24 | Microsoft SQL Server Management Studio - Query | 58 | 0 |

(SPIDs 2 through 57 are omitted for brevity.)
There are 8 columns of interest to us here.
- SPID - Allows us to identify the actual query being run via DBCC INPUTBUFFER
- Login - Is the SPID assigned to a legacy login?
- HostName - The machine from which the query was run
- DBName - The database against which the query was run
- CPUTime - The impact of the query in terms of CPU
- DiskIO - The impact of the query in terms of disk activity
- ProgramName - The program used to issue the query
- LastBatch - The date/time at which the query was last run
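The SPID feeds straight into DBCC INPUTBUFFER to reveal the statement behind it. For example, for the legacy-login session shown as SPID 58 above (the SPID value is, of course, illustrative):

```sql
-- Show the last statement submitted by SPID 58.
-- NO_INFOMSGS suppresses the "DBCC execution completed" chatter.
DBCC INPUTBUFFER(58) WITH NO_INFOMSGS
-- Returns a single row: EventType (e.g. "Language Event"), Parameters
-- and EventInfo, which holds up to 4,000 characters of the batch.
```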
Of the rows returned, we are only interested in those that apply to our legacy SQL Standard logins, so we need to filter out the many SPIDs that are of no interest.
What we need to do is investigate how sp_who2 actually works, and to do this we run sp_helptext 'sp_who2' in the master database.
This reveals that the procedure does little more than a simple SELECT against the master.dbo.sysprocesses table, bouncing the results into a temporary table and performing a little light formatting. This is good news as we can easily write a simple query to extract just those elements that we are interested in.
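You can inspect the procedure's definition for yourself:

```sql
USE master
GO
-- Print the source of sp_who2, one row per line of the definition.
EXEC sp_helptext 'sp_who2'
```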
Our Select Query
We have already established which fields we need from our query; now we have to consider a suitable WHERE clause. Our requirements are as follows:
- Identify the queries executed by SQL Standard Logins
- Identify activity on non-system databases
- Identify queries called from standard Microsoft applications.
We know that the tools that the illicit users have at their command are as follows:
- SQL Server Management Studio
- Enterprise Manager
- SQL Query Analyzer
- Visual Studio
With the exception of Visual Studio, the applications above have specific program_name entries associated with them. With this information to hand, our query is as follows:
```sql
SELECT    SPID
        , DB_NAME(dbid) AS DBName
        , hostname
        , loginname
        , program_name
        , cpu
        , physical_io
        , last_batch
FROM master.dbo.sysprocesses AS P WITH (NOLOCK)
    INNER JOIN master.dbo.syslogins AS L WITH (NOLOCK)
    ON P.sid = L.sid
WHERE P.dbid > 4 -- Exclude the standard system databases.
AND (   -- Identify M$ standard database applications
        P.program_name LIKE 'Microsoft SQL%'
    OR  P.program_name LIKE 'MS SQLEM%'
    OR  P.program_name LIKE 'SQL Query Analyzer%'
)
-- The three flags below identify SQL logins.
AND L.IsNTName = 0
AND L.IsNTUser = 0
AND L.IsNTGroup = 0
```
Building a structure to hold results
I would recommend that every server has a specific DBA database that you use for monitoring activity. For the sake of argument, let us imagine that you have a "DBA" database.
Create a table to hold the results of your query as follows:
```sql
CREATE TABLE dbo.LogStandardSQLAccess (
      SPID          SMALLINT  NOT NULL
    , DBName        SYSNAME   NOT NULL
    , hostname      SYSNAME   NOT NULL
    , loginname     SYSNAME   NOT NULL
    , program_name  SYSNAME   NOT NULL
    , cpu           INT       NOT NULL
    , diskio        INT       NOT NULL
    , LastBatch     DATETIME  NOT NULL
    , DateCreated   DATETIME  NOT NULL
        CONSTRAINT DEF_LogStandardSQLAccess_DateCreated DEFAULT(GETDATE())
    , SQLQuery      NVARCHAR(4000) NULL
)
GO
CREATE CLUSTERED INDEX idx_LogStandardSQLAccess
    ON dbo.LogStandardSQLAccess (DateCreated, LastBatch, SPID)
GO
```
Putting it all together
We want to take the results of our query and the results of the DBCC INPUTBUFFER for each SPID
and record them in our DBA.dbo.LogStandardSQLAccess table. To do this I would set up a job with the
full script shown below set to execute every minute.
```sql
SET NOCOUNT ON

-- Table variable to hold the output of the DBCC INPUTBUFFER command.
DECLARE @InputBuffer TABLE (
      EventType  VARCHAR(30)
    , Parameters INT
    , EventInfo  NVARCHAR(4000)
)

-- Table variable to hold results for our sp_who2 equivalent query.
DECLARE @UserActivity TABLE (
      SPID          SMALLINT NOT NULL
    , DBName        SYSNAME  NOT NULL
    , hostname      SYSNAME  NOT NULL
    , loginname     SYSNAME  NOT NULL
    , program_name  SYSNAME  NOT NULL
    , cpu           INT      NOT NULL
    , diskio        INT      NOT NULL
    , lastbatch     DATETIME NOT NULL
)

INSERT INTO @UserActivity
SELECT    SPID
        , DB_NAME(dbid) AS DBName
        , hostname
        , loginname
        , program_name
        , cpu
        , physical_io
        , last_batch
FROM master.dbo.sysprocesses AS P WITH (NOLOCK)
    INNER JOIN master.dbo.syslogins AS L WITH (NOLOCK)
    ON P.sid = L.sid
WHERE P.dbid > 4 -- Exclude the standard system databases.
AND (
        P.program_name LIKE 'Microsoft SQL%'
    OR  P.program_name LIKE 'MS SQLEM%'
    OR  P.program_name LIKE 'SQL Query Analyzer%'
)
-- The three flags below identify SQL logins.
AND L.IsNTName = 0
AND L.IsNTUser = 0
AND L.IsNTGroup = 0

DECLARE @SQL       VARCHAR(60) -- Holds the DBCC INPUTBUFFER command
      , @spidValue SMALLINT    -- Used to loop through our @UserActivity table

SET @spidValue = 0
WHILE @spidValue IS NOT NULL
BEGIN
    SELECT @spidValue = MIN(SPID)
    FROM @UserActivity
    WHERE SPID > @spidValue

    IF @spidValue IS NOT NULL
    BEGIN
        -- Flush out the previous record.
        -- There will only ever be one record in this table.
        DELETE FROM @InputBuffer

        SET @SQL = 'DBCC INPUTBUFFER('
                 + CAST(@spidValue AS VARCHAR(6))
                 + ') WITH NO_INFOMSGS'

        INSERT INTO @InputBuffer (EventType, Parameters, EventInfo)
        EXEC (@SQL)

        INSERT INTO dbo.LogStandardSQLAccess (
              SPID, DBName, hostname, loginname, program_name
            , cpu, diskio, lastbatch, SQLQuery
        )
        SELECT    U.SPID
                , U.DBName
                , U.hostname
                , U.loginname
                , U.program_name
                , U.cpu
                , U.diskio
                , U.lastbatch
                , I.EventInfo
        FROM @UserActivity AS U
            CROSS JOIN @InputBuffer AS I
        WHERE U.SPID = @spidValue
    END
END
GO
```
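To schedule the script every minute you would create a SQL Server Agent job. The sketch below assumes the script above has been wrapped in a stored procedure in the DBA database; the job, step and procedure names are purely illustrative.

```sql
USE msdb
GO
-- Create the job itself.
EXEC dbo.sp_add_job
      @job_name = N'DBA - Log legacy SQL login activity'
-- Single T-SQL step, running in the DBA database.
-- dbo.usp_LogStandardSQLAccess is an assumed wrapper for the script above.
EXEC dbo.sp_add_jobstep
      @job_name      = N'DBA - Log legacy SQL login activity'
    , @step_name     = N'Capture sysprocesses snapshot'
    , @subsystem     = N'TSQL'
    , @database_name = N'DBA'
    , @command       = N'EXEC dbo.usp_LogStandardSQLAccess'
-- Schedule: daily, recurring every minute.
EXEC dbo.sp_add_jobschedule
      @job_name            = N'DBA - Log legacy SQL login activity'
    , @name                = N'Every minute'
    , @freq_type           = 4  -- daily
    , @freq_interval       = 1
    , @freq_subday_type    = 4  -- units of minutes
    , @freq_subday_interval = 1 -- every 1 minute
-- Attach the job to the local server.
EXEC dbo.sp_add_jobserver
      @job_name = N'DBA - Log legacy SQL login activity'
```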
One Further Step
Assuming that the script above offers proof that developers ARE abusing legacy logins, the next step is to identify who those developers actually are. As the majority of legitimate access is via Windows logins, you can soon assemble a mapping of machine names to logins by querying master.dbo.sysprocesses on development and reporting DB servers.
```sql
SELECT DISTINCT hostname, loginame
FROM master.dbo.sysprocesses
```
The whole purpose of this is to find out who is running queries on LIVE boxes via
illicit use of legacy logins.
The code can identify the following information:
- The machine from which the illicit query is being run and therefore who is running it
- The actual query being run, or at least the first 4,000 characters of it.
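Once the log table has accumulated a few days of snapshots, a simple aggregate query (a sketch; adjust the columns to taste) turns the raw data into a ranked list of suspects:

```sql
-- Summarise logged activity per machine and legacy login.
-- The job records one row per snapshot, so counting distinct
-- LastBatch values approximates the number of batches seen.
SELECT    hostname
        , loginname
        , COUNT(DISTINCT LastBatch) AS BatchesSeen
        , MAX(cpu)          AS MaxCPU      -- cumulative per connection
        , MAX(diskio)       AS MaxDiskIO   -- cumulative per connection
        , MIN(DateCreated)  AS FirstSeen
        , MAX(DateCreated)  AS LastSeen
FROM dbo.LogStandardSQLAccess
GROUP BY hostname, loginname
ORDER BY BatchesSeen DESC
```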
What the script cannot identify is WHY the queries are being run.
- Is there a legitimate business need for up-to-the-millisecond information?
- Has the information been omitted from the replication article to the reporting server?
- Does the developer know that they are not supposed to access LIVE servers? The fact that their Windows account is denied access to LIVE servers should be a large hint on this front.
- Do the developers know about the impact of their illicit access?
All of the above could be explained (but not excused) by a breakdown in communication. It has been my experience that if there is one common fault in business, it is the failure to communicate.
Conclusion
Ultimately there is only one true solution to the problem of legacy logins. Get rid of them. As long as they remain
on LIVE boxes they will be abused and no amount of management edicts will prevent it happening.
Let's face it, when Moses came down from Mount Sinai with ten simple instructions from God, they were given lip service and then ignored.
Again, my experience is that the use of choice four-letter words is the only solution.
- COST - "This is costing you money" works more magic than "Abracadabra".
- LOSS - "This is losing you sales" is useful with sales lead organisations, particularly when
bonuses are paid on sales.
If you are going to invoke these words of power then you need to get your facts and figures together.
Your job and your time will be costed and known to the finance department. Find out what that cost is and calculate the direct costs associated with monitoring this problem. Don't be afraid to round up these figures; you almost certainly spend more time than you think dealing with struggling servers as a result of this activity.
There may be statistics on the frequency of sales, and if you are working for a web-based operation there may be statistics for incidents when a customer experienced a timeout on a critical page. These will give you a figure for the potential loss of sales due to busy servers. Again, don't be afraid to round these figures up; after all, the same technique is used to set targets for the salesforce.
Remember, if you are in a growing company your problems will increase as the company grows, so it pays not to understate the problem:
- More legitimate database access from growing sales.
- New developers being taught naughty tricks by old developers
- Increased volume of data for developers to query illicitly
- Increased DBA maintenance