The other day, I shared an article showing how to audit database offline events via the default trace. Today, I will show an easier method to both audit and monitor for offline events. What is the difference between audit and monitor? It largely depends on your implementation, but I generally consider an audit as something you do after the fact. Monitor is a little more proactive.
Hopefully, a database being taken offline is a known event and not a surprise. Occasionally there are gremlins, in the form of users with too many permissions, that tend to do very strange things to databases and database servers.
Having read the previous article, you already know one method to try and find these database offline anomalies. That method may not be the most sleek solution nor most reliable given the possibility that events can quickly roll out of your default trace files. The better more reliable method is use Extended Events (XE) to monitor explicitly for those types of events. If you are unfamiliar with XE, I really encourage you to take a look at these resources to become more familiar with the tool.
Monitor
Beyond the power of XE to be able to better diagnose problems and trace events in your server, there is the ability to monitor for specific events as well. I won’t go into details about how to monitor with XE until a later article, but suffice it say I can monitor for Events to occur and immediately alert necessary parties to get more immediate action. This is quite some power for a built in tool and it is better than event notifications or agent alerts when it comes to ease of use and reliability.
As I look to monitor for these odd unplanned database offline events, I have several events within XE that can provide the requisite information: sqlserver.object_altered, sqlserver.database_started, sqlserver.database_stopped, and sqlserver.errorlog_written. I can hear you asking already “Wait, this seems to be a bit like a drill sergeant – very overbearing!”
Yes, it is probably a bit excessive for this session. However, I prefer to be comprehensive and the ability to link events together so I can better understand if it is a single one-off or if there is a bigger problem with the entire instance. Databases being stopped, started or set to offline should be rare and far between really. With that rarity in mind, the session should be relatively quiet.
IF EXISTS ( SELECT * FROM sys.server_event_sessions WHERE name = 'DBStateChange' ) DROP EVENT SESSION DBStateChange ON SERVER; GO EXECUTE xp_create_subdir 'C:DatabaseXE'; GO CREATE EVENT SESSION DBStateChange ON SERVER ADD EVENT sqlserver.object_altered ( SET collect_database_name = ( 1 ) ACTION ( sqlserver.sql_text,sqlserver.nt_username,sqlserver.server_principal_name,sqlserver.client_hostname, package0.collect_system_time,package0.event_sequence,sqlserver.session_id ) WHERE object_type = 'DATABASE' --16964 AND (sql_text LIKE '%ONLINE%' OR sql_text LIKE '%OFFLINE%') ), ADD EVENT sqlserver.database_started( ACTION ( sqlserver.sql_text,sqlserver.nt_username,sqlserver.server_principal_name,sqlserver.client_hostname, package0.collect_system_time,package0.event_sequence,sqlserver.session_id ) --WHERE object_type = 'DATABASE' --16964 ), ADD EVENT sqlserver.database_stopped( ACTION ( sqlserver.sql_text,sqlserver.nt_username,sqlserver.server_principal_name,sqlserver.client_hostname, package0.collect_system_time,package0.event_sequence,sqlserver.session_id ) --WHERE object_type = 'DATABASE' --16964 ), ADD EVENT sqlserver.errorlog_written( ACTION ( sqlserver.sql_text,sqlserver.nt_username,sqlserver.server_principal_name,sqlserver.client_hostname, package0.collect_system_time,package0.event_sequence,sqlserver.session_id ) WHERE (sql_text LIKE '%ONLINE%' OR sql_text LIKE '%OFFLINE%')) ADD TARGET package0.event_file ( SET filename = N'C:DatabaseXEDBStateChange.xel' ) WITH (STARTUP_STATE = ON ,TRACK_CAUSALITY = ON); /* start the session */ALTER EVENT SESSION DBStateChange ON SERVER STATE = START; GO
If I have that session running and then take a database online/offline or vise versa, I will see something very similar to this output.
Starting from the bottom and working my way up, I can see that a command was issues to bring the s database ONLINE. The very first thing that occurs is the request is written to the error log. Then I see that the database is in the stopped state. Next a message that the database is starting up (because it was stopped). Then we see two events for object_altered (similar to the default trace) due to the begin and commit phases of that transaction.
After that database was brought ONLINE, you can see that I immediately took s back offline – starting with the errorlog event, then a stopped event and the object_altered begin and commit events.
Capturing each of the events I noted previously, not only gives me a complete picture of the event, it also can help me to identify if something happens in between the various “expected” events. If I use this session in my monitoring setup, then I can be quickly alerted to problems with a database as well as have the archive of the events to go back in time and AUDIT or troubleshoot the event of a database being offline or unable to come online.
With this XE Session running, I can be more confident that I have trapped and correlated the correct events in each of the sources. Using the default trace method, I have to make some highly likely correlations but there is still some “magic” involved. With the XE session, you will be far less likely to see any of those events roll out of the log as well. I can’t underscore the importance of that fact enough. The data will be there when you need it!
Conclusion
We all aspire to having a perfect database environment where nothing surprising or unexpected happens. Unfortunately, that is the desire of dreams and fairy tales. The unexpected will happen. A database can unexpectedly be taken offline. Are you prepared to address the problem fully to the CTO should it happen? This XE session can help you with that.
Extended Events is a powerful tool to help in troubleshooting and tuning your environment. I recommend investing a little time in reading the 60 day series about Extended Events. This is not a short series but is designed to provide an array of topics to help learn the tool over time. Don’t forget to go back and read the companion article showing how to audit these events via the default trace.