Working diligently as any good DBA might, you have established maintenance routines for each of the SQL Servers under your purview.
As a part of this maintenance you have scripted solutions to intelligently manage and maintain the fragmentation levels for each of the indexes within each database on each instance.
To further show how diligent you are as a DBA, the outcomes of each maintenance run are logged and you review the logs each morning. This routine helps keep you on top of everything that is happening within the environment.
For months, maybe even years, things are running smoothly. Never a failure. Never an error. Just routine log review day after day. Then one day it happens – there is an error. The index maintenance script failed one night.
Index Cannot be Reorganized…
You receive the error message similar to the following:
Msg 2552, Level 16, State 1, Line 1 The index “blah” (partition 1) on table “blah_blah_blah” cannot be reorganized because page level locking is disabled
Immediately, you start double-checking yourself and verifying that it worked the previous night. You even go so far as to confirm that the same index was previously reorganized. How is it possible that it is failing now on this index. What has changed? Has something changed?
Time for a little digging and investigating, so the dirty work begins. On one side of the coin you are relieved to be able to do something different. On the other side of that coin, you are rather annoyed that something seems to have changed. These feelings are perfectly normal!
First things first – you investigate the indexes in question to confirm what the error is saying. This is easily done with a query such as the following:
SELECT i.name AS IdxName , SCHEMA_NAME(o.schema_id) AS SchName , OBJECT_NAME(i.object_id) AS ObjName , i.allow_page_locks FROM sys.indexes i INNER JOIN sys.objects o ON o.object_id = i.object_id WHERE OBJECTPROPERTY(i.object_id,'IsMSShipped') = 0;
Scrolling through the results, you notice (eventually) that the IX_SpecialOfferProduct_ProductID in the AdventureWorks2014 database has page locks disabled. You are absolutely certain that this index was created allowing page locks. Pondering the problem for a moment, you recall having read about the default trace (there are several articles on the default trace – here) and the thought occurs to try and see if there is a breadcrumb there about the change. Using the query from that default trace article, a picture starts to unscramble. Here is that query reposted and a snippet of the results:
DECLARE @DBName sysname = 'AdventureWorks2014' ,@d1 DATETIME ,@diff INT; SELECT ObjectName , ObjectID , DatabaseName , StartTime , EventClass , EventSubClass , T.TextData AS SQLStatement , ObjectType , ServerName , LoginName , NTUserName , ApplicationName , CASE EventClass WHEN 46 THEN 'CREATE' WHEN 47 THEN 'DROP' WHEN 164 THEN 'ALTER' END AS DDLOperation INTO #temp_trace FROM sys.fn_trace_gettable(CONVERT(VARCHAR(150), ( SELECT REVERSE(SUBSTRING(REVERSE(path), CHARINDEX('\',REVERSE(path)),256)) + 'log.trc' FROM sys.traces WHERE is_default = 1)), DEFAULT) T WHERE EventClass in (46,47,164) AND EventSubclass = 0 AND ObjectType <> 21587-- don't bother with auto-statistics as it generates too much noise AND DatabaseName = @DBName; SELECT @d1 = MIN(StartTime) FROM #temp_trace; SET @diff= DATEDIFF(hh,@d1,GETDATE()); SELECT @diff AS HrsSinceFirstChange , @d1 AS FirstChangeDate , tt.StartTime AS LastChange , sv.name AS obj_type_desc , tt.ObjectType , tt.DDLOperation , tt.SQLStatement , tt.DatabaseName,tt.ObjectName , tt.EventClass,tt.EventSubClass , tt.ServerName,tt.LoginName, tt.NTUserName , tt.ApplicationName , (dense_rank() OVER (ORDER BY ObjectName,ObjectType ) )%2 AS l1 , (dense_rank() OVER (ORDER BY ObjectName,ObjectType,StartTime ))%2 AS l2 FROM #temp_trace tt INNER JOIN sys.trace_events AS te ON tt.EventClass = te.trace_event_id INNER JOIN sys.trace_subclass_values tsv ON tt.EventClass = tsv.trace_event_id AND tt.ObjectType = tsv.subclass_value INNER JOIN master.dbo.spt_values sv ON tsv.subclass_value = sv.number AND sv.type = 'EOD' ORDER BY StartTime DESC; DROP TABLE #temp_trace;
This is a great start. Not seen in the results is the timestamp showing when it was done – which was due solely to snipping. Also not shown is the text of the statement that was run for those three events. So close, yet so far away. This is not quite enough to have the smoking gun evidence to show Jason (me) that I did something wrong and unauthorized. What to do now?
All is not lost yet! Your stunning memory kicks in and you recall several articles about using Extended Events to audit server and database changes. Better yet, you recall that you deployed an XE session to the server where this error occurred. And yes, you are slightly embarrassed that you failed to remove the XE session after fiddling with it. We won’t tell anybody that you deployed a test XE session to a production server as if it were your sandbox. The session currently deployed is trapping all object changes unlike the following session that has it filtered down to just objects that are indexes.
USE master; GO -- Create the Event Session IF EXISTS ( SELECT * FROM sys.server_event_sessions WHERE name = 'ObjChangeAudit' ) DROP EVENT SESSION ObjChangeAudit ON SERVER; GO EXECUTE xp_create_subdir 'C:\Database\XE'; GO CREATE EVENT SESSION ObjChangeAudit ON SERVER ADD EVENT sqlserver.object_altered ( SET collect_database_name = ( 1 ) ACTION ( sqlserver.sql_text,sqlserver.nt_username,sqlserver.server_principal_name,sqlserver.client_hostname, package0.collect_system_time,package0.event_sequence ) WHERE database_name = 'AdventureWorks2014' AND object_type = 'INDEX' OR object_type = 'OBJ' ) ADD TARGET package0.event_file ( SET filename = N'C:\Database\XE\ObjChangeAudit.xel' ) WITH (STARTUP_STATE = OFF ,TRACK_CAUSALITY = ON); /* start the session */ALTER EVENT SESSION ObjChangeAudit ON SERVER STATE = START; GO
You query the trace file with a query like this:
SELECT CAST ([t2].[event_data] AS XML) AS event_data, t2.file_offset,t2.file_name, cte1.event_session_id--, '' AS event_predicate INTO #xmlprocess FROM ( SELECT REPLACE(CONVERT(NVARCHAR(128),sesf.value),'.xel','*.xel') AS targetvalue, ses.event_session_id FROM sys.server_event_sessions ses INNER JOIN sys.server_event_session_fields sesf ON ses.event_session_id = sesf.event_session_id --INNER JOIN sys.server_event_session_events sese --ON ses.event_session_id = sese.event_session_id WHERE sesf.name = 'filename' AND ses.name = 'ObjChangeAudit' ) cte1 OUTER APPLY sys.fn_xe_file_target_read_file(cte1.targetvalue,NULL, NULL, NULL) t2 ; SELECT event_data.value('(event/@name)[1]', 'varchar(50)') AS event_name , event_data.value('(event/data[@name="database_name"]/value)[1]', 'varchar(max)') AS DBName , event_data.value('(event/data[@name="ddl_phase"]/text)[1]', 'varchar(max)') AS DDLPhase , event_data.value('(event/data[@name="object_type"]/text)[1]', 'varchar(max)') AS object_type , event_data.value('(event/@timestamp)[1]', 'varchar(50)') AS [TIMESTAMP] , event_data.value('(event/action[@name="collect_system_time"]/value)[1]', 'varchar(max)') AS SystemTime , event_data.value('(event/action[@name="client_hostname"]/value)[1]', 'varchar(max)') AS ClientHostName , event_data.value('(event/action[@name="server_principal_name"]/value)[1]', 'varchar(max)') AS ServerPrincipalName , event_data.value('(event/action[@name="nt_username"]/value)[1]', 'varchar(max)') AS nt_username , event_data.value('(event/action[@name="sql_text"]/value)[1]', 'varchar(max)') AS sql_text FROM #xmlprocess x LEFT OUTER JOIN sys.server_event_session_events sese ON x.event_data.value('(event/@name)[1]', 'varchar(50)') = sese.name AND x.event_session_id = sese.event_session_id ORDER BY timestamp,event_data.value('(event/action[@name="event_sequence"]/value)[1]', 'varchar(max)'); DROP TABLE #xmlprocess;
Wow! Look at the results! There is a smoking gun finally.
Two results happen to pin the root cause of the change squarely on my shoulders. Yup, Jason changed that index to disallow page locks. I highlighted three different areas of interest in the results. The yellow and green indicate the DDL phase. One row for the start of the statement, and another row for the commit of that statement. The red highlight shows me the exact change that was made to this index. This is all very good info!
What Now?
It really is great to have the smoking gun. If something is broke and it worked previously, it is essential to find the root cause. With a root cause under the belt, what needs to be done to fix the failure? That is a little bit easier that finding the root cause. That is, unless there is a technical reason for the index to no longer allow page locks (maybe that smoking gun is less of a smoking gun and more like baby spittle after all).
Here is how you fix the problem:
ALTER INDEX IX_SpecialOfferProduct_ProductID ON Sales.SpecialOfferProduct SET (ALLOW_PAGE_LOCKS = ON) GO
But, But, But…
The Extended Events session would be very noisy and capture every alter index statement, right? It should capture statements like the following, right?
ALTER INDEX IX_SpecialOfferProduct_ProductID ON Sales.SpecialOfferProduct REBUILD GO ALTER INDEX IX_SpecialOfferProduct_ProductID ON Sales.SpecialOfferProduct REORGANIZE GO ALTER INDEX IX_SpecialOfferProduct_ProductID ON Sales.SpecialOfferProduct REBUILD WITH (DATA_COMPRESSION = PAGE) GO
The answer to that question is: Yes, Yes, Yes. These statements are all captured due to the use of the ALTER statement. Here are the results of the XE session with all of these scripts having been executed:
If you want to audit when the indexes are changing and how they are changing, this will do just the trick. If there are a ton of changes, then be prepared for a deluge of information.