On 19th June 2024, the world faced a major outage. The computer on which the Windows operating system was installed started crashing. The primary reason for the outage was a software update that was rolled out by an IT company named CrowdStrike.
In our company, we also faced the same issues. A lot of database servers were impacted, and the databases went offline. Once systems were up, we saw many production databases were in suspect mode. We already have a good backup strategy in place hence we did not face a lot of issues while restoring the databases, but a few development databases were inaccessible. During our initial analysis, we found that some databases were in suspect mode because the transaction logs of the databases were corrupted.
In this article, we will learn about the SUSPECT mode of the database and how to recover a database that is in SUSPECT mode due to a corrupt transaction log.
Understanding Suspect Mode in MS SQL
Before we dive into the process to understand what Suspect mode is, first, let us understand the different states in the database when it is corrupted.
Database remains online
When a database has one or multiple database files and any of the data files get corrupted by query execution or any other operations, the database will remain online and operational. The data within corrupt pages or files will remain inaccessible.
Database status changes to recovery pending mode
During the crash recovery phase, SQL Server begins the recovery of the databases but due to various reasons the SQL Server is not able to start the recovery of a damaged database, the database will be in recovery pending mode.
For example, a transaction is not committed and the server crashes and for some reason the transaction log file is deleted. When the recovery phase begins, the SQL Server is not able to roll back the uncommitted transaction.
Database status changes to SUSPECT mode
The suspect mode is different than the Recovery Pending mode. In this scenario, SQL Server can start the crash recovery but due to damaged transaction log files, the recovery process cannot be completed. For example, if a transaction is not committed and server crashes. When SQL Server starts the recovery process, it tries to read a corrupted transaction log record or page.
The database can be in suspect mode due to any of the following reasons.
- Database file or transaction log file get corrupted or damaged.
- Operating system crashes due to various factors.
- The transaction log files do not have adequate space to grow.
- Hardware failure.
- SQL Server services crashed in between the transaction.
Now, in the next section, we will understand how to restore a database which is in SUSPECT mode.
How to Recover a Database in SUSPECT Mode
In this section, we will understand how to recover a database that is in SUSPECT mode. For better understanding, I am going to corrupt a transaction log file of a user database. Please do not try this in your production environment. There is a well explained article written by a veteran DBA and SQL Server expert Paul Randle, which briefly explains the process to corrupt a transaction log. In this article, I am covering the steps only which are following:
Step 1: Create a sample database and a table
For demonstration, I have created a database, named ComputerShopInventory, and created a table, named HardwareInventory. The table contains a detail of the computer hardware, stock and price. Here is the script to create a table and database.
create database Computershopinventory go create table HardwareInventory ( id int identity(1,1), Product_Code varchar(10), Product_Name varchar(1000), Product_Stock int, Product_Price numeric(8,2) ) Go
Insert some dummy records by executing the following INSERT statement.
insert into HardwareInventory (Product_Code,Product_Name,Product_Price,Product_Stock) values ('PROD00001','Logitech Keyboard',10,50.90), ('PROD00002','Logitech Mouse',50,10.90) ,('PROD00003','Logitech Headphone',10,250.90) ,('PROD00004','Intel Board',10,5000.90)
Step 2: Perform a transaction and kill the server
Now, let us try to update the stock of the product “Logitech Mouse.” For that, we will initiate a transaction, update the record and write changes to the disk. The transaction must remain open so we will not issue a commit statement. Here is the code.
begin tran update HardwareInventory set Product_Stock=5, Product_Price=15000.00 where Product_Code='PROD00004' Go Checkpoint Go
We will simulate the SQL Server crash, hence shutdown the SQL Server services. Execute the following statement in new query editor window.
use master go shutdown with nowait go
Step 3: Damage the transaction log file and start SQL Server.
We can use a hex editor to corrupt any database transaction log file. Download and install a hex editor and open the transaction log file using it. Add some random entry in file and save the log file. My changes are shown in the following screenshot.
Now, Start the SQL Services.
Step 4: Check the status of the database
Once the SQL Server is online, execute the following query to check the status of the Computershopinventory database.
use master go select DATABASEPROPERTYEX('Computershopinventory','STATUS')[Database Status] go
Screenshot:
The above screenshot shows that the database is in a suspect state.
Now, let us understand how to recover this database.
Recover database from SUSPECT mode
The database can be recovered by any of the following methods: restoring or using emergency repair.
Method 1: Restore database from last clean backups.
The obvious option. If you have clean backups available, use them to restore a fresh database. There are many articles available online which help you to restore a SQL Server database.
Method 2: Perform EMERGENCY mode repair.
In this method, we will use DBCC CHECKDB to recover the database from SUSPECT mode. This method must be used as a last resort because there are chances of data loss. Here is a step-by-step process to repair a database
Step 1: Set the Database to Emergency Mode
When a database is in SUSPECT mode, we cannot access it, hence first we must change the mode of the ComputerShopInventory database from SUSPECT to EMERGENCY. To do that, execute following query:
ALTER DATABASE [ComputerShopInventory] SET EMERGENCY;
Step 2: Verify Database Integrity
Run the DBCC CHECKDB command to find the corruption level in the database. Execute following query.
DBCC CHECKDB (‘ComputerShopInventory’);
Query output
As you can see, the DBCC CheckDB did not return any error. Let us try to set it online.
ALTER DATABASE [ComputerShopInventory] SET ONLINE;
We have encountered following error
Msg 5181, Level 16, State 5, Line 7 Could not restart database "Computershopinventory". Reverting to the previous status. Msg 5069, Level 16, State 1, Line 7 ALTER DATABASE statement failed. Msg 824, Level 24, State 2, Line 7 SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0xa7cdcef8; actual: 0x2315fee1). It occurred during a read of page (2:0) in database ID 7 at offset 0000000000000000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL16.MSSQLSERVER\MSSQL\DATA\Computershopinventory_log.ldf'. Additional messages in the SQL Server error log or operating system error log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
So now, we must run the consistency check on the entire database to rebuild the transaction log.
Step 3: Repair the Database
Before we begin the repair process of database, we must put it in single_user mode otherwise, you will encounter following error:
Msg 7919, Level 16, State 3, Line 1 Repair statement not processed. Database needs to be in single user mode.
Now, execute the query below to set database in single_user mode.
ALTER DATABASE [Computershopinventory] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
Once the database is in single_user mode, we will use DBCC CheckDB command with REPAIR_ALLOW_DATA_LOSS option. Here is the query.
DBCC CHECKDB (‘Computershopinventory’, REPAIR_ALLOW_DATA_LOSS);
The output is shown here:
As you can see in the above image, the transaction log has been rebuilt. Let us set the database online and make it accessible for all users. Execute following set of queries.
Use master go alter database Computershopinventory set online Go Alter database Computershopinventory set multi_user go
Check the database status using the following query.
use master go select DATABASEPROPERTYEX('Computershopinventory','status')[Database Status] go
Output:
As you can see in the above screenshot, the database is repaired and accessible to users.
Conclusion
The outage gave us an opportunity to revisit the backup strategy and start considering the investment in professional data recovery tools. Regular backups and proactive monitoring can help prevent databases from entering such outage and data loss. Manual methods offer a basic solution but may not always be effective and can lead to data loss. Stellar Repair for MS SQL provides a robust and reliable alternative, ensuring comprehensive recovery without data loss. suspect mode. You can refer to this article to learn more about it.