March 3, 2009 at 1:44 am
Hello Everyone:
I hope someone can assist with my question:
============================================================================================
Environment:
----------------
Production Environment running:
SQL Server 2005 - 9.00.2153.00 (Intel X86) Standard Edition
Windows NT 5.2 (Build 3790: Service Pack 1)
Using a domain service account.
The problem that was encountered:
-------------------------------------------
I have a two node cluster; I will call the nodes - node1 and node2.
- SQL Server has been running on node1 since I started DBA support for the SQL Cluster.
- Recently an incident occurred which caused SQL to failover onto node2.
- When SQL started up the following errors were displayed in the error log.
2009-02-21 11:06:59.90 spid5s Error: 15466, Severity: 16, State: 1.
2009-02-21 11:06:59.90 spid5s An error occurred during decryption.
2009-02-21 11:07:00.13 Server Error: 17190, Severity: 16, State: 1.
2009-02-21 11:07:00.13 Server FallBack certificate initialization failed with error code: 4.
2009-02-21 11:07:00.13 Server Warning:Encryption is not available, could not find a valid certificate to load.
- In addition to this I noticed that the existing full text indexes could not be used.
After a lot of investigation I ended up failing it back onto node1 again …… and the errors / warning disappeared and fti works. When SQL server starts up now, I receive the following message in the logs:
2009-02-23 19:06:12.87 Server A self-generated certificate was successfully loaded for encryption.
My concern:
---------------
Is that if it ever fails over again …… the same problem will occur again.
Backup of SMK
--------------------
Now that it is running on node 1 I have backed up the SMK using the command:
BACKUP SERVICE MASTER KEY TO FILE = 'H:\smk_20090224.smk' ENCRYPTION BY PASSWORD = 'aspecificpassword'
What I would like to do:
-----------------------------
1. I would like to schedule some down time of the system and fail it over onto node2 (where the decryption error occurs) and try to restore the SMK using the backup taken from node 1 ie :
RESTORE SERVICE MASTER KEY FROM FILE = 'H:\smk_20090224.smk’ DECRYPTION BY PASSWORD = 'aspecificpassword';
Does this sound plausible; is this the right thing to do?
2. Does any one know how can I recreate this error, so I can test it out?
============================================================================================
Cheers
Tim
March 3, 2009 at 2:48 am
Hi,
Just for clarification...
1. What is the Service account you are using?
2. Have you change service account settings using SQL Configuration Manager?
3. Both the nodes are installed with same level of service patch?
4. Have you tried FORCE option to REGENERATE Service Master Key yet?
Regards,
Sakthi
My Blog -> http://www.sqlserverdba.co.cc
March 4, 2009 at 1:38 am
Hi Sakthi
Thanks for looking an my entry
1. The service account is a domain account (ie DOMAIN\username). A requirement for SQL clusters ..... let me know if i have misinterpreted your clarification question.
2. The SQL service account has not been changed nor has the password, in fact the password is set to not expire.
3. Good question ...... (i checked this via the registry / add remove programs, if there is a better way let me know but ..... ) on node1 there is a patch for :
Hotfix 2153 for SQL Server Notification Services 2005
That does not exist on node2.
Everything else looks to be the same.
4. No I haven't ....... as its a production box .... I have been really cautious.
I was hoping to find a way to recreate the problem so that i could test fixing it without doing so directly on production.
Or perhaps chat with someone who had had to do this recently on one of their dev / test / prod systems.
Cheers
Tim
March 5, 2009 at 1:01 am
Hi,
So you have not changed service account since the cluster was started right? Can you connect to the passive node where it is failing and Check for "entropy" registry key in below Registry Node
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\ Security
If it is not present there then there is some issue in creating a new key upon failover. So two solutions at this stage.
1. Make Failover to passive node and stop and start the SQL Server (If possible :)) which should create a new key.
2. Run ALTER SERVICE MASTER KEY FORCE REGENERATE to create new key (Nothing wrong) which will re-encrypt all the contents.
3. Use BACKUP SERVICE MASTER KEY when the working node is active and make failover and then run RESTORE SERVICE MASTER KEY.
The Service Master key is combination of Service Account and Machine Account so if both are same then there should not be any issue. Please reply after try any of above methods.
Regards,
Sakthi
My Blog -> http://www.sqlserverdba.co.cc
June 29, 2009 at 8:20 am
Sakthivel Chidambaram (3/5/2009)
Hi,So you have not changed service account since the cluster was started right? Can you connect to the passive node where it is failing and Check for "entropy" registry key in below Registry Node
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\ Security
If it is not present there then there is some issue in creating a new key upon failover. So two solutions at this stage.
1. Make Failover to passive node and stop and start the SQL Server (If possible :)) which should create a new key.
2. Run ALTER SERVICE MASTER KEY FORCE REGENERATE to create new key (Nothing wrong) which will re-encrypt all the contents.
3. Use BACKUP SERVICE MASTER KEY when the working node is active and make failover and then run RESTORE SERVICE MASTER KEY.
The Service Master key is combination of Service Account and Machine Account so if both are same then there should not be any issue. Please reply after try any of above methods.
Hi,
I have a similar issue to the one decribed in this post. I have tried the above steps to regenerate the SMK for the sql instance however a REGENERATE or RESTORE always end with the following error
RESTORE SERVICE MASTER KEY FROM FILE = 'filepathTestServerSMK.bak'
DECRYPTION BY PASSWORD = 'password' FORCE
The current master key cannot be decrypted. The error was ignored because the FORCE option was specified.
Msg 15209, Level 16, State 1, Line 1
An error occurred during encryption.
The following message appears in my SQL Logs on startup (on the affected node)
Message
Error: 15466, Severity: 16, State: 1.
Message
An error occurred during decryption.
I'm kind of at my wits end at the moment as my understanding is that the SMK is encrypted against both the service account (all nodes have the same domian user account) and the machine name, therefore the machine name encryption should work.
Any help greatly appreciated
July 20, 2009 at 6:28 am
did you resolve this? I have the same issue
August 6, 2009 at 12:21 pm
I also have the same issue. The only thing I have done was to create an active directory user account called "sqlmanager" and gave Administrator, Domain Admin, and Domain User rights to him. Next, I changed the sql service login to run using the new account (sqlmanager) which ran under "Local System" prior. Started the sql service and get the following errors within the error log. I have never seen this before. In reading a number of post there was also a recommendation to disable VIA protocol, but it is not enabled. Whenever I switch the sql services to run under the "Local System" sql does start back up so I would say there is some sort of permissions error of some kind. Any ideas would be helpful.
2009-08-06 07:15:27.02 spid5s Error: 15466, Severity: 16, State: 1.
2009-08-06 07:15:27.02 spid5s An error occurred during decryption.
2009-08-06 07:15:27.07 spid5s Server name is 'VCPSRV'. This is an informational message only. No user action is required.
2009-08-06 07:15:27.07 spid9s Starting up database 'model'.
2009-08-06 07:15:27.08 Server Error: 17190, Severity: 16, State: 1.
2009-08-06 07:15:27.08 Server FallBack certificate initialization failed with error code: 1.
2009-08-06 07:15:27.08 Server Unable to initialize SSL encryption because a valid certificate could not be found, and it is not possible to create a self-signed certificate.
2009-08-06 07:15:27.08 Server Error: 17182, Severity: 16, State: 1.
2009-08-06 07:15:27.08 Server TDSSNIClient initialization failed with error 0x80092004, status code 0x80.
2009-08-06 07:15:27.08 Server Error: 17182, Severity: 16, State: 1.
2009-08-06 07:15:27.08 Server TDSSNIClient initialization failed with error 0x80092004, status code 0x1.
2009-08-06 07:15:27.08 Server Error: 17826, Severity: 18, State: 3.
2009-08-06 07:15:27.08 Server Could not start the network library because of an internal error in the network library. To determine the cause, review the errors immediately preceding this one in the error log.
2009-08-06 07:15:27.08 Server Error: 17120, Severity: 16, State: 1.
2009-08-06 07:15:27.08 Server SQL Server could not spawn FRunCM thread. Check the SQL Server error log and the Windows event logs for information about possible related problems.
-John
August 18, 2009 at 1:51 am
I am having the same issue. I have the feeling this may be due to the way the server OS was created. It was a clone of another machine and I think this has broken / not changed something in the cryptographic provider even though it has been syspreped
September 22, 2009 at 8:20 am
This link has a work around.
July 28, 2010 at 12:03 pm
The link worked for me...
thanks!!!
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply