February 15, 2022 at 1:57 pm
I have three node AG - 2 synchronous in one data center and the other asynchronous in a separate datacenter. A linked server has been configured for years and running without issue until all of a sudden recently. To add to the confusion, I'm only getting the error message on one of the synchronous nodes and also the asynchronous node. The linked server works on different servers outside of this environment as well. The research I've been able to come up with focuses on making sure the destination allows for ad hoc connections, SQL browser service is running, etc but all of that is fine as shown by the link working on some servers. I've compared settings across the servers and everything appears to be the same. Has anyone experienced the error below and can offer any insight for other things to check?
OLE DB provider "SQLNCLI11" for linked server "<server name>" returned message "Login timeout expired".
OLE DB provider "SQLNCLI11" for linked server "<server name>" returned message "A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.".
Msg -1, Level 16, State 1, Line 0
SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF].
February 16, 2022 at 2:10 pm
Thanks for posting your issue and hopefully someone will answer soon.
This is an automated bump to increase visibility of your question.
February 16, 2022 at 9:18 pm
Might not hurt to check firewall settings. Could be that something was recently set up (intentional such as something to improve security, accidental such as a GPO that should have excluded that server but didn't due to a typo, or via windows update) that is blocking the connection.
If you have SSMS installed on the server, use that to try to connect directly to the linked server instance. If SSMS isn't installed on the server, you could use SQLCMD or even telnet. Telnet can be a bit of a tricky one though because if the connection goes to a big blank screen with a blinking cursor, then the connection was successful, it is just waiting for the rest of the connection information to come across before the SQL instance can know how to respond. I generally just kill the connection once I confirm I can connect with telnet or not.
But the error is telling you exactly what is wrong - the server cannot see the linked server anymore on the network. So something is in place that is preventing it and since you ruled out some things, my next thought is firewall. And since you said "nothing changed", I am expecting that a windows update modified the firewall settings OR that they are managed by a GPO that shouldn't have been applied to that server.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
February 17, 2022 at 1:44 pm
I've verified all of the appropriate ports are open on the servers so everything looks good there. It seems like the server that cannot make the connection is routing it to use a different port on the destination which doesn't make sense given the connection strings are the same on the server that is working and the server that is not working.
February 17, 2022 at 2:22 pm
My thoughts, if it isn't the software firewall, do you have a hardware firewall that your IT team needs to configure?
Alternately, it could be a routing problem. Depending on your network config, it could be that there is no route between your servers. I'd try doing a tracert as well as a ping between the servers to make sure there is a path to the server. I would do a ping of about 100 pings not just the default 4 so you can do proper analysis. 4 is good for checking if the connection exists, 100 is good (not great) to see if the connection is reliable. Or like I suggested earlier, telnet/SSMS. EVEN if you are 100% sure that the network is configured correctly and is not the problem based on the CONFIG, test it anyways. It might not be the firewall, it COULD be VLANs or the subnet mask or DNS. To me, based entirely on the error message, SQL is saying that it cannot connect to the remote instance because it cannot find it. Every time I have seen that error, it is a network related problem of some sort.
As I said above, it COULD be a DNS issue, so you could try using the FQDN or IP rather than server and see if it lets you connect. So instead of SERVERA\Instance, use SERVERA.CORP.COM\Instance where corp.com is your domain. Alternately, try the IP instead. That being said, DNS I still classify as a "network" error, so my first thought about it being network related is still my first (and only) guess.
Now, since it worked previously and suddenly stopped, that means someone changed something OR you might be starting to have some hardware failure. I would check the event logs on the problematic servers for any signs of failure AFTER ruling out the network. I'd do it after because testing the network is easier and faster than reading through a bunch of logs. And I don't just mean the SQL logs and the "default" windows logs, dig into the application and service logs as well IF the other logs are not helpful. In general, the SQL logs and default windows logs (the Application, Security, and System logs) will almost always lead you to the correct culprit, but in some odd edge cases you MAY need to dig into the "application and service logs". If you do need to dig into them, I recommend starting in the "Microsoft/Windows" folder and then look at the Network related ones. They may be disabled and you might need to enable them for them to capture anything useful though. I do recommend turning them off again afterwards as turning them on does cause more disk space to be used and can cause some performance impacts.
Lastly, in the event it isn't network and the logs are not helpful, I would double check the things you found about ad ho connections and the SQL Browser service. It COULD be the linked server is configured SLIGHTLY differently on the servers where it is working (such as using the "server\instance:port" method to connect and the broken ones don't have the port). Just because something works on server A doesn't mean it will work on server B if they are not configured the same.
I've also run into weird issues before where sometimes I just need to re-create the object. Have you tried re-creating the linked server with a new name just to see if that works? If it does, I'd drop the existing linked server connection and recreate it.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply