This came up on Twitter last night and if you're not very strong on networking, this can be really confusing. First, let's start with the protocols involved:
- ICMP - This is what is used for ping and tracert (traceroute) and it's at a lower level that either of the next two.
- UDP - This is a connection-less protocol. Since it is connectionless, even if you start communications using it, the response may not get through a firewall.
- TCP - This is a connection-oriented protocol. This is how SQL Server and a client talks when running queries, etc.
Step 1 - Ping the Server:
A first step towards troubleshooting is to try and ping a SQL Server. The key here is what is the normal behavior. Because of worms which pinged for targets first, it's not unusual to see ICMP blocked from everywhere but administrator workstations. So know if ping works before there's a problem. Realize, though, that if ping does work, it doesn't mean that there isn't a network config issue. I've seen cases where access from user vLANs have been changed overnight and while ping wasn't affected, TCP traffic was. However, if you could ping in the past and you can't now, that lets you start verifying the server is up and if it is, verifying that there weren't any network changes.
Step 2 - Check to See If Connection to Default Instances Work but not Named Instances (using Server\Instance to connect):
This may mean an issue with the SQL Server Listener or SQL Server Browser service communications. When a client tries to connect to a default instance, it immediately tries to connect on TCP port 1433 for that server's IP. So it never tries to query the SQL Server Listener/SQL Browser service. If you can connect to default instances but not named instances, especially if they are on the same server, then that likely means there is a problem talking to the service that provides the port the named instance is running on (since there isn't a standard on those). It could be at the client, it could be at the server, it could be blocked at the network. Then again, the connection to the named instance may actually be blocked. That leads to the next thing:
Step 3 - Check to See If You Can Connect via Port but not by Server\Instance:
If you can connect by specifying server,port, for instance, MySQLServer,7777 but not MySQLServer\NamedInstance, then that confirms there's something going on with the SQL Server Listener/SQL Browser service communications between server and client. That uses the UDP protocol. More on this in a minute. If it does *NOT* work but you know it's up and others can connect, then that means there is a block on the TCP port somewhere along the way. It could be at the client (especially if using something other than Windows firewall), on the network (especially if a vLAN configuration has recently changed) and it can even be on the server (check firewall, of course). If you've checked all of that but you still have an issue, check the IPSEC policies, because these can be used to block communications. If all that isn't it, you're going to need to get networking folks involved who can do and read network traces.
[EDIT] One last troubleshooting step here is to try and connect using telnet or puTTY. Windows XP and previous have telnet installed by default. But Vista and later do not, so you have to install it manually. I'm talking about the telnet Client here, not the telnet Server. You don't want the server, you want the client. From the command line you can run telnet servername port and see if you get a blank screen. If you do, then the SQL Server is responding on that port, and it would make me wonder about the client itself. If you don't connect, whether you get an immediate connection refused or it waits a few seconds and then tells you it couldn't make the connection, this is confirmation that something is up and blocking the port. If you can't install telnet or don't want to, there are clients out there, like PuTTY.
If you can connect via port but not by specifying the instance name, then it means something is going on with the UDP communications. First place to check is to ensure UDP/1434 is permitted on the client firewall. If you're seeing inconsistent results, sometimes restarting the SQL Browser service seems to work. If you're talking SQL Server 2000 only instances, sorry, you're stuck. The only way to restart the listener service is to stop and restart the SQL Server services, so far as I am aware. If restarting the service doesn't work, it's time to get the networking guys involved. We have intentionally blocked UDP/1434 before, especially during the SQL Slammer outbreak. Therefore, make sure it's not something they're doing. If they aren't doing it, then can run the network traces and help locate where the breakdown in communications is happening.