October 4, 2007 at 3:43 am
We had some fun on our main server last night. Started around 11pm with the following in the error log. (MyDB is the default db for all of the logins listed)
2007-10-03 23:00:59.42 Backup Log was backed up. Database: MyDB, .....
2007-10-03 23:01:38.09 Logon Error: 18456, Severity: 14, State: 27.
2007-10-03 23:01:38.09 Logon Login failed for user 'MyUser.
According to docs, State 27 on a login failure means that the server could not determine initial db (offline, suspect or not there)
That went on for a minute, the rest of the log backups ran (it's a scheduled job every 15 min) then the following started appearing
2007-10-03 23:02:53.48 Logon Error: 18456, Severity: 14, State: 16.
2007-10-03 23:02:53.48 Logon Login failed for user 'AnotherUser'.
State 16 is apparently "Login does not have rights on default db." The login in question most certainly does.
This continued right through to 4 this morning when one of the admins noticed soemthing was not right. No log backups ran during that period, though they were supposed to.
The admin who fixed it noted that while he could log into the server, he could not run any queries. Not even a check of running processes worked. They just ran, and ran, and ran, ....
What he did to resolve it was to kill processes bindly, starting at 51 and workign up until the server started responding again. Since then its been running very smoothly
There are no other messages in the error log. There's nothing suspicious in the windows event log. None of the dbs were suspect or offline.
Server specs: 12 proc Itanium. 48GB memory. Running Server 2003 datacenter and sql 2005 enterpriise SP2
Anyone want to take a stab at what might have been the cause?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 4, 2007 at 6:16 am
I've seen this when SQL Server is overwhelmed by connections, but that was on hardware substantially less than what you've got.
K. Brian Kelley
@kbriankelley
October 4, 2007 at 6:29 am
I've also seen it in those circumstances, but it's fairly unlikely here. 11pm is the beginning of the overnight process run, and there should be no more than 30-50 connections at the time. Peak of the day can be 5 times that.
The worrying thing for us right now is that since we don't have a cause, we've got no idea if it might happen again
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 4, 2007 at 6:41 am
Did you have any performance counters monitoring the vitals?
K. Brian Kelley
@kbriankelley
October 4, 2007 at 6:52 am
Not at the time, no.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 4, 2007 at 7:26 am
How did the admin get in to kill spids if he could not run queries? If he used the DAC I believe there are only a few types of queries you can run.
October 4, 2007 at 7:27 am
It was worth a shot. Running a server side trace is likely to kill you... I have seen that issue when someone was doing a trace, too, come to think of it, so it may be that one of memory pools were exhausted for some reason. Only thing I can think of is to put the counters up and cross your fingers if there's nothing in the logs.
K. Brian Kelley
@kbriankelley
October 4, 2007 at 7:36 am
Anders Pedersen (10/4/2007)
How did the admin get in to kill spids if he could not run queries? If he used the DAC I believe there are only a few types of queries you can run.
He could connect to SQL normally (not DAC), but queries, even against the system views, just ran and never finished. Hence he was killing spids blind (without even knowing if they existed)
Fortunatly for us, the kill command worked. Otherwise it would have been a server reset.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 4, 2007 at 7:45 am
K. Brian Kelley (10/4/2007)
It was worth a shot. Running a server side trace is likely to kill you... I have seen that issue when someone was doing a trace, too, come to think of it, so it may be that one of memory pools were exhausted for some reason. Only thing I can think of is to put the counters up and cross your fingers if there's nothing in the logs.
Only the default trace was running, and I remembered that too late to check what was in it. 🙁
Any counters you'd specifically suggest?
*crossing fingers*
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 4, 2007 at 8:02 am
Any related to memory usage by SQL Server, # of connections, lock timeouts/deadlocks, and the standard battery at the OS level (memory, processor, disk, etc.).
K. Brian Kelley
@kbriankelley
October 4, 2007 at 8:26 am
you should also run sp_blocker script to keep an eye on the processes and "hopefully" identify a long blocking process.
Good Luck
* Noel
October 4, 2007 at 8:50 am
Good point. If it's a really bad query eating up memory and taking forever to execute, it's going to hold locks that'll show up. This will catch such a query without having to resort to a full-on server trace.
K. Brian Kelley
@kbriankelley
October 4, 2007 at 8:57 am
GilaMonster (10/4/2007)
He could connect to SQL normally (not DAC), but queries, even against the system views, just ran and never finished. Hence he was killing spids blind (without even knowing if they existed)
Fortunatly for us, the kill command worked. Otherwise it would have been a server reset.
If he never saw the results - how do you know for a fact that the Kill actually worked? versus whatever nasty thing was running just finishing on its own while Admin is trying to find it....
I HATE those scenarios....
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
October 4, 2007 at 9:00 am
Matt Miller (10/4/2007)
GilaMonster (10/4/2007)
He could connect to SQL normally (not DAC), but queries, even against the system views, just ran and never finished. Hence he was killing spids blind (without even knowing if they existed)
Fortunatly for us, the kill command worked. Otherwise it would have been a server reset.
If he never saw the results - how do you know for a fact that the Kill actually worked? versus whatever nasty thing was running just finishing on its own while Admin is trying to find it....
I HATE those scenarios....
Kill logs its activity on SQL Server Error Log and if after a "killer" loop :hehe: you see that responsiveness is back to normal you know it worked 😉
* Noel
October 4, 2007 at 10:19 am
Are you running DTC?
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply