October 14, 2015 at 4:13 am
The current contract I'm working on at the moment are getting very intermittent "Login Timeout Expired" messages when the app/jobs/packages try to run at varying times during the day.
The server is healthy, the only thing I can think of is the very frequent log backups happening every 2 minutes as I can't seem to pinpoint it to a particular point as the error occurs from random places in the system at random times.
The DB is part of a mirror, nothing of interest in the error log other than a log backup at the same time. Mirror is a high performance mirror, no witness, no auto failover and no manual failovers at the times the errors occur.
The mirror has 18456 errors state 38 which relate the the errors seeing, but as theres nothing thats redirecting the session to the mirror, I'm at a loss as to what could be causing the errors and if it is directing to the mirror how its redirecting the traffic to it.
My gut feeling was the frequent log backup, but from what I remember backups shouldn't cause these issues.
Message
Executed as user: DOMAIN\AgentAccount. Microsoft (R) SQL Server Execute Package Utility Version 10.50.4042.0 for 64-bit Copyright (C) Microsoft Corporation 2010. All rights reserved. Started: 11:00:00 Error: 2015-10-14 11:00:17.14 Code: 0xC0202009 Source: SOURCE1 Connection manager "SERVER.DATABASE" Description: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005. An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "Login timeout expired". An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "Login failed for user 'DOMAIN\AgentAccount'.". An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "Cannot open database "DATABASE" requested by the login. The login failed.". End Error Error: 2015-10-14 11:00:17.14 Code: 0xC00291EC Source: Task1 Execute SQL Task Description: Failed to acquire connection "SERVER.DATABASE". Connection may not be configured correctly or you may not have the right permissions on this connection. End Error DTExec: The package execution returned DTSER_FAILURE (1). Started: 11:00:00 Finished: 11:00:17 Elapsed: 16.942 seconds. The package execution failed. The step failed.
October 14, 2015 at 7:00 am
Bit of further reading up on mirroring, if using the Native Client and not specifying the failover partner property on the connection string (which isn't happening here) the native client will cache the failover partner name and will reconnect to the mirror on a failure.
This seems to coincide with the errors on the mirror, but as this is a async mirror there isn't any auto failover or manual failover happening so unsure how the native client is wanting to go to the mirror instead of the principle
October 14, 2015 at 8:43 am
Are the same apps / packages getting this consistently or is it kind of random which apps / packages will get this failure?
October 14, 2015 at 8:52 am
It's completely random to which app/job/package and what time is also random.
Today's example was a job which runs every 30 minutes, 10:00 Success, 10:30 failed, 11:00 Success and success for the rest of the day so far, yesterday success all day.
Yesterday a few of the web application screens got the error, but not the jobs or packages.
October 14, 2015 at 10:18 am
We occasionally have this issue when our corporate office is rebooting / updating our DNS servers. Have you checked your client's DNS to make sure it isn't flaking?
October 15, 2015 at 1:30 am
Nope not checked the DNS client as of yet.
Guessing it has something to do with the mirroring connection timeout being set to 10, but the connection string is 30, so probably a long login is tripping the mirror connection so its going to the mirror instead of principle.
Posted the problem on MSDN forums also, with a recommendation to see whats going on in the application event log.
Can't go back more than 4 hours on the SQL box where the job failed as its heavily written in the app log so will see if the issue happens again today and check the app log for more info, but I didn't notice anything un-ordinary yesterday.
October 15, 2015 at 2:54 am
I know nothing about this, just making some observations.
Elapsed time was 16.942 seconds - doesn't seem to tie with a Login Timeout of either 10 or 30 seconds?
"Cannot open database "DATABASE" requested by the login" - could it be that the default database, for the login, is becoming "unavailable" for some reason (i.e. login succeeded but default database was, then, "not found")?
Dunno if changing default database to, say, MASTER and then a USE MyDATABASE would be an option? or even if that would make any difference if the database was, at that time, "not available"
October 15, 2015 at 3:03 am
Elapsed time was 16.942 seconds - doesn't seem to tie with a Login Timeout of either 10 or 30 seconds?
I didn't think so either...
...unless the first 10 second was trying to connect to the principal and it took a while to initiate the connection to the mirror to login to find the DB in a restoring state?
"Cannot open database "DATABASE" requested by the login" - could it be that the default database, for the login, is becoming "unavailable" for some reason (i.e. login succeeded but default database was, then, "not found")?
Dunno if changing default database to, say, MASTER and then a USE MyDATABASE would be an option? or even if that would make any difference if the database was, at that time, "not available"
The default DB is master, the connection manager is then detailing which DB to point to.
As a side note, two packages both failed at 9:29, one which runs every minute and one which had been running since 8:00.
Same error message, login timeout, unable to open DB, got a failed audit login trace running on the mirror, can see both SSIS packages trying to connect to the mirror at the time, nothing in the app log, nothing in the error log, nothing in the agent log.
October 15, 2015 at 3:49 am
anthony.green (10/15/2015)
The default DB is master, the connection manager is then detailing which DB to point to.
I presumed that "DATABASE" in your message "Description: "Cannot open database "DATABASE" requested by the login" was the actual name of a database (which you have obfuscated for your post here). if I've got that right why isn't it saying MASTER (i.e. the default database connection)? If it IS a User Database then it must have got further than the actual login to be actually trying to connect to that (user) DB?
Of course if the error message actually says "DATABASE" then ignore all that, but in that case its a pretty crummy error message!!
October 15, 2015 at 3:55 am
Correct "DATABASE" is an obfuscated value for the true user database name.
As this is an SSIS package which runs via an agent, the connection manager is setting the initial catalog to the user database so overriding the default database set at the login level.
I have a login failure trace running on the mirror, its hitting the mirror for some reason, probably network latency (but not been able to diagnose it as Solarwinds shows everything is A.OK) due to the mirror connection timeout being 10 seconds, so once it hits the mirror, it tries to open the user db which is in a restoring state and fails with error 18456, state 38
October 15, 2015 at 4:01 am
The error messages seem confusing to me (in light of your description). Wouldn't be the first time that a cascade of error messages has obscured the real issue ..., however the ordering of them seems "odd" - even as an error-cascade.
"Login failed for user 'DOMAIN\AgentAccount'."
"Cannot open database "DATABASE" requested by the login. The login failed.".
Particularly if the login should not be opening a default database at all? (i.e. it should be connecting to MASTER)
But like I said, I know zitch about this so only chucking things out there in case it triggers a mosre useful though 🙂
October 15, 2015 at 5:09 am
Have you tried running a Profiler Trace along with PerfMon for a set period of time, seeing if you can find anything that way?
EDIT: Oh, hey. Do you have your SAFETY on? We have a data center to data center mirror that screwed up all sorts of processes (though it made them take longer rather than having login issues) because SAFETY was enabled. We eventually had to suspend the Partner during the nightly process (so it would only take 2-3 hours instead of 8-11) because even disabling the SAFETY didn't work. But then again, we were going from one data center to another, so disabling SAFETY might help.
October 15, 2015 at 6:09 am
No SAFETY is off, but thanks for the suggestion.
Not got perfmon running, but do have a trace running as we are diagnosing a separate issue.
We are under SOx so anything done on live needs a 7 day lead time where applicable so PerfMon is a few days away.
These nothing in the trace on the principal that stands out.
I have got a change in for next Thursday to switch the timeout to 35 seconds, just on the off chance it is this.
October 22, 2015 at 7:32 am
OK, changed the timeout to 35 still got errors, changed it to 60 still got errors.
Back to the investigations
October 26, 2015 at 6:16 am
It just occurred to me...
Have you checked the default database on the logins affected by this error?
I usually set the default DB to master because I've found that if the default DB is a user db, it can cause the login to error out if the user db is dropped or in the process of being restored.
Viewing 15 posts - 1 through 15 (of 21 total)
You must be logged in to reply to this topic. Login to reply