June 18, 2009 at 1:42 am
Hi everybody
I have a job running on SQL 2005 instance - the main purpose of the job is to move serial data from a production database to an archive database to help with performance. There is a web service that connects to the production DB as well as an application - these connections should always be up unless the job is running. The production DB needs to be put into single user mode so that no transactions take place whilst the archive process is running.
The job steps are as follows:
1 - Backup Archive DB
2 - Stop web service
3 - Kill DB sessions (using an SP that loops through open SPIDs and kills the commands)
4 - Set Production DB to single user mode
5 - Backup Production DB
6 - Set Production DB to Simple Recovery
7 - Archive Production DB
8 - Set Production DB to Full Recovery
9 - Set Production DB to multi_user mde
10 - Start web service
The job executes the 3rd step against the master DB as SA whereas all the other steps are executed as a user with sysadmin rights that is only used for this job. As you'll see from the steps the web service is stopped and started. The app that connects to the DB always connects as SA which is why the job is executed as another user to "kick" the SA user off - as soon as the DB is back in multi-user mode the app automatically reconnects.
The problem I have with this job is that it sometimes succeeds and sometimes fails after the 7th step and doesn't set the DB back to multi_user mode which means that users can't login to do transactions again after the archive has run.
The steps that usually fail are step 8 and 9 with the following error:
Unable to connect to SQL Server '(local)'. The step failed.
I am assuming that the job steps can't access the DB as the archive user and therefore can't alter the database but what doesn't make sense to me is that the previous steps executed against the DB as the same user were successful. Surely if the archive user has successful access in one step it would be the same in the next?
Any ideas or suggestions would be hugely appreciated as this job needs to run in the middle of the night so it inconveniences as few users as possible and I would love it to run through properly on its own so I don't need to watch it every night.
June 18, 2009 at 1:45 am
to set db in single user mode without looping, use this
alter database mydb set single_user with rollback immediate
to set db in multi user mode,use this
alter database mydb set multi_user
June 18, 2009 at 1:47 am
what do u mean by archive production db? are you taking backup of production db here?
also what is the relation between archive and production db? i cant seem to find a relation in the job steps u mentioned.
June 18, 2009 at 1:48 am
Sorry PS I should have posted more detail on those steps - I'm using those exact commands to set the DB to single or multi_user mode.
The archive process involves copying serial data from the production database to the archive database and the deleting it from the production database. Its used to decrease the size of some of the tables in the DB to improve performance. The archive DB is meant to be a working replica of the production DB - in other words if one of the users needs to see a report on data from some time ago they would just have to login to the archive DB instead of the production DB.
June 18, 2009 at 1:50 am
Hello,
Unable to connect to SQL Server '(local)'
Do you have the full message, particularly the Error Number and State? These are very useful when trying to diagnose connection issues.
Regards,
John Marsh
www.sql.lu
SQL Server Luxembourg User Group
June 18, 2009 at 1:53 am
John Marsh (6/18/2009)
Hello,Unable to connect to SQL Server '(local)'
Do you have the full message, particularly the Error Number and State? These are very useful when trying to diagnose connection issues.
That's the full error message in the job history. If I look at the selected row details this is all there is (I've only removed the server name as it is a client's server):
Date2009/06/17 23:34:02
LogJob History (Archive Talkwyz)
Step ID9
ServerXXXXXXXX
Job NameArchive DB
Step NameSet database to Full Recovery
Duration00:00:02
Sql Severity0
Sql Message ID0
Operator Emailed
Operator Net sent
Operator Paged
Retries Attempted0
Message
Unable to connect to SQL Server '(local)'. The step failed.
June 18, 2009 at 2:03 am
Hello again,
Is there any Login Failure message or similar in the SQL Server Log around the time that the Step Fails?
Regards,
John Marsh
www.sql.lu
SQL Server Luxembourg User Group
June 18, 2009 at 2:06 am
also try to run the job as SA and see if the problem relates to privilages of the archive user??
Since the database is put under simple recovery model, i dont think problem can be related to tran log being full or something of that sort.
June 18, 2009 at 2:23 am
PS - the same thing happens with the SA user - sometimes the job runs through fine and sometimes not so it doesn't seem to be the archive user's rights.
John Marsh - I've checked the log and apart from there being a number of login failures for SA (which is the app that keeps trying to login to the production DB) the log entries at roughly the same time that steps 8 and 9 of the job failed are below (again I've just removed the client DB name):
Date2009/06/17 23:19:59
LogSQL Server (Current - 2009/06/18 10:04:00)
SourceBackup
Message
Error: 3041, Severity: 16, State: 1.
Date2009/06/17 23:19:59
LogSQL Server (Current - 2009/06/18 10:04:00)
SourceBackup
Message
BACKUP failed to complete the command BACKUP DATABASE (Production DB). Check the backup application log for detailed messages.
Date2009/06/17 23:34:04
LogSQL Server (Current - 2009/06/18 10:04:00)
SourceLogon
Message
Error: 18456, Severity: 14, State: 16.
Date2009/06/17 23:34:04
LogSQL Server (Current - 2009/06/18 10:04:00)
SourceLogon
Message
Login failed for user 'NT AUTHORITY\SYSTEM'. [CLIENT: ]
From that point on its just failures for SA logins (nothing for the archive user) and failures from the backup. The server is managed by an external provider so the backups are run using their own software and scheduling. The archive job is run by our company which vends the software app to the client.
June 18, 2009 at 2:38 am
Hello again,
The message at 23:34:04 could be related.
To quote MS:-
“State=16 means that the incoming user does not have permissions to log into the target database. So for example say you create a user FOO and set FOO's default database to master, but FOO does not have permissions to log into master.
This can also happen if for example the default database for user FOO is not online (for example the database is marked suspect).
So to check on this theory, try logging the user into some other database and then try using the USE DATABASE command to switch to the target database, you will get a better error message as to the root cause of the failure”
Do you have the (Advanced) Option set on the Step to include the output in the History? It is worthwhile doing if not.
Regards,
John Marsh
www.sql.lu
SQL Server Luxembourg User Group
June 18, 2009 at 2:53 am
Thanks John - I'll include the output in the history and post again if I run into the problem again.
June 19, 2009 at 1:00 am
Hello again,
I had one more thought: If the Error with State 16 is related, then may be another Maintenance Job is (sometimes) taking the user’s default DB offline. This would explain the intermittent appearance of the issue, rather than it being related to permissions or job configuration.
It could be worthwhile looking at the Agent History to see what other Jobs were running at the time you last had the error.
Regards,
John Marsh
www.sql.lu
SQL Server Luxembourg User Group
June 19, 2009 at 6:17 am
alidahope (6/18/2009)
The steps that usually fail are step 8 and 9 with the following error:Unable to connect to SQL Server '(local)'. The step failed.
This sounds to me as if the job is attempting to use multiple connection strings to the database. And since you have it set as single_user, it's failing. It could also be dropping one connection and then trying to re-initialize, which would cause the failure.
Have you looked into using an SSIS package for this job? You could put in Execute SQL tasks to do the switch between users, then force the entire package to use one single continuous connection so your job doesn't drop the connection and fail on those steps.
August 19, 2009 at 6:26 am
Hi,
I am trying to achieve the same thing as you. I need to archive data from a Production database into an archive Database. The db still needs to be online. I need to purge the production database regularly so that i can keep the db clean and achieve maximum performance.
I need to also bear in mind that some of our customers use Sql Express which does not support replication, scheduled tasks etc. I tried Replication, but after cleaning up the production database, it filter the deletes to the archive database. I considered partitioning but you need Sql Enterprise for that.
I am moving towards a parameterized stored procedure. I need to get the rollback and the necessary checks in place to avoid duplication and i need to ensure the data integrity.
How do you go about archiving your data and would you have any other ideas?
August 19, 2009 at 11:26 am
SSIS has a lot of options available to help you out, but if you're looking for a T-SQL Solution, you can easily go with ye ole Left Outer Join with a "Field IS NULL" in the where clause to make sure you only select information from the source table that isn't in the destination table.
Of course, this might not be the best solution for large record sets. You'd have to experiment with techniques to see which works the best for you.
And vishnup, don't piggy back your questions on someone else's thread or you might never get an answer. Start a new thread with your question. Especially as your question doesn't appear to have anything to do with a package error.
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply