August 25, 2017 at 6:06 am
We have been running the following PowerShell command in a SQL Server Agent job step without issue for many months in our other instances, to mirror database backup files to a separate location on another server share:
ROBOCOPY "\\Server1\BackupShare" "\\Server2\BackupShare" /S /MIR
However, having set up a new SQL Server 2016 instance, we cannot get this simple command to work. It sometimes mirrors, and sometimes doesn't. And on every occasion, it leaves the agent job hanging so that it never finishes. This is problematic as it will not run the next time as scheduled.
The SQL Server and SQL Server Agent services are configured to run under an account that has been granted full privileges to the shares on both servers. This account is also the SQL Server Agent job owner. The result is the same if the job is run as scheduled or ad-hoc.
Running the command in a separate PowerShell window works without any issue.
We are running:
Microsoft SQL Server 2016 (SP1) (KB3182545) - 13.0.4001.0 (X64)
Oct 28 2016 18:17:30
Copyright (c) Microsoft Corporation
Web Edition (64-bit) on Windows Server 2016 Standard 6.3 <X64> (Build 14393: ) (Hypervisor)
Sadly, there is no further information in the following to give us any hints of the actual issue.
- SQL Server error log
- SQL Server agent error log
- Event Viewer
Does anyone please have any ideas as to how we can get this to work again? We can only assume it must be some sort of security issue?
August 25, 2017 at 7:05 am
I don't quite know what the /MIR option does for ROBOCOPY operations, but when you say it sometimes mirrors, but sometimes not, we need to know what that actually means. If it does "mirror" (I presume that is because of the /MIR option, but would rather you confirm), does the agent job complete successfully? What exactly does that /MIR option do, and what could cause it to either seriously slow down, or otherwise fail to complete? Also, other than not being able to run the next time, what would the consequences be of someone cancelling the agent job execution by some means like killing the SPID?
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
August 25, 2017 at 7:10 am
August 25, 2017 at 7:15 am
August 25, 2017 at 7:47 am
/S : Copy Subfolders.
/MIR : MIRror a directory tree - equivalent to /PURGE plus all subfolders (/E)
Incredibly, it appeared to be a security issue on a share subfolder on the source server. Even though the share itself had the necessary security permissions for the account, one of the subfolders had to have permissions explicilty granted. Once this had been done, it worked fine. A shame that neither PowerShell not SQL Server were able to be more helpful on this one.
August 25, 2017 at 8:24 am
zoggling - Friday, August 25, 2017 7:47 AM/S : Copy Subfolders./MIR : MIRror a directory tree - equivalent to /PURGE plus all subfolders (/E)Incredibly, it appeared to be a security issue on a share subfolder on the source server. Even though the share itself had the necessary security permissions for the account, one of the subfolders had to have permissions explicilty granted. Once this had been done, it worked fine. A shame that neither PowerShell not SQL Server were able to be more helpful on this one.
Actually, that's known as the hang problem associated with running what amounts to a command prompt functionality with no console access, so any errors you get won't show up because there's no way for the error to get presented, but the process that's getting the error will still try and present the error, but will have no place to put it, and thus the hang. You have to be sure that automating a power shell process has ALL error handling done within the power shell script, or you'll continue to run that risk.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
August 25, 2017 at 8:33 am
Thank you, yes you are right, and we do have some error handling in place already. The actual script is below - can you see any way in which we can improve this to avoid this scenario?
$CaptureOutput = ROBOCOPY "\\$SourceServer\$SharePath" "\\$MirrorServer\$SharePath" /S /MIR
# If an error occurred, throw the message.
$Isfailure = $CaptureOutput -match "ERROR "
If ($Isfailure)
{
Throw $CaptureOutput
}
August 25, 2017 at 8:45 am
The problem may be with ROBOCOPY. It was never designed to be run from an Agent Job, so it doesn't pay any attention to whether or not a console is actually available, and may not throw any kind of error, but just hang. No way to use error handling to capture a hung process. Of course, it may be going a tad far to try and automate validating permissions every time this runs, but the key to "no hang" is to NOT use anything that only has console output, such as ROBOCOPY. That may not be avoidable, but at least you'll understand that if permissions get changed, you may have a hang problem again.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
August 25, 2017 at 9:19 am
Yes I think you are right, we'll have to keep an eye on it! Many thanks for your assistance.
August 25, 2017 at 9:48 am
sgmunson - Friday, August 25, 2017 8:45 AMThe problem may be with ROBOCOPY. It was never designed to be run from an Agent Job, so it doesn't pay any attention to whether or not a console is actually available, and may not throw any kind of error, but just hang. No way to use error handling to capture a hung process. Of course, it may be going a tad far to try and automate validating permissions every time this runs, but the key to "no hang" is to NOT use anything that only has console output, such as ROBOCOPY. That may not be avoidable, but at least you'll understand that if permissions get changed, you may have a hang problem again.
In powershell you can start the process running and if it doesn't finish after a certain amount of time kill it and explicitly return an error code which should cause the agent job to error out 🙂
I've had to do that for certain jobs that will hang indefinitely without throwing an error.
August 25, 2017 at 11:50 am
ZZartin - Friday, August 25, 2017 9:48 AMsgmunson - Friday, August 25, 2017 8:45 AMThe problem may be with ROBOCOPY. It was never designed to be run from an Agent Job, so it doesn't pay any attention to whether or not a console is actually available, and may not throw any kind of error, but just hang. No way to use error handling to capture a hung process. Of course, it may be going a tad far to try and automate validating permissions every time this runs, but the key to "no hang" is to NOT use anything that only has console output, such as ROBOCOPY. That may not be avoidable, but at least you'll understand that if permissions get changed, you may have a hang problem again.In powershell you can start the process running and if it doesn't finish after a certain amount of time kill it and explicitly return an error code which should cause the agent job to error out 🙂
I've had to do that for certain jobs that will hang indefinitely without throwing an error.
How about an example?
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
August 25, 2017 at 12:33 pm
sgmunson - Friday, August 25, 2017 11:50 AMZZartin - Friday, August 25, 2017 9:48 AMsgmunson - Friday, August 25, 2017 8:45 AMThe problem may be with ROBOCOPY. It was never designed to be run from an Agent Job, so it doesn't pay any attention to whether or not a console is actually available, and may not throw any kind of error, but just hang. No way to use error handling to capture a hung process. Of course, it may be going a tad far to try and automate validating permissions every time this runs, but the key to "no hang" is to NOT use anything that only has console output, such as ROBOCOPY. That may not be avoidable, but at least you'll understand that if permissions get changed, you may have a hang problem again.In powershell you can start the process running and if it doesn't finish after a certain amount of time kill it and explicitly return an error code which should cause the agent job to error out 🙂
I've had to do that for certain jobs that will hang indefinitely without throwing an error.
How about an example?
Sure, here's a simple one. Change the timeout values to see what happens it runs or timeouts. Also on no timeout you would still need to parse the errors to verify no error messages show up.
$job = Start-Job {Sleep -s 30}
If(Wait-Job $job -Timeout 40)
{
Write-Host "Job finished"
Get-Job -Name $job.Name -IncludeChildJob | Select-Object -Property *
}
Else
{
Write-Output "Job Timed Out"
Remove-Job -Force $job
$host.SetShouldExit(666)
}
August 25, 2017 at 12:45 pm
zoggling - Friday, August 25, 2017 8:33 AMThank you, yes you are right, and we do have some error handling in place already. The actual script is below - can you see any way in which we can improve this to avoid this scenario?$CaptureOutput = ROBOCOPY "\\$SourceServer\$SharePath" "\\$MirrorServer\$SharePath" /S /MIR
# If an error occurred, throw the message.
$Isfailure = $CaptureOutput -match "ERROR "
If ($Isfailure)
{
Throw $CaptureOutput
}
Please note: I'm not entirely sure how this will translate into a Powershell script!
That being said, I've made good use of Robocopy for various things through the years, and my first suggestion would be to have it log what it's doing via /LOG:{Path to your log file} The other thing I would do is change the number of retries and wait between retries of robocopy via /w: and /r: The defaults on those are 30seconds wait between retries and 1,000,000 retries...
So when it runs into a problem with a copy, it's going to take (/me does some math) ~347 DAYS :crazy: before it moves on (no wonder the job never finished, huh? :hehe:)
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply