May 13, 2010 at 9:34 am
Sysinternals is also great. Perfmon, if you haven't used it before, can be a little intimidating because there are counters for everything.
http://msdn.microsoft.com/en-us/library/aa645516(VS.71).aspx
Brent Ozar had a good article that he occasionally updates.
http://www.brentozar.com/archive/2006/12/dba-101-using-perfmon-for-sql-performance-tuning/
The tools that come with the HBA (the Fibre Channel card for the SAN) will be specific to your setup, which is why I tried to refer you to the person who manages your SAN. The SAN will also have its own management tools.
I also like the idea of breaking up the job into individual steps; that will help to isolate exactly where the problem is. I haven't used Robocopy but I've heard good things about it. I don't think the problem is with the move command but with something going on with your hardware.
---------------------------------------------------------------------
Use Full Links:
KB Article from Microsoft on how to ask a question on a Forum
May 13, 2010 at 9:55 am
The counter you want to look at first in perfmon is under LogicalDisk: choose something like "Disk Read Bytes/sec" and select the D: drive. If the move is really moving the file you will see a high amount of activity. If it is hung and waiting for something else then you will see only the normal amount of activity for that drive. One problem with this approach is that you probably haven't done any baselining, so you may not know what normal activity is. If this drive is only used for backups and you normally see no read activity, then my method will work for you.
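If you would rather log the counter than watch it live, something like the following might help. It is only a rough sketch: it assumes you have exported the counter to a CSV file (for example with typeperf or a perfmon log) in the usual layout of a header row followed by timestamp and value columns, and the server name, baseline figure and factor of 5 are made-up values you would replace with your own.

```python
import csv
import io
from statistics import mean

# Hypothetical sample of an exported counter log; the server name and
# the values are invented for illustration.
SAMPLE = r'''"(PDH-CSV 4.0)","\\SERVER\LogicalDisk(D:)\Disk Read Bytes/sec"
"05/13/2010 09:55:01.000","52428800.000000"
"05/13/2010 09:55:02.000","48234496.000000"
"05/13/2010 09:55:03.000","50331648.000000"
'''


def read_rates(csv_text):
    """Return the numeric samples from the first counter column."""
    rows = csv.reader(io.StringIO(csv_text))
    next(rows, None)  # skip the header row
    rates = []
    for row in rows:
        if len(row) < 2:
            continue  # blank or truncated line
        try:
            rates.append(float(row[1]))
        except ValueError:
            continue  # some exports leave a sample blank
    return rates


def looks_active(rates, baseline_bytes_per_sec, factor=5.0):
    """True when the average read rate is well above the baseline.

    `factor` is an arbitrary threshold chosen for this sketch.
    """
    if not rates:
        return False
    return mean(rates) >= baseline_bytes_per_sec * factor
```

With a baseline of about 1 MB/sec, `looks_active(read_rates(SAMPLE), 1_000_000)` comes back True for the sample above. A hung move would show rates near the baseline and come back False.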
May 13, 2010 at 10:13 am
Awesome guys. I'll let you know the outcome.
May 13, 2010 at 10:34 am
You might also just do a couple quick tests - manually copy the same file to SAN, another local disk, and a network drive and see how the times compare.
At least you would be able to get some baselines to compare.
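If you want to script that comparison rather than time it by hand, a minimal sketch could look like this. The function name and the labels are just placeholders I picked, and it assumes the destination folders already exist.

```python
import shutil
import time
from pathlib import Path


def time_copies(source, destinations):
    """Copy `source` into each destination folder and time each copy.

    `destinations` maps a label (e.g. "san") to an existing folder.
    Returns a dict of label -> elapsed seconds.
    """
    source = Path(source)
    timings = {}
    for label, folder in destinations.items():
        target = Path(folder) / source.name
        start = time.perf_counter()
        shutil.copyfile(source, target)
        timings[label] = time.perf_counter() - start
    return timings
```

You would call it with your own paths, for example one folder on the SAN, one on a local disk and one on a network share. Keep in mind that OS file caching can flatter the later copies of the same file, so treat the numbers as a rough comparison rather than a benchmark.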
It does sound like something is waiting for a message that is getting lost.
Breaking it apart like Steve suggested, and running each step manually / interactively if you can, might help you see something.
I can recall batch files where I had to play with waits to make sure one step was done before the next one launched. Although in your case it ran fine before.....
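One way to run the steps one at a time, without hand-tuned waits, is a small wrapper like the sketch below. It is not the original job, just an illustration: the step names and commands are whatever you pass in, and the one hour timeout is an arbitrary default.

```python
import subprocess


def run_steps(steps, timeout_seconds=3600):
    """Run (name, argv) steps in order, stopping at the first problem.

    Each step must finish before the next one starts. Returns a list of
    (name, status, output) tuples for the steps that were attempted.
    """
    results = []
    for name, argv in steps:
        try:
            done = subprocess.run(
                argv,
                stdin=subprocess.DEVNULL,  # no keyboard to wait on
                capture_output=True,
                text=True,
                timeout=timeout_seconds,
            )
        except subprocess.TimeoutExpired:
            # The step stalled; record it and stop the job here.
            results.append((name, "timed out", ""))
            break
        if done.returncode == 0:
            status = "ok"
        else:
            status = f"failed ({done.returncode})"
        results.append((name, status, done.stdout + done.stderr))
        if done.returncode != 0:
            break
    return results
```

Writing the returned tuples to a log file shows which step was the last to finish and what it printed. On Windows, built-in commands such as move would need to be wrapped, e.g. `["cmd", "/c", "move", ...]`, since they are not standalone programs.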
When you talk patches, are you just referring to SQL patches, OS patches, or both?
I've run into cases where SQL Server updates were installed by mistake as part of Windows updates.
Greg E
May 13, 2010 at 11:09 am
You're right, it does sound like something is waiting for some human interaction, like a batch file's "press any key to continue". Your idea of logging the job should capture that.
I was referring to the firmware on the HBA cards and on the SAN controller, down to the individual firmware on the drives. At a previous company, part of my responsibility was assisting the person who managed the SAN. The SAN manufacturer had a software upgrade that caused a lot of havoc. Weird things happened afterwards, even after we'd rolled back the upgrade. Eventually that manufacturer removed that version of the firmware from their support website, waited 6 months, then tried again with a new version number and most of the bugs worked out.
There used to be a blog post out there by someone else who went through what I just described, and it captured the hell we went through, but I can't find it now.
Another thing to ask the SAN person is if they have done anything like convert the LUN from RAID 10 to RAID 5 or split it up/combine it or some other thing. As the SAN converts things in the background this can cause things to slow down greatly.
I was surprised in my testing, when I had access to the tools on our SAN, that the difference between RAID 10 and RAID 5 wasn't as great as I expected. Brent Ozar has another good article on using SQLIO to clock the SAN.
May 13, 2010 at 12:47 pm
I've forwarded everything to the local admin. We're waiting on a test window to keep going. So far everything seems ok (we've moved backups earlier in the night and monitor first thing in the morning, so we should be fine in case of another problem).
Thank you all again. I'll keep you posted once we have the answer.
P.S. I like the idea of something waiting for a confirmation, but I don't see how that would keep the san trashed to a point where it's basically useless... you're either working on something or waiting to keep going. The in between is most likely a bug somewhere.
May 14, 2010 at 6:35 am
Ninja's_RGR'us (5/13/2010)
I've forwarded everything to the local admin. We're waiting on a test window to keep going. So far everything seems ok (we've moved backups earlier in the night and monitor first thing in the morning, so we should be fine in case of another problem).
Thank you all again. I'll keep you posted once we have the answer.
P.S. I like the idea of something waiting for a confirmation, but I don't see how that would keep the san trashed to a point where it's basically useless... you're either working on something or waiting to keep going. The in between is most likely a bug somewhere.
While the process is hanging, can you copy a file locally, to the network, and to the SAN?
That might tell you something.
Usually SANs are shared, so I'd expect someone else would notice issues at the same time.
We had an intermittent issue once where we found a bad CAT 5 cable.
So if it works sometimes, then seems to stop, it might be worth having someone check that the link lights are steady.
Greg E