February 26, 2008 at 2:56 pm
Hello all, we have SQL server 2000 sp4. Our production DB is 250gb in size and in addition to our regular backup strategy we backup it up and restore it to a duplicate db called RS (for month end reporting purposes). Last week it took 1 hour to back it up and about 5 hours to restore it to RS. Last week we upgraded the memory in the server from 12gb (9gb was assigned to SQL) to 26GB (20gb assigned to SQL). The backup took 4 hours to complete and I canceled the restore as it hadn't completed after about 12 hours.
I have since set the SQL server memory to 15gb and re-kicked off the restore, I watched the 'File write bytes/sec' counter and it finished creating the new db files after about 5 hours. Now I am watching the 'device throughput bytes/sec' counter and it is averaging at about 2.5mb per second. Surely this is too slow!
Anyone have any ideas on what has caused this massive slow down?
My I.T department assure me they haven't changed anything on the SAN drive, where the files are being written to and restoring from.
Thanks in advance for your help
February 26, 2008 at 11:48 pm
is this a clustered server? Have you restarted sql server and/or the server the RAM was upgraded on?
February 27, 2008 at 2:25 am
Yes it is clustered and yes it has been restarted. Thanks.
February 27, 2008 at 3:18 am
Just had a call from our I.T dept, the problem is due to the write cache being disabled on the SAN due to it thinking that the batteries have expired. Nothing to do with the memory by the look of it.
Thanks
Dave
February 27, 2008 at 9:11 am
Have you got this one sorted out.
Cheers,
Sugeshkumar Rajendran
SQL Server MVP
http://sugeshkr.blogspot.com
February 28, 2008 at 3:06 am
All fixed now. Replaced the batteries on the 2 controllers, which re-enabled the write cache.
Thanks
Dave
February 28, 2008 at 5:00 am
I was going to suggest that it was just bad luck / coincidence that more RAM slowed things down 🙂 If you weren't lucky enough to see it was your SAN then I'd suggest things such as checking your RAM for faults, looking at performance counters, restricting SQL to less RAM whilst leaving it all installed, etc... Good to see it was something relatively easy and made sense in the end!
February 29, 2008 at 6:34 am
I am surprised the SAN team was not aware of the cache failure. Regardless you may want to look into the configuration on the controller / LUN and consider how the cache is handled. We had done research that indicated that depending on the configuration SQL server will think the data has been written when it hands off the data to the cache and the SQL latch will release. If you are having cache failure problems you could experience data loss.
On a side note you may want to consider looking into split file backups and or SQL lite speed. I had a very similar environment (Three way cluster with all three nodes having active partitions. Each node had eight procs and sixteen gig RAM. Each of the partitions had databases exceeding 200 gig.) Each of the databases took about an hour to an hour twenty to backup. We started using split file backups and cut the time by a third. (I suggest not exceeding a file per proc allotted to SQL) I then started testing SQL lite speed. SQL Lite Speed testing proved to reduce backup time from 1 to 1 1/2 hours down to just over 20 minutes and restores were reduce down to just over an hour. In addition to the time savings the backups were stored in a compressed format. The compression reduced the backup files size to about half of the original. Oh and the tool comes with a freely distributable decompression program that can be down loaded from their web site at any time. Just incase you have an issue where someone needs to have the backup decompressed and does not own the tool.
You may be a little leery to try such a tool however at the time I was working for the third largest Bank in the US and it passed all of our testing with flying colors.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply