July 11, 2011 at 3:31 am
Hi,
Is there anyway to increase the amount of data sent in the log shipping trn file so that the secondary database can have the latest data quicker or create a dynamic schedule that copies and restores quicker between certain hours? What keeps happening is that every night I stop the log shipping process for 2 hours so that I can take important backups of the database and this leaves the database 2 hours behind the primary?
Can anyone help?
Many Thanks
July 11, 2011 at 3:51 am
July 11, 2011 at 4:15 am
It is an awkward situation... every night I do a full backup of the database whilst the log shipping continues but every night the log shipping writes one bad trn to the SAN and i get an event log error message in my inbox saying that the error has occured. Meaning that the data is unusable and can not be replicated on to the secondary database. So I have to manually push the log shipping process forward passed the point where it tried to last restore which is the corrupt data. I can't simply just rebuild the database server so that it is a more powerful server I have to find a way around either sharing the resources more appropriately or come up with a method to find out what is happening in the database at the time the backups take place. Which is something I am currently working toward finding out also. I have so far replaced the SAN so that it is solely dedicated to the primary database backup process and also used another NIC into a different subnet to try to lessen the node of network congestion that may be causing timeouts and data loss on the SAN. With no luck. So it is time to timestamp the server at that time in the morning for both the MSSQL processes and the server as a whole. Any suggestions?
Regards
Thanks
July 11, 2011 at 5:18 am
nathanr 81822 (7/11/2011)
but every night the log shipping writes one bad trn to the SAN and i get an event log error message in my inbox saying that the error has occured.
what is the exact error you are getting?
Log shipping is very stable and probably the easiest way of replicating a database in SQL Server!
There are a set of stored procedures you can use to modify the log shipping plan on both the primary and the secondary. See this link for more info
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
July 11, 2011 at 5:52 am
I have investigated the errors quite thoroughly however I am not in a position to vouch for the 100% accuracy of the errors. Although here they are;
BackupIoRequest::ReportIoError: write failure on backup device '\\192.168.1.100\db1\db1_20110711013021.trn'. Operating system error 64(failed to retrieve text for this error. Reason: 15105).
BACKUP failed to complete the command BACKUP LOG db1. Check the backup application log for detailed messages.
The operating system returned the error '64(failed to retrieve text for this error. Reason: 15105)' while attempting 'FlushFileBuffers' on '\\192.168.1.100\db1\db1_20110711013021.trn'.
The operating system returned the error '64(failed to retrieve text for this error. Reason: 15105)' while attempting 'SetEndOfFile' on '\\192.168.1.100\db1\db1_20110711013021.trn'.
I have used 'net helpmsg 64' to at least narrow the problem down which shows me the network name is no longer available.
BUT like I said earlier I can not guarantee that the cause of the problem is the network as I have tried somewhat to seperate some of the load of the network to see if makes a difference and it hasn't... so I either need to to segregate the backup strategy even further or isolate any load that may be occuring when the backup process copies the bak files off to another server.
July 11, 2011 at 7:45 am
nathanr 81822 (7/11/2011)
or isolate any load that may be occuring when the backup process copies the bak files off to another server.
exactly what do you mean by this? What is the backup job doing? The backup job is created by the LS wizard and will backup to your chosen location.
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
July 11, 2011 at 7:54 am
If you can't understand the above errors from a log shipping process I don't think you can help solve my problem (Sorry) ...so back to the original question... is there anyway to force a size of data to be sent/backed up if the two databases are out of sync?
Regards
July 11, 2011 at 1:56 pm
i'm asking because the log copy operation is taken care of by a job on the secondary server. I understand log shipping perfectly but do not understand why you say
nathanr 81822 (7/11/2011)
when the backup process copies the bak files off to another server.
Since you have it all in hand i'll leave it with you
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
July 13, 2011 at 8:58 am
I have two production servers, prod a has the primary database, prod b has the secondary. The agent on prod a backups the logs from prod a to SAN A, then the agents on prod b collect the logs from SAN A and copies the trn to the local disk on prod b and then restore the trn on prod b.
simples!
July 17, 2011 at 6:11 pm
I configured SQL log shipping to transfer copies of the primary DB's transactional logs to the secondary DB server which is on a disaster recovery [DR] subnet and scheduled to run hourly via SQL Agent Jobs.
There's always going to be some amount of lag time between primary DB and secondary DB as for log shipping; the [TRN] log files are restored after being copied over from the primary server (you said prod a and prod b in your case--two prod but using log shipping).
I'm not certain that true replication or mirroring of MS SQL DBs (without a cluster) is possible with log shipping for production purposes (at least in cases I've been involved). I guess if you were just reading data from the DBs and the secondary DB being off from the primary DB by a few minutes isn't detrimental to your needs---then it would work.
Are you sure you're using the best SQL replication/backup method for your particular needs??
How far a part are you LSBackup primary and LSRestore secondary jobs scheduled??
I just configured 5 different DBs from the same version (and edition) of SQL server 2005 in my case--from a physical to virtual server in a separate DR subnet.
Quick Research on the errors posted:
I see that you're using a SAN and presume your statements suggest you double and triple checked where your TRN backup job dumps to folders and folder shares??
I assume you have checked the share level permission and the folder level NT permissions of the network shared SAN folder location for the SQL Agent credentials to have access??
Please tell me the version (and edition) of MS SQL Server and server OS' that you're running on the primary LS location as well as the secondary location. Please also confirm the service pack level on each SQL instance (and OS server)??
Check each server's OS event viewer (application and system log) for relevant error or warning messages--ensure nothing is being overlooked. Research any unusual (or obvious) looking error or warnings in detail.
Please also tell me if the primary and secondary log shipping servers are in the same subnet or if network routers are between--as well as the SAN device.
Do you know the speed of your network (or fibre channel ) interfaces on all associated devices, do they match, do they all auto-negotiate speed?? Setting all configurable interfaces to the same speed manually may offer some help?? This would help identify interface problem, and/or protocols such as offloading and interfering with its integrity at the data packet level.
Are there any firmware updates available for any of the associated devices relevant to the configuration (i.e SAN, PrimServer, SecServer, etc.)??
Another edit: As for you question "is there anyway to force a size of data to be sent/backed up if the two databases are out of sync?"---Perhaps googling this edit and checking MS.com website for this fix to start:::
IT WASN'T 100% CLEAR TO ME FROM WHAT I READ WHICH SERVER, ETC. TO APPLY THIS REGISTRY CHANGE THOUGH--BE SURE TO UNDERSTAND IT BETTER THAN ME 1st THOUGH::
open regedit
navigate to: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanworkstation\parameters
Create a New DWORD value with the name: SessTimeout
set the value: 360 keep it Hexadecimal
(This value might not work for your backup but it was high enough for mine. If this doesn't work increase the value and try again.)
P
July 18, 2011 at 3:20 am
pjmahoneyit (7/17/2011)
I use SQL log shipping for disaster recovery to a DB server in our DR subnet. I also use log shipping to have an hour old report server DB for reporting purposes and prod server performance. There will always be a lag time between primary and secondary as for log shipping; the TRN files are restored after being copied over from the primary (you said prod a and prod b in your case--two prod but using log shipping). Are you sure you're using the best SQL replication method for your particular needs?? - I hope soI just configured 5 different DBs from the same version of SQL server 2005 in my case--from physical to virtual in our DR subnet.
Quick Research on the errors posted:
You haven't pointed your log shipping backup job on the primary log shipping server to backup to a network share location on a different subnet have you?? I assume you have checked the share level permission and the folder level NT permissions of the network shared SAN folder for the SQL Agent credentials to have access?? (edit:: I see that you're using a SAN and presume your statements suggest you double and triple checked where your TRN backup job dumps to disk??) - No I have not, the backups occur on the same subnet and I know exactly where the trn files go to on the SAN.
Please tell me the version of MS SQL Server you're running on the primary LS location as well as the secondary location. Please also confirm the service pack level on each SQL instance and send that along too. On the log shipping Primary I have a 10.0.1600.22 (2008 RTM) on the secondary I have a 10.0.1600.22 (2008 SP1)
Please also tell me if the primary and secondary log shipping servers are in the same subnet or if network routers are between--as well as the SAN device. Do you know the speed of your network interfaces or fiber interfaces on all associated devices, do they match, do they all auto-negotiate speed, does it still occur if you set all to the same speed manually?? This would rule out network problems at the device interface level. Check each server OS related for each component for other event relevant (system and application log)--double check nothing is being overlooked. I have checked and am pretty sure about the errors, and they occur only on the log shipping primary. As for the network, its a 100MB network with no autonegotiation taking place. BUT I will double check.
Another edit: As for you question "is there anyway to force a size of data to be sent/backed up if the two databases are out of sync?"---Perhaps googling this edit and checking MS.com website for this fix to start:::
IT WASN'T 100% CLEAR TO ME FROM WHAT I READ WHICH SERVER, ETC. TO APPLY THIS REGISTRY CHANGE THOUGH--BE SURE TO UNDERSTAND IT BETTER THAN ME 1st THOUGH:: I will apply this change to the log shipping primary and see if I can get around any read/write issues in the first instance. I never have a problem restoring so no READ timeout issues occuring.
open regedit
navigate to: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanworkstation\parameters
Create a New DWORD value with the name: SessTimeout
set the value: 360 keep it Hexadecimal
(This value might not work for your backup but it was high enough for mine. If this doesn't work increase the value and try again.)
P
Thanks for all the help.
Its nice to wake up monday morning and have all your problems solved, or at least started before you started them.
Thanks
July 18, 2011 at 8:08 am
I also read that this could be memory related where the OS needs more RAM allocated to it. And also people seem to get this error upon the verification step of their BAK files. I assume you're getting this on just the TRN file log backup job though.
Please let me know what you're able to determine and if you get any further details from any other level of logs, etc.
Thanks,
P
July 23, 2011 at 1:05 am
If you were able to resolve this, please update me on the fix (or steps to) whenever you get a moment. Otherwise, I assume you've been McGuyer'ing it along and perhaps my editing original response isn't as choppy as it was originally, and you could update me on that (or take a look if it may help).
FYI... Not that I need to look for additional work but I am a consultant too and have experience in remotely connection support. Regardless, maybe my written communications as time permits will assist.
Thanks,
PJ
July 29, 2011 at 9:14 am
Hi PJ,
I am still getting the errors.
A silly question... Do I need to reboot the server that has had the SMB Session Timeout fix applied to it for it to work, how can I prove it??
I have been here, with windows that is, a dozen or so times and do not know if this particular key change takes affect immediately or requires a reboot.
As for an update, I had changed the timeout on the sesstimeout key to somewhere up to 360 seconds, or 6 minutes. With perhaps a little less conincidental impact on the service, but as it hasn't completely worked I have since upped it to 10 minutes to see if there is any resolve.
After taking into consideration the statement about memory I will pursue it a little more in depth over the next week or so.
Thanks again.
July 30, 2011 at 8:07 pm
You never know really unless the MS KB article says or not and that that article is relevant to your case. Sometimes rebooting services will do the trick or rebooting the entire Server OS.
I'm having a hard time going back and seeing the original article--from my first response where you replied back what I'm seeing now. I definitely don't know all related factors in your environments to give you 100% accurate presumptions. I was just shooting to help you with a quick fix potentially. I guess using robocopy could be a quick fix too and schedule with SQL Agent cmdexec jobs, or Windows scripts (e.g. batch).
I typically start by checking relevant logs and follow potential common paths when tech support isn't available or too time consuming I guess sometimes.
Have you figured this out yet??
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply