October 26, 2006 at 7:20 pm
We have a cluster (third party sw) that has SQL Server on it. Apparently when the cluster was built the names of the nodes were changed (i.e., the original name where SS was installed was not kept on one node)... so, now, we have two nodes named DB23 and DB24... and the result from @@servername is DB2 (the former name before the cluster was built).
My question? What should the servername be? Does the name have to change when we failover from one node to the other? I know the procedure for changing names; but if anybody has a script to handle this, that would also be greatly appreciated.
Jobs created on DB23 still ran on DB24 until they were stopped, and then they could not be restarted. Now, how they got started when the failover occurred, I'm not sure.
There were also something like 38 jobs on there from DB2, and they were all failed. Additionally, they could not be edited, disabled or deleted until I changed the sysjobs..originating_server entry to "DB23" even though they were currently on DB24.
This is a production machine and I am without backups as of noon today. Any ideas will be eagerly considered.
Thanks,
David
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982
October 27, 2006 at 6:10 am
Is this an active/active or active/passive cluster?
You should connect to the virtual server - select @@servername will give the virtual servername.( or go by ip )
sounds like you have a mess! As far as I know ( and I've tried ) the only way to change a cluster name is to rebuild it. Everything should belong to the virtual server so the actual node becomes irrelevent, that's the idea of a cluster, the jobs belong to the virtual server. I sort of find it difficult to understand how you've got to where you are?
As for no backups - just do an ad-hoc backup or create another job - script your jobs and recreate them - this sort of mess does not provide an excuse for no backups !!!
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
October 27, 2006 at 11:07 am
Hello Colin,
Thanks for responding. I read your posts regularly and appreciate the feedback. This cluster is created with third party software. We are also running SS Standard Edition (and I am not sure if it is "cluster aware"... doesn't seem like it). I do not know of any virtual server - the name returned by select @@servername is "DB2" which WAS the servername before this was clustered. It is active/passive. The IP may be the solution, sounds like it is... There is a virtual IP that the web application uses to connect to the DB... but I don't know how to make this IP a part of packages, maintenance plans and jobs. The choice for anything other than "local" is grayed out. Only one instance is running at a time... perhaps the mistake is in having them named db23 and db24... they should both be named the same? <maybe retorical question>
at the moment all jobs are not functioning. They say they are enabled, and they have not failed; but they do not run, the individual steps can be pasted into query analyzer and run and they produce results; but they don't run through the scheduler... and the status is not updated.
Turns out we have a second layer of defense in backups, we are using Veritas agents and those files are backed up elsewhere, so backup isn't the current issue.
Seems like we have to rename the servername... and there are SPs for doing that... something like "sp_dropserver" and "sp_addserver" (not the names, just examples)... but there is a defined procedure for doing that... tweaking the system tables directly is also an option if we can get the tweak right... This is a production machine was many users... so it's risky.
Okay, I need to go off and think. Thanks again for your response. Looking forward to more replies... yes, it's a mess
David
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982
October 27, 2006 at 1:00 pm
More info... technically we do not have a cluster, we are doing "volumn replication"... Restarting the service cleared the problem enough to allow jobs to execute. There is still no "history" on the jobs, and the date and status of last run do not appear in EM.
I have reverse engineered a job and there is no reference in it to the server ID except as a comparison to see if it was created on an MSX server. We overcame how to delete those jobs yesterday, so it looks like we don't have to rename the servername, i.e., from db23 to db24 on failover... at least not for the benefit of the jobs.
Our web application points to a virtual server by IP address, and the IP address points to the current volumn (node?)... but, is there any reason now for other changes?
We will fail back over to the original node on Tuesday night during a maintenance window. It is not an automated operation... i.e., requires operator intervention; but if our problems are fixed there, then we've got something to go on... I guess.
I would apprecate additional comments. Thanks for your help.
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982
October 30, 2006 at 5:45 am
ummmmm....... OK so clusters are managed using cluster manager, from there you can find out your info. I don't know of many good books on clustering - I have one - windows 2000 cluster server guidebook isbn 0-13-028469-6 but I bought it from the US. I administer clusters without too much problems.
so .. if it is a cluster go to cluster administrator and select the properties of the sql IP address, this should also tell you the virtual server name, this is the IP address of the virtual SQL Server - connect to that and extract the @@servername, in the properties ( of the sql ip address ) you'll also be able to see the possible owners ( node names )
Hopefully that'll clear up one thing or another!!
Jobs - they have to be enabled and the schedule has to be enabled too. If there's a lack of job history check the agent properties, I expect the hold entries at 1000 job history is checked. Running jobs should appear in the agent and/or server logs.
Jobs should belong to the virtual server. I've no wish to be held repsonsible for my next comment but I wouldn't trust a non-sql backup at all. Unless your sql server is being shut down prior to backup I doubt they'd be much good. I'd ask for a restore of a single database backup onto a test box , oh and ask for a restore of master into a user database called oldmaster on a test box. If your backups can't produce this then worry!!!
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
October 30, 2006 at 11:06 am
Thanks again for your response. I do appreciate the chance to talk about this with someone. The software being used is from EMC and it's an active/passive arrangment. The passive side has services running; but the database cannot be accessed until the failover... and the failover is a manual operation. I've got the virtual IP the web apps use; but this does not appear to be the same. I manually updated the sys<something> table with "DB23" for server name and one other field... and it "computed" the third field with the same value. The backup set table correctly shows servier name "db23" and machine name "DB24" (in the failed over state).
My maintenance plans are working now; but there is still no history and no refreshing of the status... the underlying tables in msdb have the appropriate information, it's just the GUI that is not showing this (and yes, I have right clicked and hit refresh job). And yes, both maintenance plans and jobs are enabled.
My feeling on backups is that you don't have one until it is tested. My experience has been something like 60-70% bad backups of those untested… and because of this I have at times created test environments with 2.4 TB in order to restore (an oracle) backup.
I have used Veritas backups before... they are not raw, OS level Backups, they have an agent that peeks in the DB and captures the necessary information, and they are reliable... but, again, only to be trusted once tested.
I am concerned that we are working on using replication as a disaster recovery component, and somewhere in my studies I have found that "all backups, OEM and 3rd party... must be disabled... in order to do replication." Seems it has something to do with the logs being manipulated by the backups in such a way the log-shipping, and/or replication can't deal with them. Okay, that's another story.
I would like to have "history" and "last status" show in the GUI... if for no other reason than to "feel" like this problem is solved; but the plans and jobs are working, so that is of some consequence.
Thanks again.
David
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982
October 31, 2006 at 4:16 am
I have a proc I use to return job data including last run, it uses three functions too otherwise I'd just post it. send me a private message with an email link and I'll mail you the code.
make sure you've service packed your client tools if you're using a remote connection to the sql server .. mis matched sp's can cause some strange effects!! I'd suggest you apply sp4 and 2187 rollup to your client pc ( if you're using one )
I usually TS to a cluster for admin. SAN's are another story and there's numerous reports of san replication not actually working in real world. I did some extensive rersearch on SAN's for DR and BCP with SQL Server ( I'm also SNIA certified - http://www.snia.org ) it's a whole can of worms most of which you'll never know if it works until you have a disaster, by which time it's too late!!
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
March 14, 2008 at 12:31 pm
I just wanted to say thank you for having contributed to this thread almost 18 months ago. I am now in another environment where I am the Oracle DBA with some shared SQL Server responsibilities and we have a Veritas cluster. I have only been here a week; but, it appears that some of the problems from 2006 are present here in 2008, and I appreciate very much being able to refresh my memory through this thread and your help at that time.
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply