September 11, 2019 at 8:58 am
I don't think that you can get a 1 minute diagnosis on a newish system.
Continuous improvement and engineering excellence are the way to get there. The cycle I would expect to go through would look something like the one below.
Even the best of us get surprised by the way things manage to go wrong in unanticipated ways. The important thing is to do the root cause analysis and feed that into the 4-step process above.
In my experience continuous improvement naturally leads to refactoring and simplification. This makes systems less likely to go wrong in the first place and much quicker to diagnose when they do.
There is a lot to learn from The Clean Coder by Robert C. Martin.
September 11, 2019 at 1:32 pm
HADR used to be difficult to set up in previous versions of SQL Server and Windows. However, the multitude of newer HADR configurations keeps improving and getting easier to administer, not just on-premises but also in the cloud. Further, HADR topologies combined with integration and migration into non-Microsoft HADR solutions keep pushing towards better and better designs whilst minimising/automating the administration burden and therefore maximising uptime. In short, five 9s are more than achievable on today's HADR ecosystems by diverting high administration costs into automated management and monitoring.
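To give a flavour of what that automated monitoring can look like, here is a rough sketch using the Always On DMVs. It just flags replicas that are unhealthy or disconnected; the alerting around it (and any thresholds) is left out and would be whatever your own setup uses.
-- Rough AG health check sketch: lists replicas that are not connected or not healthy.
-- Wire the output into an Agent job/alert of your choice; that part is not shown here.
SELECT  ag.name                         AS availability_group,
        ar.replica_server_name,
        ars.role_desc,
        ars.connected_state_desc,
        ars.synchronization_health_desc
FROM    sys.availability_groups AS ag
JOIN    sys.availability_replicas AS ar
        ON ar.group_id = ag.group_id
JOIN    sys.dm_hadr_availability_replica_states AS ars
        ON ars.replica_id = ar.replica_id
WHERE   ars.synchronization_health_desc <> 'HEALTHY'
   OR   ars.connected_state_desc <> 'CONNECTED';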
September 11, 2019 at 1:36 pm
I'm unconvinced whether AOG and other replication or scale-out technologies increase or decrease database availability. Designing the applications (and monitoring) to be more fault tolerant can increase system availability in terms of end-user perception and uptime reporting.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
September 11, 2019 at 2:51 pm
I don't think that you can get a 1 minute diagnosis on a newish system.
Even the best of us get surprised by the way things manage to go wrong in unanticipated ways.
Agreed, but I have come across a few scenarios where a DBA has advised a developer that "this is a huge mistake waiting to happen" and been overruled.
On these occasions you have your monitoring in place and can prove the issue in minutes (hopefully a good DBA would also have the rollback plan ready to go).
I'm running a server consolidation project at the minute with lots of linked servers involved... there's no way it's going live without every scenario I can conceive being tested and lots of scripts ready to protect us.
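For example (names made up), the sort of smoke test I keep ready for each linked server looks roughly like this, with [RemoteSrv] standing in for whatever the real linked server is called:
-- Linked server smoke test sketch; [RemoteSrv] is a placeholder name.
BEGIN TRY
    -- Trivial remote query: if connectivity, security or the remote instance is broken, this throws.
    EXEC ('SELECT TOP (1) name FROM master.sys.databases;') AT [RemoteSrv];
    PRINT 'Linked server [RemoteSrv] looks reachable.';
END TRY
BEGIN CATCH
    PRINT 'Linked server [RemoteSrv] failed: ' + ERROR_MESSAGE();
END CATCH;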
fingers crossed we can respond in 1 minute (but I think I just jinxed us)
MVDBA
September 11, 2019 at 2:54 pm
HA/DR can lower uptime if it adds complexity. Loose coupling and independence can help. We could argue that time spent on a broken secondary vs. a broken primary might increase downtime, but in most situations I think uptime is as high or higher.
September 11, 2019 at 3:01 pm
I'm unconvinced whether AOG and other replication or scale-out technologies increase or decrease database availability. Designing the applications (and monitoring) to be more fault tolerant can increase system availability in terms of end-user perception and uptime reporting.
To my mind, data availability is just as important as database availability. Obviously your DB has to be up and running, but if your scaled out DB has just replicated a delete without a where clause then you are just as stuffed as if the scaled out cluster went down.
I know that a lot of people are sadder and wiser for having experienced the horrors of BASE rather than ACID.
September 13, 2019 at 1:12 pm
if your scaled out DB has just replicated a delete without a where clause then you are just as stuffed as if the scaled out cluster went down.
4 words that need to be used in any delete situation
begin tran
rollback tran (or commit tran if your rowcount is good)
I never put a commit tran in a script until I know my where clause is good. I keep a special cupboard where we lock the naughty developers who forget this 🙂
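Spelled out (the table and the WHERE clause are made up for illustration), the pattern looks like this:
-- Safe delete pattern sketch; dbo.Orders and the date predicate are placeholders.
BEGIN TRAN;

DELETE FROM dbo.Orders
WHERE  OrderDate < '2015-01-01';

SELECT @@ROWCOUNT AS rows_deleted;  -- sanity-check the count before deciding

-- Then run exactly ONE of these by hand:
-- ROLLBACK TRAN;  -- the count looks wrong
-- COMMIT TRAN;    -- the count looks right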
MVDBA