We Stink!

  • Comments posted to this topic are about the item We Stink!

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Nope... not "WE"... "THEY".

    THEY have gotten arrogant... every damned one of them.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Everything @Grant says is valid.

    Let me add this tentative diagnosis: It is very likely that the CrowdStrike software routinely gets installed and updated in Kernel Mode instead of User Mode. If so, that is likely a mutual CrowdStrike and Microsoft choice.

    CS: “This should be treated as part of the heart of Windows and put in Kernel Mode.”

    MS: “Yeah, security is so important to protect against hacking and disaster. I’ll tell our guys to help your guys with the special Kernel-Mode install tool.”

    MORE INFORMATION

    Dave Cutler and his team created WNT (Windows New Technology), first released as Windows NT 3.1 in July 1993. Every subsequent Windows Server release and every Windows PC release from Windows XP onward is built on that same WNT architecture.

    One key aspect is its pair of modes: Kernel Mode and User Mode. One key benefit of splitting into two modes is to almost completely eliminate the blue screen of death.

    Most Microsoft programs/applications, including but not limited to, Microsoft 365, are installed and run in User Mode. Likewise, all end-user-created and 3rd party software should be installed in User Mode. When one of them fails, it fails – by itself. Windows itself keeps running, along with most everything else.

  • Very well put!

    What I just don't get, is how something this catastrophic made it through any testing at all.

  • Not well tested and not deployed in rings. Basic failures of software development.

  • MS blamed EU. When Defender was created as their AV protection running in kernel mode, EU said it locked out competition. That in turn seems to have opened Pandora's box.

  • We suck as "we" could have prevented this...

    Historically, IT never installed the latest patches as soon as they came out; it waited for a set period, or for another release or two, before applying them to critical business systems. Many still follow this practice with Windows and SQL patching, but it appears most of the planet has given up this practice for security software and placed 100% confidence in another vendor's QA testing process.

    CrowdStrike had a new patch delivered within 90 minutes, so global impact would have been dramatically less if everyone utilized the settings they already provided.

    The CrowdStrike RCA states:

    "Customers then have the option of selecting which parts of their fleet should install the latest sensor release (‘N’), or one version older (‘N-1’) or two versions older (‘N-2’) through Sensor Update Policies."

    https://www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub/
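
    The N / N-1 / N-2 idea in the quote above can be sketched in a few lines. This is only an illustration of the selection logic, not CrowdStrike's API: the function name, the policy strings as dictionary keys, and the version numbers are all made up for the example, and it covers sensor releases only, as the quoted text does.

```python
def pick_sensor_version(releases, policy):
    """Pick a release for a fleet segment.

    releases: version strings ordered newest first (hypothetical values).
    policy:   "N" (latest), "N-1" (one older) or "N-2" (two older).
    """
    offsets = {"N": 0, "N-1": 1, "N-2": 2}
    if policy not in offsets:
        raise ValueError(f"unknown policy: {policy!r}")
    if not releases:
        raise ValueError("no releases available")
    # If fewer releases exist than the policy asks for, fall back to the oldest.
    index = min(offsets[policy], len(releases) - 1)
    return releases[index]


# Hypothetical fleet: critical servers lag two versions, test machines take the latest.
available = ["7.16", "7.15", "7.14"]
fleet_policy = {"test-lab": "N", "desktops": "N-1", "critical-servers": "N-2"}
plan = {group: pick_sensor_version(available, p) for group, p in fleet_policy.items()}
```

    The point of the sketch is that the lag is a per-group choice: the machines that matter most are the last to receive a new release.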

     

     

  • Alphonse wrote:

    Very well put!

    What I just don't get, is how something this catastrophic made it through any testing at all.

     

    When some lower/entry level employee said something along the lines of "This is risky, we should do X instead because of Y happening", their manager replied "That'll never happen" and proceeded to ignore the employee.

    Tell me none of you have received the "That'll never happen" excuse from management.

    Kindest Regards,

    Just say No to Facebook!
  • In the UK a house has to be built in compliance with building regulations. Other structures are covered by regulations and strict codes of conduct.

    My son is an engineer maintaining railway carriages and equipment. There are strict tolerances, technical requirements and regulations for things we, as passengers, don't even know exist!

    History tells us that buildings and bridges failed and regulations came about after some awful tragedies.

    In software what is there apart from conscience and personal pride?

    The tooling, especially for the things we can test through mechanical means, has improved massively. No matter how fabulous the tooling, we need both the will and the knowledge to use it. We also need accountability of individuals for their lack and/or corner cutting.

  • Grant Fritchey wrote:

    Comments posted to this topic are about the item We Stink!

    The below is in reference ONLY to the engineering of software for use on personal computers that eventually grew into far more.

    Unlike in real-world engineering, what was the risk of engineering failure in software, the kind built for personal computer use, in the early days? If you failed in real-world engineering, people could get hurt or killed. In the digital realm this was not the case in those early days. This code grew and grew and grew, but the mindset persisted, or at least it did for the decision makers who decided to keep building the same way on the same base instead of restarting and doing it differently.

    There are plenty of software engineers who see the problems and want to change but it's the decision makers, not always with an engineering background, who make the calls and that mindset is more often "Why spend so much when things are fine?".  They weigh the risk and cost and until something serious forces them to do otherwise they will always choose to not spend the costs to change.

    Kindest Regards,

    Just say No to Facebook!
  • David.Poole wrote:

    History tells us that buildings and bridges failed and regulations came about after some awful tragedies.

    In software what is there apart from conscience and personal pride?

    .

     

    Because we've foolishly allowed ourselves to become too dependent on something not built to the reliability needed for the kind of dependency we've created. Software failures can cost lives if the right systems are impacted. When I read that some hospitals couldn't access records because they were in the cloud (not sync'd between local and the cloud), my first thought was of those damned bean counters making such bad decisions. No hospital or similar mission-critical system should be so dependent on internet connectivity, the CLOUD, that it can't access records on patients.

    And unless the right people either pay a significant fine and/or go to jail, this mindset will not change.

    Kindest Regards,

    Just say No to Facebook!
  • Ok... talk about stink!   Fortunately, they decided not to chance using a faulty space craft (IMHO, it was faulty even at lift off) but check this out.

    https://weather.com/science/space/video/astronauts-stranded-in-space-with-no-return-set

    https://duckduckgo.com/?q=astronauts+stuck+in+space+2024+boeing

    They should jettison the faulty craft  to head for the Sun or an "Earth Burn" over one of the oceans and get someone that knows how to build such things to get them back home.

     

    --Jeff Moden



  • You have a point there, Grant!

    If I may quote myself: "The Windows folks should quietly sit in a corner until they manage to prevent that any stupid driver can take the whole OS down."

    As an OS "end-user" I don't care what module runs in which mode. Just make it so that none of them can kill my whole system. We don't need more features and more speed all the time if the trade-off is system health.

    I am actually amazed that just about everyone seems to accept that those OSs (which are only the base of everything that happens on your computer 🙄) were never thought through or developed until being ready for the masses, given the number of security patches coming out every frickin' week.

    Back when I did Uni, I distinctly recall the lecturer illustrating that as code developers we are the engineers and we should adhere to those practices that deliver a solid result. They compared software engineering to building engineering. Years of learning, physics etc. come together to provide a building that doesn't fall down and can withstand severe weather events and so on, as long as those standards and quality assurance are followed.

    Yet the majority of dev and DevOps have been digressing, and the standards of good software engineering have been sliding away because of perhaps laziness or the perceived notion that it's not required anymore, or doesn't fit the current "build & deploy frequently" mantra. It frustrates me when I see it, because it should not happen. When I speak up, I get shouted down. And there's no point fighting when there is another objective, usually: just get it out there, can't afford any more time or $, we can just push out a fix.

    CS's initial response was "oh dear, here's the fix!!!" It was completely devoid of any recognition that their practices had the potential to cause widespread outages. And I also blame MS to a certain degree for giving them the leeway to allow it to happen in the first place: a lack of OS resilience.

    Good software testing should trap the majority of issues. Good testers and engineers should have thought of the scenarios that might occur. This sounded like a pretty simple issue to conceptualise and to build a test around: deploy to an external honey pot that has the range of software that is supported, then test and verify the result. If it is OK, continue the release using exactly the same deploy: no copying it elsewhere, exactly the same.
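
    A minimal sketch of that gate, under stated assumptions: the ring names, the `deploy_and_check` callback and the fake health check are hypothetical stand-ins, not anything from CrowdStrike's or Microsoft's tooling. The same bytes go to every ring in order, and the rollout stops at the first ring that fails verification.

```python
from typing import Callable, Sequence


def staged_rollout(
    payload: bytes,
    rings: Sequence[str],
    deploy_and_check: Callable[[str, bytes], bool],
) -> list[str]:
    """Deploy the SAME payload to each ring in order.

    Returns the rings that passed. Stops at the first failure, so later
    (larger) rings never receive a payload that an earlier ring rejected.
    """
    passed = []
    for ring in rings:
        if not deploy_and_check(ring, payload):
            break
        passed.append(ring)
    return passed


# Hypothetical stand-in for "deploy, then verify the machines still boot".
touched = []

def fake_deploy_and_check(ring: str, payload: bytes) -> bool:
    touched.append(ring)
    return payload.startswith(b"OK")  # pretend only well-formed payloads pass


rings = ["honeypot", "internal", "customers"]
good = staged_rollout(b"OK-update", rings, fake_deploy_and_check)
touched_by_good = list(touched)

touched.clear()
bad = staged_rollout(b"\x00\x00\x00", rings, fake_deploy_and_check)
touched_by_bad = list(touched)
```

    With the bad payload, only the honey pot is ever touched; the internal and customer rings are never reached.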

    Was it a money-saving exercise, so that they did not have enough people who could think outside the box?

    Do we have to get better - totally - but perhaps we should look back at the past and take some of the learnings we seem to have conveniently decided to let slip.  The questions and the solutions have already been answered a long time ago.  Lost because they were no longer fashionable.

     

  • For much of my career the IT department has been a slave with many masters. IT had no representation in senior management. The best you could hope for was being an unappreciated and neglected vassal to Mr Longpockets as head of finance.

    An ex-boss came up with the phrase "learned helplessness". In fairness, IT seemed to be home to people who could politely be called introverts.

    Learning to say NO is a valuable soft skill and one I feel came very (possibly too) late for many in IT. If an IT manager said no they would be faced with "my Dad's bigger than your Dad" from those who did have managers at senior level. Everyone comes under pressure to deliver faster and cheaper. Historically, I think IT has been accommodating to the point where it is its own worst enemy.

Viewing 15 posts - 1 through 15 (of 22 total)
