Around 4am on March 28, 1979, things started to go badly wrong deep inside the bowels of reactor number two at Three Mile Island. Routine maintenance to clear a blockage inadvertently caused a pump shutdown, which triggered a cascading series of failures in the critical coolant systems that kept the reactor core temperature stable.
The master control panel lit up like a Christmas tree. Lights flashed, horns blared. But the crucial information the operators needed was lost in the noise. Chris Faust, one of the men on duty at the time, said later: “I would have liked to have thrown away the alarm panel. It wasn’t giving us any useful information.”
Too much information
Confused by conflicting indications from the control panel, operators made a series of bad decisions which exacerbated the problems. The reactor core, starved of vital coolant, started to overheat. Radioactive material began to vent into the outer protective enclosure.
Disaster was averted only when Brian Mehler, a supervisor arriving with the relief shift, realized that a critical valve needed to be closed, despite the panel erroneously indicating that it already was. At the eleventh hour, coolant finally flowed again through the overheated reactor core, preventing a total meltdown.
Although no one was injured and no significant radioactive contamination occurred, the cleanup was protracted and enormously expensive. The Three Mile Island ‘incident’ has become a classic disaster case study.
Software form triumphs over function
I’m reminded of Three Mile Island every time a security vendor proudly demonstrates their wares. They show how you can drill down through the call stack and see exactly which malicious DLL injected itself into an innocent process. Network traffic flows between little dots on a map of the world. A spiderweb of connections pulses in real time as packets flow between endpoints and servers. There are charts and flashing lights galore.
It looks terrific, like the War Room in Dr. Strangelove. Or the control panel for TMI-2, the doomed reactor. But for all the flash and glitter, the lights, buttons and dials, these products are not much use when things go terribly wrong. Worse, they’re often not much use before things go terribly wrong, either.
When bad stuff happens, as it did to shipping giant Maersk, which apparently lost every endpoint and server during the NotPetya attack last year, these products can only sound horns and flash lights. They don’t let you intervene and actually do anything.
Heroes are hard to find
In fact, Maersk had a hero like TMI’s Mehler. Apparently, an unnamed member of the operations team manually shut down one of the organisation’s key domain controllers as the attack unfolded. This preserved Maersk’s vital Active Directory database, so the company was able to rebuild and recover far more quickly after the attack. Had that database not been preserved, recovery might have taken a lot longer than the heroic 10 days during which staff worked round the clock to reimage every single machine.
But we shouldn’t be relying on heroes to leap into action and save the day. When a submarine springs a leak, watertight doors can be closed automatically to ensure that flooding is contained. When an aircraft engine develops a fault, it can be shut down and any flames extinguished at the turn of a handle in the cockpit.
To avoid disaster in the first place, critical welds are inspected with X-rays and neutron beams.
Yet when it comes to infrastructure security, we often don’t have any fire extinguishers, watertight doors or even neutron beams. Instead, our sophisticated, flashy tools simply show us the flames over CCTV, as, one by one, our endpoints go dark under attack. Unless one of your employees turns out to be a superhero in disguise, it’s unlikely Bruce Willis will turn up to douse the flames and trounce the villains.
The wrong tools for the job
Worse than this, our infrastructure management tools are often slow and cumbersome. Consequently, upgrades stretch over days or even weeks, and when you reach the end of the rollout, you have to start again with the next tranche of fixes. It’s a constant, reactive scramble to stay current.
As a result, many organisations struggle to deploy current patches within months, let alone days (Equifax, breached through an Apache Struts vulnerability that had gone unpatched for months, springs to mind here). And patching is purely reactive, a matter of rolling with the punches; proactively looking for trouble is an even bigger challenge.
For example, looking for the software equivalent of weak rivets or cracked welds (such as easily guessed passwords, or local administrator accounts that share the same password across devices) is beyond the capabilities of most vendor security products. And even where these products can proactively look for trouble before it hits, their siloed nature often makes it difficult or impossible to get that information to the people in the organisation who are actually responsible for managing it.
Turning on a dime
That’s why we need to rethink our defence paradigms. We need tools that let you fluidly build queries and actions to manage and patch endpoints and servers, and deploy them in real time. Tools that let you partition your network or quarantine devices as quickly as possible, to prevent malware from pivoting outwards from the point of infection.
These tools need to link easily to other critical in-house systems, for example SecOps products from vendors such as ServiceNow, so that security incidents are managed consistently with other service incidents across the organisation.
The powerful and flexible workflow capabilities of these products keep key decision makers ‘in the loop’ at all times and empower them to act decisively and immediately when trouble strikes.
At the same time, critical alerts need to be fed into estate-wide monitoring systems such as Splunk, so that intervention isn’t stalled just because one key person wasn’t at their desk.
So when you’re thinking about tools and technology, don’t be seduced by flashing lights. Impressive though it was, Kubrick’s War Room was just a movie set.
When it counts, taking decisive action to protect and defend yourself requires more than a few pretty screens. You need to be able to manage endpoints proactively: the equivalent of X-raying critical welds to make sure there are no cracks in the steel of your perimeter defences.
And you need fire extinguishers and watertight doors. They might not be glamorous. But when you’re under attack, they might just save your life.