Meltdown: Why our systems fail and what we can do about it
Today, we are in the golden age of meltdowns. More and more of our systems are in the danger zone, but our ability to manage them hasn’t quite caught up.
Ceasar Medina died because of a computer glitch.
Though he was shot in a botched robbery attempt, his killer—a convicted felon named Jeremiah Smith—should have been behind bars at the time. But Smith was one of thousands of inmates that the Washington State Department of Corrections accidentally released because of a software problem: a bug in the DOC’s computer code that, for over a decade, miscalculated prisoner sentences.
Surprising meltdowns like the one at the DOC happen all the time. At UCSF—one of the world’s best hospitals—a sophisticated pharmacy robot and a high-tech prescription system confused a doctor, lulled a pharmacist into approving a massive overdose of a routine antibiotic, and automatically packaged 38 pills, instead of the single pill the doctor intended. A nurse, comforted by the barcode scanner that confirmed the dosage, gave the pills one by one to her patient, a 16-year-old boy, who nearly died as a result.
In 2012, Wall Street giant Knight Capital unintentionally traded billions of dollars of stock and lost nearly $500 million in just half an hour because of a software glitch. It was a stunning meltdown that couldn’t have happened a decade earlier, when humans still controlled trading.
And at the airlines, technological glitches, combined with ordinary human mistakes, have caused outages in reservation and ticketing systems, grounded thousands of flights, and accidentally given pilots vacation during the busy holiday season. These issues cost the airlines hundreds of millions of dollars and delayed nearly a million passengers.
To understand why these kinds of failures keep happening, we turn to an unexpected source: a 93-year-old sociologist named Charles Perrow. After the Three Mile Island nuclear meltdown in 1979, Perrow became interested in how simple human errors spiral out of control in complex technological systems. For Perrow, Three Mile Island was a wake-up call. The meltdown wasn’t caused by a massive external shock like an earthquake or a terrorist attack. Instead, it emerged from the interaction of small failures—a plumbing glitch, a maintenance crew’s oversight, a stuck-open valve, and a series of confusing indicators in the control room.
The official investigation blamed the plant’s staff. But Perrow thought that was a cheap shot since the accident could only be understood in retrospect. That was a scary conclusion. Here was one of the worst nuclear accidents in history, but it wasn’t due to obvious human errors or a big external shock. It somehow just emerged from small mishaps that came together in a weird way.
Over the next four years, Perrow trudged through the details of hundreds of accidents. He discovered that a combination of two things causes systems to exhibit the kind of wild, unexpected behaviors that occurred at Three Mile Island.
The first element is complexity. For Perrow, complexity wasn’t a buzzword; it had a specific definition. A complex system is more like an elaborate web than an assembly line; many of its parts are intricately linked and can easily affect one another. Complexity also means that we need to rely on indirect indicators to assess most situations. We can’t go in to take a look at what’s happening in the belly of the beast. In a nuclear power plant, for example, we can’t just send someone to see what’s happening in the core. We need to piece together a full picture from small slivers—pressure indications, water flow measurements, and the like.
The second part of Perrow’s theory has to do with how much slack there is in a system. He borrowed a term from engineering: tight coupling. When a system is tightly coupled, there is little buffer among its parts. The margin for error is thin, and the failure of one part can easily affect the others. Everything happens quickly, and we can’t just turn off the system while we deal with a problem.
In Perrow’s analysis, it’s the combination of complexity and tight coupling that pushes systems into the danger zone. Small errors are inevitable in complex systems, and once things begin to go south, such systems produce baffling symptoms. No matter how hard we try, we struggle to make a diagnosis and might even make things worse by solving the wrong problem. And if the system is also tightly coupled, we can’t stop the falling dominoes. Failures spread quickly and uncontrollably.
When Perrow came up with his framework in the early 1980s, the danger zone he described was sparse: it included exotic systems like nuclear facilities and space missions. But in the intervening years, we’ve steadily added complexity and tight coupling to many mundane systems. These days, computers—often connected to the internet—run everything from cars to cash registers and from pharmacies to prisons. And as we add new features to existing technologies—such as mobile apps to airline reservation systems—we continue to increase complexity. Tight coupling, too, is on the rise, as the drive for lean operations removes slack and leaves little margin for error.
This doesn’t necessarily imply that things are worse than they used to be. What it does suggest, though, is that we are facing a different kind of challenge, one where massive failures come not from external shocks or bad apples, but from combinations of technological glitches and ordinary human mistakes.
We can’t turn back the clock and return to a simpler world. Airlines shouldn’t switch back to paper tickets and traders shouldn’t abandon computers. Instead, we need to figure out how to manage these new systems. Fortunately, an emerging body of research reveals how we can overcome these challenges.
The first step is to recognize that the world has changed. But that’s a surprisingly hard thing to do, even in an era where businesses seem to celebrate new technologies like blockchain and AI. When we interviewed the former CEO of Knight Capital years after the firm’s technological meltdown, he said, “We weren’t a technology company—we were a broker that used technology.” Thinking of technology as a support function, rather than the core of a company, has worked for years. But it doesn’t anymore.
We need to assess our projects or businesses through the lens of complexity and tight coupling. If we are operating in the danger zone, we can try to simplify our systems, increase transparency, or introduce more slack. But even when we can’t change our systems, we can change how we manage them.
Consider a climbing expedition to Mount Everest. There are many hidden risks, from crevasses and falling rocks to avalanches and sudden weather changes. Altitude sickness causes blurred vision, and overexposure to UV rays leads to snow blindness. And when a blizzard hits, nothing is visible at all. The mountain is a complex and tightly coupled system, and there isn’t much we can do about that.
But we can still take steps to make climbing Everest safer. In the past, for example, logistical problems plagued several Everest expeditions: delayed flights, customs issues, problems with supply deliveries, and digestive ailments.
In combination, these small issues caused delays, put stress on team leaders, took time away from planning, and prevented climbers from acclimating themselves to high altitudes. And then, during the final push to the summit, these failures interacted with other problems. Distracted team leaders and exhausted climbers missed obvious warning signs and made mistakes they wouldn’t normally make. And when the weather turns bad on Everest, a worn-out team that’s running behind schedule stands little chance.
Once we realize that the real killer isn’t the mountain but the interaction of many small failures, we can see a solution: rooting out as many logistical problems as possible. And that’s what the best mountaineering companies do. They treat the boring logistical issues as critical safety concerns. They pay a lot of attention to some of the most mundane aspects of an expedition, from hiring logistical staff who take the burden off team leaders to setting up well-equipped base camp facilities. Even cooking is a big deal. As one company’s brochure put it, “Our attention to food and its preparation on Everest and mountains around the world has led to very few gastrointestinal issues for our team members.”
You don’t need to be a mountain climber to appreciate this lesson. After a quality control crisis, for example, managers at pharmaceutical giant Novo Nordisk realized that the firm’s manufacturing had become too complex and unforgiving to manage in traditional ways. In response, they came up with a new approach to finding and addressing small issues that might become big problems.
First, the company created a department of about twenty people who scan for new challenges that managers might ignore or simply not have the time to think about. They talk with non-profits, environmental groups, and government officials about emerging technologies and changing regulations. The goal is to make sure that the company doesn’t ignore small signs of brewing trouble.
Novo Nordisk also uses facilitators to make sure important issues don’t get stuck at the bottom of the hierarchy (as they did before the quality control crisis). The facilitators—around two dozen people recruited from among the company’s most respected managers—work with every unit at least once every few years, evaluating whether there are concerns unit managers may be ignoring. “We go around and find a number of small issues,” a facilitator explained. “We don’t know if they would develop into something bigger if we ignored them. But we don’t run the risk. We follow up on the small stuff.”
Other organizations use a different approach to manage this kind of complexity. NASA’s Jet Propulsion Laboratory (JPL) does some of the most complex engineering work in the world. Its mission statement is “Dare Mighty Things” or, less formally, “If it’s not impossible, we’re not interested.”
Over the years, JPL engineers have had their share of failures. In 1999, for example, they lost two spacecraft destined for Mars—one because of a software problem onboard the Mars Polar Lander and the other because of confusion about whether a calculation used the English or the metric system.
After these failures, JPL managers began to use outsiders to help them manage the risk of missions. They created risk review boards made up of scientists and engineers who worked at JPL, NASA, or contractors—but who weren’t associated with the missions they reviewed and didn’t buy into the same assumptions as mission insiders.
But JPL’s leaders wanted to go even further. Every mission that JPL runs has a project manager responsible for pursuing ground-breaking science while staying within a tight budget and meeting an ambitious schedule. Project managers walk a delicate line. When under pressure, they might be tempted to take shortcuts when designing and testing critical components. So senior leaders created the Engineering Technical Authority (ETA), a cadre of outsiders within JPL. Every project is assigned an ETA engineer, who makes sure that the project manager doesn’t make decisions that put the mission at risk.
If an ETA engineer and a project manager can’t agree, they take their issue to Bharat Chudasama, the manager who runs the ETA program. When an issue lands on his desk, Chudasama tries to broker a technical solution. He can also try to get project managers more money, time, or people. And if he can’t resolve the issue, he brings it to his boss, JPL’s chief engineer. Such channels for skepticism are indispensable in the danger zone because the ability of any one individual to know what’s going on is limited, and the cost of being wrong is just too high.
This approach isn’t rocket science. In fact, the creation of outsiders within an organization has a long history. For centuries, when the Roman Catholic Church was considering whether to declare a person a saint, it was the job of the Promoter of the Faith, popularly known as the Devil’s Advocate, to make a case against the candidate and prevent any rash decisions. The Promoter of the Faith wasn’t involved in the decision-making process until he presented his objections, so he was an outsider free from the biases of those who had made the case for a candidate in the first place.
The sports writer Bill Simmons proposed something similar for sports teams. “I’m becoming more and more convinced that every professional sports team needs to hire a Vice President of Common Sense,” Simmons wrote. “One catch: the VP of CS doesn’t attend meetings, scout prospects, watch any film or listen to any inside information or opinions; he lives the life of a common fan. They just bring him in when they’re ready to make a big decision, lay everything out and wait for his unbiased reaction.”
These solutions might sound obvious, and yet we rarely use them in practice. We don’t realize that many of our decisions contribute to complexity and coupling, resulting in increasingly vulnerable systems. We tend to focus on big, external shocks while ignoring small problems that can combine into surprising meltdowns. And we often marginalize skeptics instead of creating roles for them.
Today, we are in the golden age of meltdowns. More and more of our systems are in the danger zone, but our ability to manage them hasn’t quite caught up. And we can see the results all around us. The good news is that smart organizations are finding ways to navigate this new world, and we can all learn from them.
Excerpted from MELTDOWN by Chris Clearfield and András Tilcsik. Reprinted by arrangement with Penguin Press, a member of Penguin Group (USA) LLC, A Penguin Random House Company. Copyright © Christopher Clearfield and András Tilcsik, 2018.
Evolution doesn't clean up after itself very well.
- An evolutionary biologist got people swapping ideas about our lingering vestigia.
- Basically, this is the stuff that served some evolutionary purpose at some point, but now is kind of, well, extra.
- Here are the six traits that inaugurated the fun.
The plica semilunaris<img class="rm-lazyloadable-image rm-shortcode" type="lazy-image" data-runner-src="https://assets.rebelmouse.io/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbWFnZSI6Imh0dHBzOi8vYXNzZXRzLnJibC5tcy8xOTA5NjgwMS9vcmlnaW4ucG5nIiwiZXhwaXJlc19hdCI6MTY3NDg5NTg1NX0.kdBYMvaEzvCiJjcLEPgnjII_KVtT9RMEwJFuXB68D8Q/img.png?width=980" id="59914" width="429" height="350" data-rm-shortcode-id="b11e4be64c5e1f58bf4417d8548bedc7" data-rm-shortcode-name="rebelmouse-image" />
The human eye in alarming detail. Image source: Henry Gray / Wikimedia commons<p>At the inner corner of our eyes, closest to the nasal ridge, is the caruncula, which most of us probably just call that little pink thing. Next to it is the plica semilunaris, what's left of a third eyelid that used to (ready for this?) blink horizontally. It is thought to have protected our eyes, and some birds, reptiles, and fish still have one.</p>
Palmaris longus<img class="rm-lazyloadable-image rm-shortcode" type="lazy-image" data-runner-src="https://assets.rebelmouse.io/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbWFnZSI6Imh0dHBzOi8vYXNzZXRzLnJibC5tcy8xOTA5NjgwNy9vcmlnaW4uanBnIiwiZXhwaXJlc19hdCI6MTYzMzQ1NjUwMn0.dVor41tO_NeLkGY9Tx46SwqhSVaA8HZQmQAp532xLxA/img.jpg?width=980" id="879be" width="1920" height="2560" data-rm-shortcode-id="4089a32ea9fbb1a0281db14332583ccd" data-rm-shortcode-name="rebelmouse-image" />
Palmaris longus muscle. Image source: Wikimedia commons<p> Most of us don't have much need these days to navigate from tree branch to tree branch. Still, about 86 percent of us have the wrist muscle that used to help us do it. To see if you have it, place the back of your hand on a flat surface and touch your thumb to your pinkie. If you have a muscle that becomes visible in your wrist, that's the palmaris longus. If you don't, consider yourself more evolved (just joking).</p>
Darwin's tubercle<img class="rm-lazyloadable-image rm-shortcode" type="lazy-image" data-runner-src="https://assets.rebelmouse.io/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbWFnZSI6Imh0dHBzOi8vYXNzZXRzLnJibC5tcy8xOTA5NjgxMi9vcmlnaW4uanBnIiwiZXhwaXJlc19hdCI6MTY0ODUyNjA1MX0.8RuU-OSRf92wQpaPPJtvFreOVvicEwn39_jnbegiUOk/img.jpg?width=980" id="687a0" width="819" height="1072" data-rm-shortcode-id="ff5edf0a698e0681d11efde1d7872958" data-rm-shortcode-name="rebelmouse-image" />
Darwin's tubercle. Image source: Wikimedia commons<p> Yes, maybe the shell of your ear does feel like a dried apricot. Maybe not. But there's a ridge in that swirly structure that's a muscle which allowed us, at one point, to move our ears in the direction of interesting sounds. These days, we just turn our heads, but there it is.</p>
Goosebumps<img class="rm-lazyloadable-image rm-shortcode" type="lazy-image" data-runner-src="https://assets.rebelmouse.io/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbWFnZSI6Imh0dHBzOi8vYXNzZXRzLnJibC5tcy8xOTA5NzMxNC9vcmlnaW4uanBnIiwiZXhwaXJlc19hdCI6MTYyNzEyNTc2Nn0.aVMa5fsKgiabW5vkr7BOvm2pmNKbLJF_50bwvd4aRo4/img.jpg?width=980" id="d8420" width="1440" height="960" data-rm-shortcode-id="8827e55511c8c3aed8c36d21b6541dbd" data-rm-shortcode-name="rebelmouse-image" />
Goosebumps. Photo credit: Tyler Olson via Shutterstock<p>It's not entirely clear what purpose made goosebumps worth retaining evolutionarily, but there are two circumstances in which they appear: fear and cold. For fear, they may have been a way of making body hair stand up so we'd appear larger to predators, much the way a cat's tail puffs up — numerous creatures exaggerate their size when threatened. In the cold, they may have trapped additional heat for warmth.</p>
Tailbone<img class="rm-lazyloadable-image rm-shortcode" type="lazy-image" data-runner-src="https://assets.rebelmouse.io/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbWFnZSI6Imh0dHBzOi8vYXNzZXRzLnJibC5tcy8xOTA5NzMxNi9vcmlnaW4uanBnIiwiZXhwaXJlc19hdCI6MTY3MzQwMjc3N30.nBGAfc_O9sgyK_lOUo_MHzP1vK-9kJpohLlj9ax1P8s/img.jpg?width=980" id="9a2f6" width="1440" height="1440" data-rm-shortcode-id="4fe28368d2ed6a91a4c928d4254cc02a" data-rm-shortcode-name="rebelmouse-image" />
Image source: Decade3d-anatomy online via Shutterstock<p>Way back, we had tails that probably helped us balance upright and were useful for moving through trees. We still have the stump of one when we're embryos, from weeks 4–6, and then the body mostly dissolves it during weeks 6–8. What's left is the coccyx.</p>
The palmar grasp reflex<img class="rm-lazyloadable-image rm-shortcode" type="lazy-image" data-runner-src="https://assets.rebelmouse.io/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbWFnZSI6Imh0dHBzOi8vYXNzZXRzLnJibC5tcy8xOTA5NzMyMC9vcmlnaW4uanBnIiwiZXhwaXJlc19hdCI6MTYzNjY0MDY5NX0.OSwReKLmNZkbAS12-AvRaxgCM7zyukjQUaG4vmhxTtM/img.jpg?width=980" id="8804c" width="1440" height="960" data-rm-shortcode-id="67542ee1c5a85807b0a7e63399e44575" data-rm-shortcode-name="rebelmouse-image" />
Palmar reflex activated! Photo credit: Raul Luna on Flickr<p> You've probably seen how non-human primate babies grab onto their parents' hands to be carried around. We used to do this, too. Even now, if you touch your finger to a baby's palm or to the sole of their foot, the palmar grasp reflex will cause the hand or foot to try to close around your finger.</p>
Other people's suggestions<p>Amir's followers dove right in, offering both cool and questionable additions to her list. </p>
Fangs?<blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="en" dir="ltr">Lower mouth plate behind your teeth. Some have protruding bone under the skin which is a throw back to large fangs. Almost like an upsidedown Sabre Tooth.</p>— neil crud (@neilcrud66) <a href="https://twitter.com/neilcrud66/status/1085606005000601600?ref_src=twsrc%5Etfw">January 16, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
Hiccups<blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="en" dir="ltr">Sure: <a href="https://t.co/DjMZB1XidG">https://t.co/DjMZB1XidG</a></p>— Stephen Roughley (@SteBobRoughley) <a href="https://twitter.com/SteBobRoughley/status/1085529239556968448?ref_src=twsrc%5Etfw">January 16, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
Hypnic jerk as you fall asleep<blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="en" dir="ltr">What about when you “jump” just as you’re drifting off to sleep, I heard that was a reflex to prevent falling from heights.</p>— Bann face (@thebanns) <a href="https://twitter.com/thebanns/status/1085554171879788545?ref_src=twsrc%5Etfw">January 16, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> <p> This thing, often called the "alpha jerk" as you drop into alpha sleep, is properly called the hypnic jerk. It may actually be a carryover from our arboreal days. The <a href="https://www.livescience.com/39225-why-people-twitch-falling-asleep.html" target="_blank" data-vivaldi-spatnav-clickable="1">hypothesis</a> is that you suddenly jerk awake to avoid falling out of your tree.</p>
Nails screeching on a blackboard response?<blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="en" dir="ltr">Everyone hate the sound of fingernails on a blackboard. It's _speculated_ that this is a vestigial wiring in our head, because the sound is similar to the shrill warning call of a chimp. <a href="https://t.co/ReyZBy6XNN">https://t.co/ReyZBy6XNN</a></p>— Pet Rock (@eclogiter) <a href="https://twitter.com/eclogiter/status/1085587006258888706?ref_src=twsrc%5Etfw">January 16, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
Ear hair<blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="en" dir="ltr">Ok what is Hair in the ears for? I think cuz as we get older it filters out the BS.</p>— Sarah21 (@mimix3) <a href="https://twitter.com/mimix3/status/1085684393593561088?ref_src=twsrc%5Etfw">January 16, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
Nervous laughter<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">You may be onto something. Tooth-bearing with the jaw clenched is generally recognized as a signal of submission or non-threatening in primates. Involuntary smiling or laughing in tense situations might have signaled that you weren’t a threat.</p>— Jager Tusk (@JagerTusk) <a href="https://twitter.com/JagerTusk/status/1085316201104912384?ref_src=twsrc%5Etfw">January 15, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
Um, yipes.<blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="en" dir="ltr">Sometimes it feels like my big toe should be on the side of my foot, was that ever a thing?</p>— B033? K@($ (@whimbrel17) <a href="https://twitter.com/whimbrel17/status/1085559016011563009?ref_src=twsrc%5Etfw">January 16, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>