An operation has two jobs that pull against each other. The obvious one is to be efficient: to make and deliver more for less. The quieter one is to still be standing after a fire, a port closure, a supplier collapse or a heatwave. A "sustainable and resilient" operation is one tuned for the long game: efficient enough to compete, but with enough flexibility, redundancy and environmental headroom that a bad week doesn't become a fatal one.

The quick version

  • Resilient operations can absorb a disruption and recover quickly, through flexibility (the ability to reconfigure) and redundancy (spare capacity, second sources, buffer stock).
  • Sustainable operations can keep running for the long term without burning out their people, their resource base or their licence to operate. This is the "triple bottom line" of people, planet and profit.
  • The two are linked: the same hyper-lean choices that maximise short-term efficiency (single sourcing, zero buffer, no spare capacity) are usually what make an operation brittle and unsustainable.
  • The move is not "add buffer everywhere"; that's expensive. It's to map where you're exposed, then buy flexibility cheaply (standardisation, second sources, modular design) and put redundancy only where a failure would be catastrophic.

The idea in depth

Start with the uncomfortable arithmetic, because it's what turns resilience from a virtue into a budget line. The McKinsey Global Institute, in its 2020 report "Risk, resilience, and rebalancing in global value chains", found that companies can now expect supply-chain disruptions lasting a month or longer to occur, on average, every 3.7 years. It also estimated that, across a decade, such disruptions can cost the average company on the order of 40 to 45% of one year's profits. Sit with that for a second. A month-long shock is not a freak event you plan around once. It is a recurring feature of the operating environment, frequent enough to model and budget for.

That changes how you think about the spend. Resilience stops being insurance you hope never to claim and becomes a designed-in property with a known expected payoff. If a serious disruption is roughly a once-every-four-years event, the question isn't whether to pay for flexibility but where: which nodes, if they failed, would actually hurt. That reframing is the whole game. You don't harden everything; you harden what's load-bearing.
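To make that arithmetic concrete, here is a minimal sketch that annualises the two headline figures. The function names are illustrative, and the plain division assumes disruptions arrive at a steady long-run average rate and that losses spread evenly across the decade. Both are simplifying assumptions, not something the report itself states.

```python
# Headline figures quoted above from the McKinsey Global Institute report.
RECURRENCE_YEARS = 3.7            # a month-plus disruption, on average, every 3.7 years
DECADE_PROFIT_HIT = (0.40, 0.45)  # share of one year's profit lost per decade


def expected_disruptions(horizon_years: float,
                         recurrence_years: float = RECURRENCE_YEARS) -> float:
    """Expected number of disruptions over a horizon, assuming a steady average rate."""
    return horizon_years / recurrence_years


def annualised_profit_drag(decade_hit: float) -> float:
    """Spread a per-decade loss evenly over ten years (a simplifying assumption)."""
    return decade_hit / 10


if __name__ == "__main__":
    print(f"Expected disruptions per decade: {expected_disruptions(10):.1f}")
    low, high = (annualised_profit_drag(hit) for hit in DECADE_PROFIT_HIT)
    print(f"Annualised drag on profit: {low:.1%} to {high:.1%}")
```

On these assumptions the figures work out to roughly 2.7 month-long shocks a decade and a drag of about 4 to 4.5% of a year's profit, every year. That is a number small enough to overlook and large enough to justify a budget line.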

Flexibility beats buffer stock: the resilience that pays for itself

The instinct behind "be more resilient" is to hold more: more inventory, more suppliers, more spare capacity. That's redundancy, and it works, but it's the expensive kind of resilience because it sits idle most of the time. Yossi Sheffi of MIT, in The Resilient Enterprise (MIT Press, 2005), makes the more useful argument: the cheaper and more durable route to resilience is flexibility, meaning an operation that can reconfigure itself when something breaks. Standardised parts and processes, modular product design, interchangeable plants, cross-trained people and collaborative supplier relationships all create resilience that also pays off in normal times, because flexibility makes you faster and more adaptable every day, not just on the bad days.

Redundancy sits in a warehouse waiting for a disaster. Flexibility earns its keep every ordinary day, and saves you on the bad ones.

In practice that means asking, for each critical input: "If this disappeared on Monday, what's my plan B, and how fast can I switch to it?" If the honest answer is "we'd be stuck," you have a single point of failure. The fix is rarely just "stockpile"; it's qualifying a second source, designing the product so a substitute part fits, or standardising a component so it isn't bespoke to one supplier. These choices can buy protection comparable to a giant inventory buffer at a fraction of the carrying cost, while making the whole operation more nimble.

flowchart TD
  A(["A disruption hits<br/>a critical input"]) --> B{"Do we have<br/>a plan B?"}
  B -->|"No, single point<br/>of failure"| C(["Operation stalls,<br/>recovery is slow + costly"])
  B -->|"Yes, flexibility built in"| D(["Reconfigure: second source,<br/>substitute part, spare line"])
  D --> E(["Absorb the shock,<br/>recover quickly"])
  C -.->|"learn the hard way"| F(["Add flexibility<br/>where it was missing"])
Resilience is mostly about whether a plan B exists before the shock: flexibility built in beats buffer bought after. Leaders Loop

An honest limitation. Flexibility is not free and not always sufficient. Qualifying a second supplier costs money and management attention; modular design can add unit cost; and some inputs are genuinely irreplaceable in the short run (a single rare-earth refinery, a sole-source semiconductor). For those, redundancy (real buffer stock, a pre-negotiated backup) is the only honest answer, and Sheffi says as much: flexibility is the default, redundancy the targeted exception for the nodes you cannot afford to lose. The error to avoid is the lazy extreme in either direction: buffering everything (slow and costly) or buffering nothing (one fire from disaster).

Sustainable operations: the triple bottom line, and what it actually meant

"Sustainable" gets used loosely, so it's worth recovering the original idea. The term that organised modern thinking is the triple bottom line, coined by John Elkington in 1994 and expanded in Cannibals with Forks (1997): an operation should be measured not just on profit but on three bottom lines (economic, social and environmental), usually shortened to "people, planet, profit." The point was never decoration. A business that exhausts its workforce, poisons its surroundings or torches its reputation is not a going concern, however good this quarter's margin looks.

Then comes the twist. In 2018 Elkington himself wrote a piece in Harvard Business Review issuing what he called a "management concept recall" on the triple bottom line, not because the idea was wrong, but because it had been hollowed out into an accounting exercise. People were tallying three columns and declaring victory, rather than changing how the operation actually ran. That's the trap to avoid: sustainability reported is not sustainability achieved.

The useful step is to translate "sustainable" into operating constraints you actually manage, not a report you publish once a year. Treat people-burnout, environmental load and resource dependence as real capacity limits, the same way you treat a machine's duty cycle. An operation run permanently at 100% utilisation, with exhausted staff and no slack, isn't efficient; it's pre-failed. The link back to resilience is direct: slack, redundancy and humane pacing are simultaneously what makes an operation able to absorb a shock and what makes it able to keep running for years.

flowchart LR
  A(["Sustainable & resilient<br/>operations"]) --> B(["People<br/>capacity, not burnout"])
  A --> C(["Planet<br/>environmental headroom"])
  A --> D(["Profit<br/>efficient enough to compete"])
  B --> E(["Slack & flexibility =<br/>shock absorption + longevity"])
  C --> E
  D --> E
The triple bottom line and resilience meet in the same place: an operation with enough slack to keep running. Leaders Loop

A worked example

The clean real-world illustration is the Aisin fire. On 1 February 1997 a blaze destroyed the Aisin Seiki plant in Kariya, Japan, that made roughly 99% of the brake-proportioning valves (P-valves) for Toyota's cars. The P-valve was a sole-source part, held under just-in-time discipline with only hours of stock in the pipeline. By the lean playbook, Toyota should have been crippled for weeks. Instead, as documented in MIT Sloan Management Review's case study, more than 200 firms in Toyota's supplier network improvised temporary P-valve production using shared blueprints and borrowed tooling, and the group restored output within days. The buffer that saved Toyota wasn't inventory; it was the flexibility and trust of a collaborative network. That is Sheffi's thesis happening in real life.

Now a small, simplified version to show the decision a normal operations leader faces. (All figures are illustrative; this is a teaching example, not a real company.) Say you run a kitchen-appliance assembler. One control board goes into every product you ship, and it comes from a single supplier in one region. A one-month regional shutdown would idle the whole line, at a cost of, say, £500k in lost contribution. Qualifying a second supplier and redesigning the board mount so either part fits costs, say, £80k up front plus a small unit-cost premium.

flowchart TD
  A(["Single-sourced control board<br/>100% of shipments (illustrative)"]) --> B{"What does a 1-month<br/>shutdown cost?"}
  B -->|"~£500k lost contribution"| C(["Expected annual exposure<br/>~£500k ÷ 3.7 ≈ £135k/yr"])
  C --> D{"Spend ~£80k once to add<br/>a second source + flexible mount?"}
  D -->|"Yes"| E(["Switch in days, not weeks,<br/>flexibility pays for itself"])
  D -->|"No"| F(["Carry the brittle node<br/>until it fails"])
The flexibility spend is small against the expected annual cost of the exposure; that's the calculation resilience turns on. Leaders Loop · illustrative figures

Spread the £500k shock across a 3.7-year recurrence and the expected exposure is roughly £135k a year (£500k ÷ 3.7) for that one node. Against that, an £80k one-off to make the input switchable is not insurance you hope to waste; it's a positive-return investment that is covered by well under a year of expected exposure and also makes you nimbler in normal times. Note the order of thinking: you didn't harden the whole plant. You found the single load-bearing node, priced its failure honestly, and bought flexibility precisely there. That is resilience done as economics, not anxiety.
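The same calculation can be written down as a small reusable sketch, so it can be rerun for other nodes. The class and field names here are hypothetical, the inputs are the illustrative figures from the example, and the payback figure assumes the mitigation avoids the whole loss and ignores the unit-cost premium.

```python
from dataclasses import dataclass


@dataclass
class NodeExposure:
    """One single-sourced node, priced with illustrative figures (GBP)."""
    loss_per_event: float     # lost contribution from one shutdown
    recurrence_years: float   # average years between such shutdowns
    mitigation_cost: float    # one-off spend to make the input switchable

    def expected_annual_exposure(self) -> float:
        """Average yearly cost of leaving the node unprotected."""
        return self.loss_per_event / self.recurrence_years

    def payback_years(self) -> float:
        """Years of avoided exposure needed to cover the one-off spend.

        Assumes the mitigation removes the whole loss and ignores the
        unit-cost premium; both are simplifications.
        """
        return self.mitigation_cost / self.expected_annual_exposure()


control_board = NodeExposure(
    loss_per_event=500_000,
    recurrence_years=3.7,
    mitigation_cost=80_000,
)
print(f"Expected annual exposure: £{control_board.expected_annual_exposure():,.0f}")
print(f"Payback period: {control_board.payback_years():.2f} years")
```

With the example's numbers this prints an exposure of about £135,135 a year and a payback of roughly 0.59 years. Changing any one input shows quickly how sensitive the case is: a rarer shock or a smaller loss stretches the payback, which is the honest test of whether a node deserves the spend.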

Frequently asked questions

Isn't all this just the opposite of being lean and efficient?

No, and that framing is the expensive mistake. Lean removes waste; resilience removes single points of failure. You can be both: standardise and streamline ruthlessly, while making sure the few inputs that could stop you have a plan B. The brittle operations are the ones that confused "no slack anywhere" with efficiency. A small amount of well-placed slack is what lets a lean operation stay lean without betting the company on nothing going wrong.

What's the difference between flexibility and redundancy?

Redundancy is holding spare: extra inventory, a second plant, a backup supplier on standby. It is protection that mostly sits idle. Flexibility is the ability to reconfigure what you already have: standardised parts, cross-trained staff, modular designs, plants that can make each other's products. Flexibility is usually cheaper and earns its keep daily; redundancy is the targeted exception for inputs you genuinely cannot afford to lose for even a short time.

How do I find my single points of failure without boiling the ocean?

Work backwards from catastrophe, not forwards from a full risk register. List the handful of inputs, suppliers, sites or people whose sudden loss would stop you shipping, then for each, ask how fast you could switch to an alternative. Anything where the honest answer is "we couldn't, quickly" is a single point of failure worth money. Most operations have fewer than they fear, which is exactly why targeting them is affordable.
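As a rough sketch of that screening step, the list can live in a few lines of code rather than a full risk register. Everything here is hypothetical: the field names, the seven-day tolerance and the example entries are invented for illustration, and the switch times would be your own honest estimates.

```python
from dataclasses import dataclass


@dataclass
class CriticalInput:
    name: str
    stops_shipping: bool    # would its sudden loss halt shipments?
    days_to_switch: float   # honest estimate; float("inf") if no alternative exists


def single_points_of_failure(inputs, tolerable_days):
    """Return inputs whose loss stops shipping and that cannot be switched
    within the tolerable window, slowest to replace first."""
    flagged = [
        item for item in inputs
        if item.stops_shipping and item.days_to_switch > tolerable_days
    ]
    return sorted(flagged, key=lambda item: item.days_to_switch, reverse=True)


# Hypothetical entries, invented purely for illustration.
inputs = [
    CriticalInput("control board", stops_shipping=True, days_to_switch=float("inf")),
    CriticalInput("packaging", stops_shipping=True, days_to_switch=3),
    CriticalInput("test-rig technician", stops_shipping=True, days_to_switch=30),
    CriticalInput("office stationery", stops_shipping=False, days_to_switch=1),
]

for item in single_points_of_failure(inputs, tolerable_days=7):
    print(item.name, item.days_to_switch)
```

The filter deliberately ignores anything that would not stop shipments, which keeps the list short. What remains is the handful of nodes worth pricing.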

Is sustainability a separate agenda from resilience, or the same thing?

They overlap more than the org chart suggests. Both depend on slack: humane pacing for people, environmental headroom for the planet, spare capacity for shocks. An operation run at permanent 100% utilisation with exhausted staff is fragile and unsustainable for the same underlying reason. Treating people-capacity and resource-load as real limits, not infinite ones, buys you longevity and shock-absorption at once.

We're small. Do month-long-disruption statistics even apply to us?

The headline frequency comes from large global value chains, so don't over-read the exact number for a small local business. But the logic scales down cleanly: if one supplier, one key person or one piece of kit could stop you for weeks, you have an exposure worth pricing. Smaller operations often have more concentrated risk, not less (fewer suppliers, thinner teams), so the targeted-flexibility move matters just as much.

Related in the Toolkit

Resilient operations are built on top of how you deliver and improve work. The way you run delivery (delivery methodologies) shapes how quickly you can reconfigure under pressure, and continuous-improvement disciplines (Lean, Six Sigma & Kaizen) are where you remove waste without removing the slack that keeps you standing.

Where to go next