Cyber risk & incident response: deciding well before the bad day

The worst time to decide who can take a customer-facing system offline is at 2am, with an attacker inside it and the legal team still asleep. Cyber risk is the chance that a digital failure (an intrusion, a leak, an outage) costs you money, trust, or both; incident response is the discipline of handling that failure well when it comes. And it will come. The useful question is not whether you can prevent every incident, but whether you have already decided how you will behave during one.

The quick version

Cyber risk is just risk with a digital cause: likelihood times impact, measured in dollars and reputation, not in scary acronyms. It belongs on the same risk register as everything else you manage.
Incident response is the rehearsed plan for the bad day (detect, contain, eradicate, recover, learn), so the team acts on a decision made calmly in advance, not on adrenaline.
The metric leaders feel is time: how long an attacker sits undetected, and how long it takes you to contain them. Shorter is cheaper. The 2024 global breach lifecycle averaged 258 days from breach to containment.
The trap is treating security as a technology purchase. Tools help, but the difference on the day is governance: clear ownership, decision rights, and a plan the business has actually practised.

The idea in depth: a breach is a business decision

The most useful reframe in this whole field is that cyber risk is not a separate species of risk. It is the ordinary kind, likelihood multiplied by impact, that happens to travel through software. That framing matters because it puts the topic where it belongs: on the board's risk register, owned by the business, not quarantined in a server room. The current standard makes this explicit. When the US National Institute of Standards and Technology released the Cybersecurity Framework 2.0 in February 2024, the single biggest change from the previous version was adding a sixth core Function, Govern, alongside Identify, Protect, Detect, Respond and Recover. Governance was promoted to first among equals: the framework now says, in effect, that strategy, ownership and accountability sit above the technical controls, not beside them.

So the move is to stop asking your security lead "are we secure?", a question with no honest answer, and start asking "which risks have we decided to accept, reduce, or transfer, and who owns that call?" Put cyber on the same register as supply-chain and financial risk. Name an executive accountable for it. Decide in advance, in daylight, which systems you would take offline to stop a spread and who is allowed to pull that switch. That single pre-made decision is worth more than most of the tooling budget.

Stop asking "are we secure?" It has no honest answer. Ask "which risks have we decided to accept, and who owns that call?"
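To make the register idea concrete, here is a minimal Python sketch of a risk register in which each entry carries a likelihood, an impact, a treatment (accept, reduce, or transfer) and a named owner. The class names, the example risks and every number are hypothetical, chosen only to show the arithmetic; they are not drawn from any framework or real organisation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Treatment(Enum):
    ACCEPT = "accept"
    REDUCE = "reduce"
    TRANSFER = "transfer"


@dataclass
class RiskEntry:
    name: str
    annual_likelihood: float                # estimated chance per year, 0.0 to 1.0
    impact_usd: float                       # estimated cost if it happens
    treatment: Optional[Treatment] = None   # None means nobody has decided yet
    owner: Optional[str] = None             # the accountable executive

    @property
    def expected_annual_loss(self) -> float:
        # The "likelihood times impact" idea, in dollars per year.
        return self.annual_likelihood * self.impact_usd


def ranked(register):
    """Largest expected loss first, cyber and non-cyber risks side by side."""
    return sorted(register, key=lambda r: r.expected_annual_loss, reverse=True)


def undecided(register):
    """Entries where 'who owns that call?' has no answer yet."""
    return [r for r in register if r.treatment is None or r.owner is None]


# Hypothetical entries: all names and figures are illustrative assumptions.
register = [
    RiskEntry("Key supplier insolvency", 0.05, 2_000_000, Treatment.TRANSFER, "COO"),
    RiskEntry("Ransomware on checkout systems", 0.10, 1_500_000, Treatment.REDUCE, "CFO"),
    RiskEntry("Phished email credential", 0.30, 200_000),
]

for entry in ranked(register):
    print(f"{entry.name}: about USD {entry.expected_annual_loss:,.0f} per year")

for entry in undecided(register):
    print(f"No decision or owner recorded for: {entry.name}")
```

The point of the sketch is the second loop: a register is only useful if it can tell you which risks nobody has yet agreed to own.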

The lifecycle every team should rehearse

Once an incident starts, you do not want creativity; you want a script everyone already knows. Two well-worn maps describe that script. The long-standing practitioner version is the SANS six-step model, often remembered by the initials PICERL: Preparation, Identification, Containment, Eradication, Recovery, and Lessons Learned. NIST's own guidance, set out for years in Special Publication 800-61, drew the same arc with fewer boxes (Preparation; Detection and Analysis; Containment, Eradication and Recovery; and Post-Incident Activity) and, crucially, drew it as a loop, where the lessons from one incident feed straight back into preparation for the next.

flowchart LR
  A(["Preparation<br/>plan, roles, tools, drills"]) --> B(["Detection<br/>& analysis"])
  B --> C(["Containment<br/>stop the spread"])
  C --> D(["Eradication<br/>remove the threat"])
  D --> E(["Recovery<br/>restore + watch"])
  E --> F(["Lessons learned<br/>what do we change?"])
  F -.-> A

The incident-response loop: the post-incident review is not the end; it is next time's preparation. Leaders Loop

In April 2025 NIST rewrote that guidance as SP 800-61 Revision 3 and made a pointed change: it dropped the rigid lifecycle diagram and re-cast incident response against the six CSF 2.0 Functions instead. The message to leaders is direct: response is not a self-contained IT runbook that begins when the alarm sounds. It is woven through governance, asset knowledge and protective controls long before, and through recovery and review long after. Which points to one concrete move: schedule a tabletop exercise. Gather the people who would actually be in the room (security, legal, communications, an executive decision-maker) and walk through a realistic scenario out loud. You are not testing the firewall. You are testing whether the humans know who decides what, and discovering the gaps while they are still cheap to fix.

An honest limitation. These lifecycles are clean diagrams describing a messy reality. Real incidents do not advance one tidy box at a time; you are often containing one foothold while still detecting another, and "recovery" can begin before you are sure eradication is complete. Treat the model as a shared vocabulary and a checklist of stages you must not skip, not as a literal sequence you march through. Its real value is social: it gives a panicking team a common language and a sense of where they are.
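One way to picture "a checklist of stages you must not skip, not a literal sequence" is a tracker that records stages in whatever order they actually happen and only complains about the ones never touched. The sketch below uses the six PICERL stage names from earlier; the class and method names are illustrative assumptions, not part of the SANS or NIST material.

```python
from enum import Enum


class Stage(Enum):
    PREPARATION = "Preparation"
    IDENTIFICATION = "Identification"
    CONTAINMENT = "Containment"
    ERADICATION = "Eradication"
    RECOVERY = "Recovery"
    LESSONS_LEARNED = "Lessons Learned"


class IncidentChecklist:
    """Tracks which stages have been worked on, without enforcing any order."""

    def __init__(self):
        self._touched = set()

    def record(self, stage: Stage) -> None:
        # Stages may overlap or repeat, so recording is order-free.
        self._touched.add(stage)

    def skipped(self):
        # Returned in the model's order, so the output reads like the diagram.
        return [s for s in Stage if s not in self._touched]

    def can_close(self) -> bool:
        return not self.skipped()


checklist = IncidentChecklist()
# A messy but realistic order: detecting a second foothold while containing
# the first, and starting recovery before eradication is confirmed.
for stage in [
    Stage.PREPARATION,
    Stage.IDENTIFICATION,
    Stage.CONTAINMENT,
    Stage.IDENTIFICATION,
    Stage.RECOVERY,
    Stage.ERADICATION,
]:
    checklist.record(stage)

print("Can close the incident:", checklist.can_close())
print("Still outstanding:", [s.value for s in checklist.skipped()])
```

Here the incident cannot be closed because the lessons-learned review has not happened, which is exactly the stage teams are most tempted to drop.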

The metric leaders actually feel: time

Security has a hundred metrics, but the one that shows up in the financial impact is time, specifically dwell time (how long an attacker operates undetected) and containment time (how long from detection to control). The clearest evidence is IBM's annual Cost of a Data Breach study, based on an in-depth analysis of real breaches at 604 organisations worldwide. Its 2024 report put the global average cost of a breach at USD 4.88 million, up 10% on the prior year, the largest jump since the pandemic, and the average breach lifecycle, from first compromise to containment, at 258 days. That is roughly eight and a half months in which the cost meter is running.
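The two time metrics are simple subtraction once three timestamps are known. The following Python sketch assumes you can establish when the first compromise happened, when it was detected, and when it was contained; the function names and the example dates are hypothetical. It also reproduces the conversion of 258 days into months.

```python
from datetime import datetime


def breach_timings(first_compromise: datetime, detected: datetime,
                   contained: datetime) -> dict:
    """Split one incident's timeline into dwell, containment and lifecycle."""
    if not (first_compromise <= detected <= contained):
        raise ValueError("timestamps must be in chronological order")
    return {
        "dwell": detected - first_compromise,      # attacker undetected
        "containment": contained - detected,       # detection to control
        "lifecycle": contained - first_compromise, # the whole cost meter
    }


def days_to_months(days: float) -> float:
    # Uses the average month length of 365.25 / 12 days.
    return days / (365.25 / 12)


# Illustrative dates only, not taken from any real incident.
timings = breach_timings(
    first_compromise=datetime(2024, 3, 1, 9, 0),
    detected=datetime(2024, 3, 22, 18, 0),
    contained=datetime(2024, 3, 24, 20, 0),
)
for label, duration in timings.items():
    print(f"{label}: {duration}")

print(round(days_to_months(258), 1))  # 8.5
```

In practice the hard part is not the arithmetic but establishing the first timestamp, which usually comes out of forensic review after the fact.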

The direction of travel is the actionable part. The same report found that organisations making extensive use of security AI and automation contained breaches markedly faster and saved, on average, close to USD 1.9 million per breach compared with those that did not. Read past the AI headline to the underlying point: speed of detection and containment is the lever. Anything that shortens the gap between "something is wrong" and "it is contained" pays for itself. Two practical moves follow: instrument, and pre-authorise. Make sure someone is actually watching the alerts (a budget line many firms quietly skip), and pre-agree the containment actions a responder may take at 2am without hunting for an executive (isolating a host, disabling an account, blocking a domain), so the clock stops sooner.
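Pre-authorisation can be as plain as a written list that a responder checks before acting. The sketch below is one possible shape for that check in Python; the action identifiers and the wording of the notes are hypothetical, and a real list would be agreed with the business rather than hard-coded by the security team.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the three example actions named above.
PRE_AUTHORISED = frozenset({"isolate_host", "disable_account", "block_domain"})


@dataclass(frozen=True)
class Decision:
    action: str
    proceed_now: bool
    note: str


def decide(action: str) -> Decision:
    """Tell a responder whether they may act immediately or must escalate."""
    if action in PRE_AUTHORISED:
        return Decision(action, True,
                        "pre-authorised: act, then log it and inform the incident lead")
    return Decision(action, False,
                    "not pre-authorised: escalate to the named decision-maker")


for requested in ["disable_account", "take_checkout_offline"]:
    result = decide(requested)
    print(f"{result.action}: proceed_now={result.proceed_now} ({result.note})")
```

The design choice worth noticing is the default: anything not on the list escalates, so the list can stay short and the business keeps the large calls, such as taking a revenue system offline.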

One caution on the numbers, in the spirit of using them honestly: a global average cost is a blunt instrument. It blends a hospital chain and a corner shop, and your own exposure depends on what data you hold and what you would owe if it leaked. Use the figure to size the conversation, not to set your budget to the decimal. The defensible claim is the relationship, triangulated across years of the same study and echoed in NIST's emphasis on Detect and Respond: shorter dwell time means lower cost.

A worked example

Picture a mid-sized online retailer, call it Harbourgate, running a Friday-evening checkout when a finance clerk reports that an invoice email looks "off." (Illustrative scenario and figures throughout; this is a teaching example, not a real company.) An attacker has phished a credential and is moving quietly through the email system, three weeks in already. Harbourgate has bought good tools. What it has never done is decide who is in charge during an incident.

flowchart TD
  A(["Friday 6pm: 'this email looks off'"]) --> B{"Is there a named<br/>incident lead?"}
  B -->|"No, Harbourgate before"| C(["Hours lost arguing who decides;<br/>attacker keeps moving"])
  B -->|"Yes, Harbourgate after"| D(["Lead invokes the plan,<br/>contains in pre-agreed steps"])
  C --> E(["Longer dwell time =<br/>higher cost, harder cleanup"])
  D --> F(["Reset creds, isolate mailbox,<br/>notify, then review"])
  F --> G(["Lessons learned feed<br/>next preparation"])

The same intrusion, two outcomes: the difference is a decision made before the incident, not during it.

In the version where Harbourgate had run one tabletop exercise, the clerk's report goes to a named incident lead, who invokes a one-page plan. Containment is already authorised: force a password reset on the affected account, isolate the mailbox, and block the attacker's known sending domain, with no waiting for a director to wake up. Communications has a holding statement drafted. Legal knows, before anyone asks, that the jurisdiction's breach-notification clock may now be running and how long they have. The intrusion is contained over the weekend, so containment takes days and the whole lifecycle is measured in weeks rather than the eight-and-a-half-month average, and the Monday review produces three concrete fixes: multi-factor authentication on email, alerting on unusual mailbox rules, and a second person trained to lead. None of that required new software. It required decisions made in advance, which is the entire point.
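The one-page plan in this example can be treated as a handful of fields that are either filled in or not. The Python sketch below checks a plan for the gaps the scenario describes; the field names, the gap wording and the role label are assumptions made for illustration, and Harbourgate remains a fictional company.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IncidentPlan:
    incident_lead: Optional[str] = None
    deputy_lead: Optional[str] = None
    pre_authorised_actions: List[str] = field(default_factory=list)
    holding_statement_drafted: bool = False
    notification_deadline_known: bool = False


def gaps(plan: IncidentPlan) -> List[str]:
    """List what a tabletop exercise would expose as missing."""
    found = []
    if plan.incident_lead is None:
        found.append("no named incident lead")
    if plan.deputy_lead is None:
        found.append("no second person trained to lead")
    if not plan.pre_authorised_actions:
        found.append("no pre-authorised containment actions")
    if not plan.holding_statement_drafted:
        found.append("no holding statement drafted")
    if not plan.notification_deadline_known:
        found.append("breach-notification deadline unknown")
    return found


# Harbourgate before any preparation: good tools, no decisions.
before = IncidentPlan()

# Harbourgate after one tabletop exercise (illustrative values).
after = IncidentPlan(
    incident_lead="Head of IT",
    pre_authorised_actions=["reset password", "isolate mailbox", "block sending domain"],
    holding_statement_drafted=True,
    notification_deadline_known=True,
)

print("Before:", gaps(before))
print("After:", gaps(after))
```

The one gap left in the "after" plan matches the Monday review's third fix: a single named lead is still a single point of failure.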

Frequently asked questions

What is the difference between cyber risk and incident response?

Cyber risk is the forward-looking question: what could go wrong digitally, how likely is it, and what would it cost? Incident response is the operational answer for when something does go wrong: the plan, roles and steps for handling a live event. One is about deciding how much risk to carry; the other is about behaving well when a risk lands. You need both, and the same governance function should own them.

Isn't this just IT's job?

No, and treating it that way is the common failure. The technical containment is IT's job; the decisions around it are the business's. Whether to take the store offline, what to tell customers, when the legal notification clock starts, whether to pay a ransom: these are business and legal calls. NIST's CSF 2.0 made this explicit by adding Govern as a core Function, signalling that accountability sits with leadership, not only the security team.

We have no budget for security tools. What can we actually do?

Most of the highest-value moves cost little. Turn on multi-factor authentication everywhere it is offered. Write a one-page incident plan that names who leads and who decides. Run a single tabletop exercise around a kitchen table. Keep tested, offline backups. Decide your breach-notification obligations before you need them. None of these is a purchase; all of them shorten the bad day.

How often should we test the plan?

At least once a year, and again whenever the business changes materially: a new product, a major supplier, or a reorganisation that moves who would be in the room. A plan that has never been rehearsed is a document, not a capability. The point of testing is less to validate the plan and more to find the gap between what the plan says and what people actually know.
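That testing rule is easy to turn into a reminder. The Python sketch below treats "once a year" as 365 days and takes a list of dates on which the business changed materially; the function name, the threshold and the example dates are all assumptions for illustration.

```python
from datetime import date, timedelta
from typing import Iterable, Optional


def tabletop_due(last_exercise: Optional[date], today: date,
                 material_changes: Iterable[date] = ()) -> bool:
    """True if the plan should be rehearsed again under the rule above."""
    if last_exercise is None:
        return True  # never rehearsed: a document, not a capability
    if today - last_exercise > timedelta(days=365):
        return True  # the annual minimum has lapsed
    # Any material change since the last exercise also triggers a re-test.
    return any(last_exercise < changed <= today for changed in material_changes)


today = date(2024, 6, 1)
print(tabletop_due(date(2024, 1, 15), today))                      # False
print(tabletop_due(date(2024, 1, 15), today, [date(2024, 4, 1)]))  # True
print(tabletop_due(date(2023, 1, 15), today))                      # True
print(tabletop_due(None, today))                                   # True
```

What counts as a material change is a judgement for the business; the code only records that someone made it.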

Should we just buy cyber insurance and move on?

Insurance is a legitimate way to transfer some financial risk, but it is not a response plan and it will not restore your customers' trust. Most insurers now require baseline controls (MFA, backups, a documented plan) before they will pay out, so the work of preparing is unavoidable even if you do insure. Treat insurance as one line in the risk decision, alongside reducing and accepting risk, not as a substitute for being ready.

Related in the Toolkit

Good incident response rests on knowing what you are defending and who can reach it, which is why security fundamentals & threat modelling and identity & access management are the two foundations this topic leans on most, and why the post-incident review borrows directly from the continuous-improvement discipline of Kaizen.

Security fundamentals & threat modelling: knowing how you might be attacked is what makes a response plan realistic rather than generic.
Identity & access management: most breaches travel through credentials, so controlling who can reach what is front-line containment.
Data privacy & PII handling (GDPR and equivalents): what data you hold sets your breach-notification duties and your real exposure.
Data retention, residency & sovereignty: where data lives shapes both the impact of a breach and the laws that apply to it.
Product & data risk: the wider risk register that cyber risk should sit inside, not apart from.
Financial statements (P&L, balance sheet, cash flow): to weigh a risk you have to translate it into the money it could cost.
Lean, Six Sigma, Kaizen & continuous improvement: the lessons-learned step is a structured retrospective by another name.
Hosting & cloud architecture: how and where you run systems determines your attack surface and your recovery options.

Where to go next

The NIST Cybersecurity Framework (CSF) 2.0: the canonical, free, vendor-neutral framework; read the Govern and Respond Functions even if you read nothing else.
NIST SP 800-61 Revision 3, Incident Response Recommendations: the 2025 rewrite that folds incident response into business risk management; the modern reference for how to structure a plan.
IBM, Cost of a Data Breach Report 2024 (announcement): the most-cited annual data on what breaches cost and why speed of containment matters; useful for making the business case.
SANS Threat Analysis Rundown with Katie Nickels (SANS Institute, YouTube): a practitioner-led walkthrough of current threats and how response teams reason about them; a clear window into how the work actually feels.