Metric design: leading vs lagging, vanity vs actionable, and the guardrails in between

The moment you put a number on a wall and tie it to praise, bonuses or your own attention, that number stops describing the work and starts directing it. People will move toward it, sometimes by doing the real job better, sometimes by quietly doing whatever makes the number go up. Metric design is the discipline of making sure those two paths are the same path. Get it wrong and the problem isn't that nothing changes; it's that the wrong thing changes, delivered with impressive efficiency.

The quick version

Lagging metrics tell you the result you got (revenue, churn, an injury rate). Leading metrics track the few behaviours that cause that result and that your team can directly move this week. You need both: the lag to know if you won, the lead to do something about it.
Vanity metrics only ever go up and never tell you what to do next (total registered users). Actionable metrics show cause and effect, so a change in the number points to a decision.
Any single metric, pushed hard, gets gamed; that's Goodhart's Law. The fix is guardrail metrics: the things that must not get worse while you chase the headline number.
The move: for every target, name one lead measure you'll act on and one guardrail you'll defend. A number with no guardrail is a number you've decided to let someone game.

The idea in depth: a measure that becomes a target stops being a good measure

Start with the failure mode, because it governs everything else. In 1975 the economist Charles Goodhart, writing about UK monetary policy, observed that "any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." Two decades later the anthropologist Marilyn Strathern, studying audit culture in British universities, gave it the line everyone now quotes: "When a measure becomes a target, it ceases to be a good measure" (Strathern, European Review, 1997). The mechanism is simple. A metric is usually a proxy: call volume stands in for "we helped customers," lines of code stand in for "we built something." The instant the proxy carries the reward, people optimise the proxy and the thing it was standing in for drifts away.

The practical move is to treat gaming as the default, not a character flaw. When you set a target, ask out loud: "If someone wanted to hit this number without doing the underlying job, how would they?" Then design against that answer before you ship the target, not after the quarter is ruined.

The second idea is timing, and the canonical treatment is McChesney, Covey and Huling's The 4 Disciplines of Execution (2012). A lag measure is the outcome you want, but by the time you can read it, the work that produced it is done; you are looking at history. A lead measure is different in two ways: it is predictive (move it and the lag follows) and it is influenceable (your team can change it this week, without waiting on anyone else). Their example: you can't control whether your car breaks down on the motorway (that's the lag), but you can control how often it gets serviced (that's the lead).

flowchart LR
  L1("Lead measure<br/>coaching 1:1s held this week") --> L2("Lead measure<br/>deals with a next step booked")
  L2 --> O("Lag measure<br/>quarterly revenue")
  L1 -. "you can move these now" .- L2
  O --> R(["read only after the quarter closes"])

Lead measures are the few behaviours you can change this week that predict the outcome you can only read later. Leaders Loop

So stop running your team off the lag alone. Revenue, churn and NPS are scoreboards, and you cannot coach a scoreboard. Pick the one or two upstream behaviours your evidence says drive the outcome (which is partly a regression question), make those the weekly conversation, and let the lag confirm the bet. The distinction matters more than it looks: a lag measure feels more "real" because it is often money, but it is the least useful thing to manage by, precisely because it has already happened.
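A rough first pass at that "regression question" is to check whether a candidate lead measure has historically moved with the lag. The sketch below is illustrative only: the figures and the names `next_steps_booked` and `quarterly_revenue` are invented, and it uses a plain Pearson correlation rather than a full regression.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (var_x * var_y) ** 0.5

# Hypothetical history: deals with a next step booked in each quarter (lead)
# and revenue for the same quarter, in thousands (lag).
next_steps_booked = [12, 15, 9, 20, 18, 14]
quarterly_revenue = [110, 135, 90, 170, 160, 125]

r = pearson_r(next_steps_booked, quarterly_revenue)
print(f"correlation between lead and lag: {r:.2f}")
```

A strong correlation is a reason to run the bet, not proof that the lead measure causes the lag. Six data points are also far too few to settle it, so the lag still has to confirm the choice over time.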

The idea in depth: vanity numbers feel like progress; actionable numbers cause decisions

The third lens comes from Eric Ries in The Lean Startup (2011). A vanity metric makes you feel good but never tells you what to do: total registered users, cumulative page views, "all-time downloads." Its tell is that it can essentially only go up, so it always flatters the people reporting it and never forces a decision. An actionable metric, by contrast, demonstrates clear cause and effect: when it moves, you can point to what you did. Ries's practical filter is the three A's: a good metric is actionable (ties cause to effect), accessible (the people who need it can read it), and auditable (you can trust the underlying data).

So convert your cumulative, always-rising numbers into rates and cohorts. "Total sign-ups: 50,000" is vanity. "Of the people who signed up last week, what share completed setup, and is that share rising or falling?" is actionable: it isolates the effect of the thing you just changed. The same logic links straight to correlation vs causation: a metric that only correlates with success but doesn't sit on the causal path will send you confidently in the wrong direction.
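The conversion from a running total to a cohort rate can be sketched in a few lines. The sign-up records here are invented for illustration, and grouping by ISO week is just one reasonable choice of cohort.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sign-up records: (signup_date, completed_setup).
signups = [
    (date(2024, 3, 4), True),
    (date(2024, 3, 5), False),
    (date(2024, 3, 6), True),
    (date(2024, 3, 7), False),
    (date(2024, 3, 11), True),
    (date(2024, 3, 12), True),
    (date(2024, 3, 13), True),
    (date(2024, 3, 14), False),
]

def setup_rate_by_week(records):
    """Map (iso_year, iso_week) to the share of that week's sign-ups who completed setup."""
    totals = defaultdict(int)
    completed = defaultdict(int)
    for signup_date, done in records:
        iso = signup_date.isocalendar()
        cohort = (iso[0], iso[1])  # ISO year and ISO week number
        totals[cohort] += 1
        completed[cohort] += int(done)
    return {cohort: completed[cohort] / totals[cohort] for cohort in sorted(totals)}

rates = setup_rate_by_week(signups)
print("total sign-ups (vanity):", len(signups))
for (year, week), rate in rates.items():
    print(f"{year}-W{week:02d} setup completion (actionable): {rate:.0%}")
```

The total can only tell you that eight people signed up. The weekly rate tells you whether whatever changed between the two weeks helped.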

"The only metrics that entrepreneurs should invest energy in collecting are those that help them make decisions." (Eric Ries, The Lean Startup)

Now put the three lenses together with a guardrail. The cleanest practitioner answer to Goodhart's Law comes from online experimentation. Ronny Kohavi and colleagues, in Trustworthy Online Controlled Experiments (2020), formalise two ideas worth stealing. The first is the Overall Evaluation Criterion (OEC): rather than worship a single number, agree on a small composite that reflects what you actually want long-term, including signals that predict future value, not just this week's clicks. The second is guardrail metrics (or counter-metrics): business-critical measures that don't have to improve, but are not allowed to get worse. Ship a change that lifts conversion but quietly tanks page-load time, and the guardrail trips.
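One simple way to build such a composite is a weighted average of a few metrics, each rescaled to a common range. This is a minimal sketch under stated assumptions, not the book's prescribed formula: the metric names, weights and ranges are hypothetical, and every metric is assumed to be "higher is better".

```python
def oec_score(metrics, weights, ranges):
    """Weighted average of metrics, each rescaled to 0..1 over an agreed range.

    Assumes every metric is "higher is better"; invert a metric before
    passing it in if lower is better.
    """
    total_weight = sum(weights.values())
    score = 0.0
    for name, weight in weights.items():
        low, high = ranges[name]
        scaled = (metrics[name] - low) / (high - low)
        scaled = min(1.0, max(0.0, scaled))  # clamp so one outlier cannot dominate
        score += weight * scaled
    return score / total_weight

# Hypothetical composite: this week's conversion plus a signal of future value.
weights = {"conversion_rate": 0.4, "repeat_visit_rate_30d": 0.6}
ranges = {"conversion_rate": (0.02, 0.06), "repeat_visit_rate_30d": (0.20, 0.50)}
this_week = {"conversion_rate": 0.04, "repeat_visit_rate_30d": 0.35}

print(f"OEC: {oec_score(this_week, weights, ranges):.2f}")
```

Keeping the composite to two or three terms is deliberate: the weights are a judgment call the team has to be able to argue about, and that gets harder with every term added.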

flowchart TD
  G("Headline target<br/>support tickets closed per agent") --> W{"could someone hit this<br/>without doing the real job?"}
  W -->|"yes, close tickets fast,<br/>reopen rate climbs"| GR("Guardrail<br/>reopened-ticket rate must not rise")
  W -->|"yes, push CSAT surveys<br/>only to happy customers"| GR2("Guardrail<br/>survey response rate held steady")
  GR --> S(["a target you can trust"])
  GR2 --> S

Designing a guardrail: name the cheat for any headline target, then measure the thing the cheat would damage.

The rule that falls out of this: never ship a target naked. For each headline metric, pair it with at least one guardrail that catches the obvious cheat. Chasing handle time on a support line? Guardrail it with reopened-ticket rate and customer satisfaction. Chasing sales velocity? Guardrail it with discount depth and 90-day retention. The guardrail doesn't need its own bonus; it just needs to be watched and to have a tripwire that pauses the celebration.
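The "baseline plus tripwire" idea can be sketched as a small check that runs before anyone celebrates the headline number. The class, field names and thresholds below are hypothetical; the point is only that a guardrail is compared against its own baseline, in its own direction.

```python
from dataclasses import dataclass

@dataclass
class Guardrail:
    name: str
    baseline: float
    tolerance: float       # how far the metric may drift the wrong way before tripping
    higher_is_worse: bool  # True for reopened-ticket rate, False for CSAT

    def tripped(self, current: float) -> bool:
        """True if the metric has drifted past tolerance in the bad direction."""
        if self.higher_is_worse:
            drift = current - self.baseline
        else:
            drift = self.baseline - current
        return drift > self.tolerance

def review_target(headline_improved, guardrails, current_values):
    """Pause the celebration if any guardrail has tripped."""
    tripped = [g.name for g in guardrails if g.tripped(current_values[g.name])]
    if tripped:
        return "pause: guardrail tripped (" + ", ".join(tripped) + ")"
    return "celebrate" if headline_improved else "keep working"

# Hypothetical baselines for a support team chasing handle time.
guardrails = [
    Guardrail("reopened_ticket_rate", baseline=0.08, tolerance=0.01, higher_is_worse=True),
    Guardrail("csat", baseline=4.3, tolerance=0.1, higher_is_worse=False),
]
current = {"reopened_ticket_rate": 0.15, "csat": 4.25}

print(review_target(True, guardrails, current))
```

Here handle time improved, but the reopened-ticket rate has drifted well past its tolerance, so the review returns a pause rather than a win.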

An honest limitation. Guardrails and composites reduce gaming; they don't end it. Every measure is still a proxy, and a determined optimiser can find the seam you didn't guard. There is also a cost: pile on too many guardrails, or stuff an OEC with a dozen weighted terms, and you get something nobody can act on, which recreates the vanity problem at the level of the whole dashboard. Kaplan and Norton made this point with the Balanced Scorecard (Harvard Business Review, 1992), whose opening line is simply "What you measure is what you get"; their answer was balance across a few perspectives, not an exhaustive list. The discipline is restraint: a few real measures you'll genuinely act on beat a wall of numbers you'll only admire.

A worked example

A head of customer support inherits a backlog and sets one target: tickets closed per agent per day. It is a lag measure, it is a single number, and it has no guardrail: three design faults at once. (The figures below are illustrative.)

Within a month, closures jump from 18 to 31 a day. The scoreboard looks like a triumph. But the proxy has detached from the job. Agents have learned that the quickest way to raise the count is to close early, so they're firing off "Does that solve it? I'll close this for now" replies and shutting tickets that aren't resolved. The reopened-ticket rate, unmeasured, has roughly doubled, and customer satisfaction is sliding. Goodhart, exactly on schedule.

Redesign it with the three lenses. Keep the lag measure for the scoreboard, but stop managing by it. Add a lead measure the team can move this week and that predicts genuine resolution: the share of tickets where the agent's first reply fully answers the customer's question (first-contact resolution). Add guardrails: reopened-ticket rate may not rise above its baseline, and CSAT response rate must stay steady so nobody games the survey by only sending it to happy customers. Now the weekly conversation is about first-contact resolution (a behaviour you can coach), the lag confirms whether it's working, and the guardrails catch the cheat the old target invited. Same people, same backlog, a metric that points at the real job instead of away from it.
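The redesigned scorecard can be computed from ordinary ticket records. This is a sketch with invented data and a hypothetical `Ticket` shape. It also assumes a proxy definition of first-contact resolution (closed after one agent reply and never reopened), because "fully answers the question" can't be read directly off a ticket log.

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    agent_replies: int  # replies sent before the ticket was closed
    reopened: bool      # customer came back after the close

def first_contact_resolution(tickets):
    """Share of tickets closed after one reply and never reopened (assumed proxy)."""
    resolved = sum(1 for t in tickets if t.agent_replies == 1 and not t.reopened)
    return resolved / len(tickets)

def reopen_rate(tickets):
    """Share of closed tickets that were later reopened."""
    return sum(1 for t in tickets if t.reopened) / len(tickets)

# Illustrative week: ten closed tickets, three of them fast closes that bounced back.
week = [Ticket(1, False)] * 5 + [Ticket(1, True)] * 3 + [Ticket(3, False)] * 2
baseline_reopen_rate = 0.15  # hypothetical pre-target baseline

fcr = first_contact_resolution(week)
reopens = reopen_rate(week)
print(f"tickets closed (lag): {len(week)}")
print(f"first-contact resolution (lead): {fcr:.0%}")
print(f"reopen rate (guardrail): {reopens:.0%}, tripped: {reopens > baseline_reopen_rate}")
```

Counting one-reply closes on their own would reward exactly the fast-close cheat, which is why the proxy excludes tickets that were reopened.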

Frequently asked questions

How many metrics should a team actually have?

Few. A common, defensible shape is one lag measure (the outcome), one or two lead measures (the behaviours you'll act on weekly), and one or two guardrails. Past a handful, attention spreads so thin that no single number drives a decision, which is the vanity-metric problem wearing a dashboard. Balance, not volume, is the goal Kaplan and Norton were arguing for.

Isn't every metric eventually gamed, so why bother?

Goodhart's Law says pressure degrades a measure; it doesn't say measurement is pointless. The point of guardrails and a small composite (an OEC) is to make the honest path and the gaming path the same path, so the cheapest way to move your number is to do the real work. You won't reach zero gaming. You can make it not worth the effort.

What's the fastest way to spot a vanity metric in my own reports?

Ask two questions. First: "Can this number realistically go down?" If it's a cumulative total (all-time users, total downloads), it basically can't, so it can't fail. That's the tell. Second: "If it moved tomorrow, what decision would I make?" If the answer is "feel pleased," it's vanity. Convert it to a rate or a cohort and the decision usually appears.
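The first question can be asked of the data itself. The sketch below uses invented figures: it flags a series that has never gone down, then converts the running total into per-period counts, which can fall.

```python
def can_only_go_up(series):
    """True if the series never decreases, the tell of a cumulative total."""
    return all(later >= earlier for earlier, later in zip(series, series[1:]))

def to_period_counts(cumulative):
    """Convert a running total into the amount added in each period."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]

# Hypothetical month-end figures for "total registered users".
total_users = [40_000, 43_000, 45_500, 47_200, 48_100]
new_users = to_period_counts(total_users)

print("total users never falls:", can_only_go_up(total_users))
print("new users per month:", new_users)
print("new users never falls:", can_only_go_up(new_users))
```

The total looks healthy in every report, while the per-month series shows sign-ups slowing. A history that never falls isn't proof of vanity on its own, but it is a prompt to ask the second question.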

Leading vs lagging: which do I report to my board?

Report the lag (boards govern outcomes), but run the team on the lead. The board cares whether you hit revenue; your team can't do anything to "revenue" directly today. Show the board the lag with the one or two lead measures you're betting will move it; that's the difference between a scoreboard and a plan.

Do guardrail metrics need their own targets and bonuses?

No, and attaching incentives to them usually backfires, because then the guardrail itself becomes a target that gets gamed. A guardrail needs a baseline, a tripwire, and someone whose job is to watch it. It's a brake, not a second accelerator.

Related in the Toolkit

Data types (discrete/continuous, categorical/ordinal): the kind of data a metric is built on decides which averages and comparisons are even valid.
Descriptive statistics (mean, median, mode, variance, SD): a metric reported as a bare average can hide the spread that's actually killing you.
Distributions, percentiles & quartiles: why a p90 latency guardrail beats an average for catching the worst experiences.
Correlation vs causation: an actionable metric has to sit on the causal path, not merely move alongside success.
Regression (linear, non-linear, logistic): how you test whether a candidate lead measure actually predicts the lag.
First principles vs heuristics vs analogical reasoning: derive the metric from the outcome you want, rather than copying a competitor's KPI.
Reversible vs irreversible decisions: how hard to fight over a metric depends on how easily you can change it later.
Jobs-to-be-Done & needs research: the customer's real job is the thing your proxy metric is always at risk of drifting away from.

Where to go next

Eric Ries, The Lean Startup (2011): the source of the vanity-vs-actionable distinction and the three A's; the chapter on "Measure" is the core.
Kohavi, Tang & Xu, Trustworthy Online Controlled Experiments (2020): the practitioner bible for OECs and guardrail metrics; the companion site links the book and primers.
Kaplan & Norton, "The Balanced Scorecard" (HBR, 1992): the case for balance over a single financial number; opens with "What you measure is what you get."
Goodhart's Law (overview, with primary citations): a concise, well-sourced entry to Goodhart's 1975 original and Strathern's 1997 reformulation.
Eric Ries on vanity metrics (talk, YouTube): a short, plain-spoken version of the actionable-metrics argument straight from the author.