Product analytics & success metrics: pick the number that bites

A product team can ship every week, watch sign-ups climb, fill a dashboard with green arrows, and still be building something almost nobody comes back to. The numbers look like success because most product metrics only ever go up. The discipline of product analytics is learning to pick the few measures that can actually turn against you, and then having the nerve to act when they do.

The quick version

Vanity metrics (total sign-ups, page views, cumulative downloads) only ever rise and rarely change a decision. Actionable metrics are rates and comparisons that can drop, and tell you what to do next.
Pick a small set of measures tied to real customer value, not everything you can log. A North Star Metric names the value; a framework like AARRR or HEART breaks it into the few inputs you can actually move.
Any metric you turn into a target gets gamed: that's Goodhart's law. Pair every target with a guardrail so people can't win the number while losing the point.
Analytics describes what happened, not why. A moving number is a reason to investigate, not proof your last change caused it.

The idea in depth

Product analytics isn't a single theory; it's a set of measurement habits, and the gap between teams who are good at it and teams who aren't is almost never the tooling. It's judgement about which numbers deserve attention. The useful frameworks all push in the same direction: measure fewer things, measure value rather than activity, and choose measures that can deliver bad news.

Vanity metrics feel good; actionable metrics change a decision

The cleanest place to start is the distinction Eric Ries drew in The Lean Startup (2011) between vanity metrics and actionable metrics. Vanity metrics are the cumulative totals that only ever go up: registered users, total downloads, all-time page views. They feel like progress and almost never inform a decision, because there's no version of the next board meeting where "total sign-ups" goes down and forces a change of plan. Actionable metrics, by contrast, are usually rates and comparisons (this week's activation rate, this cohort versus last, conversion by channel), and they're built to move in both directions. Alistair Croll and Benjamin Yoskovitz extend the point in Lean Analytics (2013): most teams either track far too many metrics and lose focus, or track the wrong, flattering ones. Their prescription is the One Metric That Matters: at any given stage, the single number you care about most right now.
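
The contrast between a cumulative total and a rate is easy to see with a few numbers. The sketch below is illustrative only: the weekly figures are invented, not taken from any real product, and the variable names are hypothetical. It shows a running total of sign-ups climbing every week while the per-cohort activation rate quietly falls.

```python
from itertools import accumulate

# Hypothetical weekly figures, invented purely for illustration.
weekly_signups = [1000, 1100, 1250, 1400]
weekly_activated = [300, 297, 300, 294]

# Vanity view: a running total. It cannot fall as long as sign-ups are non-negative.
cumulative_signups = list(accumulate(weekly_signups))

# Actionable view: a rate per weekly cohort. It can move in either direction.
activation_rates = [a / s for a, s in zip(weekly_activated, weekly_signups)]

for week, (total, rate) in enumerate(zip(cumulative_signups, activation_rates), start=1):
    print(f"week {week}: total sign-ups {total:>5}, activation rate {rate:.0%}")
```

On these made-up numbers the total rises from 1,000 to 4,750 while the activation rate slides from 30% to 21%. Only the second series could ever force a change of plan.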

Here's a test you can run on any dashboard tomorrow. For each metric, ask: "what decision changes if this moves, and which way would it have to move to stop us?" If you can't answer, it's decoration. One caveat worth being honest about: "vanity" is contextual, not absolute. Total users matters enormously to an investor sizing a market and barely at all to a team fixing week-two drop-off. The question isn't whether a metric is good in the abstract; it's whether it's the right one for the decision in front of you.

Name the value first, then the inputs you can move

Once you've cut the vanity, the risk flips: a team can over-focus on one number and lose the thread back to customer value. Two well-worn frameworks guard against that. The first is the North Star Metric, a term popularised by growth marketer Sean Ellis: the single measure that best captures the core value your product delivers to customers. Airbnb's "nights booked" is the standard example, because it only goes up when a guest and a host both got what they wanted. Amplitude's widely used North Star Playbook (co-authored by John Cutler) frames it as a leading indicator of sustainable revenue, deliberately sitting upstream of the lagging financials, and pairs it with a handful of inputs: the things a team can directly influence that ladder up to it. You don't manage the North Star directly; you move its inputs.

The second is older and bottom-up: Dave McClure's AARRR, or "Pirate Metrics," from a 2007 Ignite Seattle talk. The letters stand for Acquisition, Activation, Retention, Referral, Revenue. It maps the customer's journey into five stages, each with its own metric, so a team can find the one stage that's actually leaking rather than fiddling with all five at once. For experience quality specifically, Google researchers Kerry Rodden, Hilary Hutchinson and Xin Fu published the HEART framework (Happiness, Engagement, Adoption, Retention, Task success) at the CHI 2010 conference, alongside a process called Goals–Signals–Metrics: state the goal in plain words, decide what observable user behaviour would signal it, and only then choose the metric. That order matters: it stops teams measuring what's easy to log instead of what they actually care about.

In practice you work top-down and bottom-up at once: name the value in a sentence (the North Star), then use AARRR or HEART to break it into three or four inputs a squad can own. Worth stating plainly, though: a North Star is a strategic bet, not a found fact, and a badly chosen one quietly steers a whole company wrong for a year. "Time on site," for example, reads as engagement but really rewards a product that's hard to leave, which makes it the wrong star for most businesses.

flowchart TD
    A(["State the goal in plain words"]) --> B(["North Star Metric: the value, in one number"])
    B --> C(["Inputs you can actually move"])
    C --> D(["Acquisition & activation"])
    C --> E(["Engagement & retention"])
    C --> F(["Referral & revenue"])
    D --> G(["A squad owns each input"])
    E --> G
    F --> G

Top-down: the North Star names the value; AARRR/HEART break it into a few inputs a team can own. Leaders Loop

Every target you set will be gamed: design for it

The most important caution in measurement has a name: Goodhart's law. Charles Goodhart observed in 1975 that "any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." The anthropologist Marilyn Strathern gave it its memorable form in 1997: when a measure becomes a target, it ceases to be a good measure. The mechanism is universal: the moment people are rewarded for a number, they optimise the number, not the thing it stood in for. Reward support agents on tickets closed and they close fast, not well. Reward a product team on weekly active users and dark patterns start to look like good ideas. A metric that's safe to watch can turn dangerous the instant it's a target.

The practical rule: never ship a target without a guardrail, meaning a paired counter-metric that catches the cheat. Chasing activation? Watch thirty-day retention beside it, so you can't win by activating people who never come back. Pushing conversion? Watch refund and complaint rates. Guardrails reduce gaming; they don't end it. Goodhart's law is a property of incentives, not a bug you can patch, which is why the leadership job is to keep asking, and mean it, "what would this metric look like if we were quietly cheating it?"
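
One way to make the pairing concrete is to refuse to report a target as met unless its guardrail has held. The sketch below is a minimal illustration, not a standard API: the `MetricReading` class, the `target_met_honestly` function, the `tolerance` parameter and all of the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetricReading:
    name: str
    baseline: float  # value before the push
    current: float   # value now


def target_met_honestly(target: MetricReading, goal: float,
                        guardrail: MetricReading, tolerance: float = 0.0) -> bool:
    """True only if the target reached its goal AND the guardrail has not
    fallen more than `tolerance` below its own baseline."""
    hit_target = target.current >= goal
    guardrail_held = guardrail.current >= guardrail.baseline - tolerance
    return hit_target and guardrail_held


# Hypothetical readings: activation hits the goal in both scenarios.
activation = MetricReading("activation rate", baseline=0.28, current=0.41)

# Scenario 1: thirty-day retention collapsed, so the win is hollow.
gamed = target_met_honestly(
    activation, goal=0.40,
    guardrail=MetricReading("30-day retention", baseline=0.20, current=0.12),
)

# Scenario 2: retention held, so the win counts.
honest = target_met_honestly(
    activation, goal=0.40,
    guardrail=MetricReading("30-day retention", baseline=0.20, current=0.21),
)

print(gamed, honest)  # False True
```

A check like this only catches the cheats the guardrail was chosen to detect, which is why the question about quietly cheating the metric still has to be asked by a person.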

flowchart LR
    A(["Set a single target metric"]) --> B(["People optimise the number"])
    B --> C(["The proxy and the real goal drift apart"])
    C --> D(["Pair it with a guardrail counter-metric"])
    D --> E(["You can no longer win the number while losing the point"])

Goodhart's law in motion, and the guardrail that blunts it.

A worked example

Picture a mid-sized B2B note-taking app. The leadership deck opens with the proud number: 240,000 registered users, up and to the right every quarter. The team feels good. Their new head of product runs the dashboard through the test above and finds almost every metric is a vanity metric: cumulative totals that can't fall. (Figures here are illustrative, to show the reasoning, not real company data.)

She rebuilds it around one question: what's the value this product delivers? Not "an account exists," but "someone captures notes they come back to." She names a North Star, weekly active note-takers, and breaks it down with AARRR. The leak is immediate and obvious once the numbers are honest:

flowchart LR
    A(["Sign-ups: 10,000 / mo"]) --> B(["Activated: created 3 notes in week 1, 28%"])
    B --> C(["Retained to week 4, 19%"])
    C --> D(["Weekly active note-takers, the North Star"])

One month's funnel (illustrative). Acquisition is healthy; the floor falls out at activation.

Acquisition was never the problem: ten thousand people arrive a month. The floor falls out at activation: just over a quarter ever reach the moment the product becomes useful. The "240,000 users" headline had been hiding the fact that roughly seven in ten new sign-ups never get any value from the product. That's the One Metric That Matters for this stage: activation rate, not the all-time total.
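
The funnel arithmetic can be checked in a few lines. The sketch below uses the illustrative figures from the diagram (10,000 sign-ups a month, 28% activated, 19% retained to week 4). One assumption is flagged loudly: the diagram does not say whether the 19% is measured against all sign-ups or against activated users, so both readings are computed rather than guessing.

```python
# Illustrative figures from the worked example, not real company data.
signups = 10_000
activation_pct = 28
week4_retention_pct = 19

# Integer arithmetic keeps the head-counts exact.
activated = signups * activation_pct // 100
never_activated = signups - activated
never_activated_share = never_activated / signups

# ASSUMPTION: the base of the 19% is not stated, so show both readings.
retained_if_of_signups = signups * week4_retention_pct // 100
retained_if_of_activated = activated * week4_retention_pct // 100

print(f"activated:        {activated}")                   # 2800
print(f"never activated:  {never_activated} ({never_activated_share:.0%})")  # 7200 (72%)
print(f"week-4 retained, if 19% of sign-ups:  {retained_if_of_signups}")     # 1900
print(f"week-4 retained, if 19% of activated: {retained_if_of_activated}")   # 532
```

Under either reading the largest single loss is at activation, where 7,200 of the month's 10,000 arrivals drop out before reaching the product's value.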

The team sets a target (lift activation from 28% to 40% in a quarter) and applies Goodhart's law on purpose. Activation is easy to game: drop the bar to "created 1 note" and the number jumps overnight while meaning nothing. So they pair the target with a guardrail: week-4 retention of newly activated users must not fall. Now the only way to win is to activate people who genuinely stick. Two months in, activation is up to 36% and retention has held: a real gain. But the head of product resists crediting the onboarding redesign outright, because the company also tightened its ad targeting that month and may simply be acquiring better-fit users. The metric told her where the problem lived and whether things improved; it couldn't tell her why on its own. That needs a clean comparison: holding the new onboarding back from a random slice of arrivals and letting the two groups settle the argument.
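
A clean comparison of that kind can be sketched in two steps: assign a stable random slice of arrivals to a holdout that keeps the old onboarding, then compare activation rates between the groups. Everything below is an assumption made for illustration: the function names, the 10% holdout share, the salt string and the counts are hypothetical, and the pooled two-proportion z statistic is one common way to compare two rates, not a method the example company is said to have used.

```python
import hashlib
import math


def in_holdout(user_id: str, holdout_share: float = 0.10,
               salt: str = "onboarding-v2") -> bool:
    """Deterministically place a pseudo-random slice of users in the holdout.
    Hashing the id means the same user always lands in the same group."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # roughly uniform in [0, 1)
    return bucket < holdout_share


def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """z statistic for the difference between two rates, using a pooled standard error."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


# Hypothetical month: 9,000 arrivals saw the new onboarding, 1,000 were held back.
z = two_proportion_z(success_a=3240, n_a=9000,   # 36% activated with new onboarding
                     success_b=290, n_b=1000)    # 29% activated in the holdout

# |z| above about 1.96 corresponds to the conventional 5% two-sided threshold.
print(f"z = {z:.2f}, clears 1.96: {abs(z) > 1.96}")
```

Because both groups arrived in the same month through the same ad targeting, a difference between them cannot be explained by better-fit acquisition, which is exactly the alternative the head of product was worried about.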

Frequently asked questions

What's the difference between a vanity metric and an actionable one?

A vanity metric is usually a cumulative total that only ever rises (total users, all-time downloads, lifetime page views) and rarely changes a decision. An actionable metric is normally a rate or a comparison (activation rate, week-over-week retention, conversion by channel) that can move in both directions and points to a next step. Eric Ries drew the line in The Lean Startup; the quick test is whether you can name the decision the metric would change.

How many metrics should a product team track?

Far fewer than the tool lets you. The lesson shared across Lean Analytics and Google's HEART work is that more dashboards mean less focus. Most teams do well with one North Star naming the value, three or four inputs that ladder up to it, and a guardrail or two. You can monitor dozens of secondary numbers, but you should only be steering by a handful at a time.

Do we need a North Star Metric and AARRR and HEART?

No. They answer different questions and you pick the one that fits. The North Star names the single value you're chasing. AARRR maps the customer journey into five stages so you can find the one that's leaking. HEART is aimed at experience quality, with its Goals–Signals–Metrics process to stop you measuring what's merely easy to log. Many teams use a North Star plus one of the other two; stacking all three is usually over-engineering.

What is Goodhart's law and why should I care?

It's the principle that "when a measure becomes a target, it ceases to be a good measure" (Strathern's 1997 phrasing of Charles Goodhart's 1975 observation). Once people are rewarded for a number, they optimise the number rather than the goal it stood for. It matters because it means no metric is safe to turn into a target on its own: pair each target with a guardrail counter-metric that catches the obvious ways to cheat it.

My key metric went up after we shipped a feature. Did the feature work?

Not necessarily. Analytics is descriptive: it tells you what happened, not what caused it. A metric can rise because of your change, or because you started acquiring better-fit users, or because of seasonality. Treat a moving number as a reason to investigate, then run a controlled comparison (hold the change back from a random group) before you credit it. (See discovery, validation & de-risking for how to set those tests up.)

Related in the Toolkit

Product strategy & vision: your North Star Metric is only as good as the strategy behind it; the value you measure has to be the value you've chosen to create.
Product lifecycle (launch / grow / mature / exit): the "one metric that matters" changes by stage; activation matters early, retention and revenue later.
Roadmapping & prioritisation (RICE, MoSCoW, cost of delay): analytics tells you where the leak is; prioritisation decides which leak you fix first.
Discovery, validation & de-risking: the controlled comparison that turns a moving metric from correlation into evidence of cause.
MVP & iterative delivery: actionable metrics are how you know whether each iteration actually made the product better.
Customer needs identification & latent needs: to lift a North Star you have to know which need your activated users are really satisfying.
Usability & guerrilla testing: the qualitative "why" behind a quantitative drop-off; numbers find the cliff, watching users finds the cause.
Sales process & pipeline management: AARRR's funnel logic is the same discipline applied to a sales pipeline, with the same guardrail risks.

Where to go next

Dave McClure, "Startup Metrics for Pirates: AARRR!" (YouTube): the original, fast-talking source of the five-stage funnel, straight from the person who coined it.
Amplitude, "The North Star Playbook" (PDF): the clearest practical guide to naming a North Star and the inputs that move it, co-authored by John Cutler.
Rodden, Hutchinson & Fu, "Measuring the User Experience on a Large Scale" (Google, CHI 2010): the primary paper behind HEART and the Goals–Signals–Metrics process.
Croll & Yoskovitz, Lean Analytics (O'Reilly, 2013): the book-length case for the One Metric That Matters and against vanity metrics, with benchmarks by business model.
"Goodhart's law" (Wikipedia): a well-sourced overview of the law's origins (Goodhart 1975, Strathern 1997) and the many ways targets get gamed.