Somewhere in your business, a customer is being asked "How likely are you to recommend us, on a scale of 0 to 10?", and the answer is about to become a quarterly target. Before it does, it's worth knowing what that number can and can't tell you. NPS, CSAT and CES are the three most common customer metrics, and the most common mistake is treating them as interchangeable scores of "are customers happy?" They're not. They measure different things, on different scales, for different decisions.

The quick version

  • CSAT (Customer Satisfaction) asks "how satisfied were you?" right after an interaction. It's the thermometer for a specific touchpoint: fast and intuitive, but short-lived.
  • NPS (Net Promoter Score) asks "how likely are you to recommend us?" on a 0–10 scale. It's a relationship-level loyalty proxy, reported as % promoters minus % detractors.
  • CES (Customer Effort Score) asks how easy it was to get something done. It predicts disloyalty better than delight does, because high effort is what drives people away.
  • None is "the one number." Use CSAT for moments, CES for service journeys, NPS for the overall relationship, and treat any of them as a trend to investigate, not a target to hit.

The idea in depth

These aren't leadership theories so much as measurement instruments that leadership decisions lean on. Each came from a specific argument about what predicts whether customers stay, spend more and tell their friends. Knowing the argument behind each one is the difference between reporting a number and actually using it.

CSAT: the thermometer for a moment

CSAT is the oldest and simplest of the three. After a purchase, a support chat or a delivery, you ask "How satisfied were you with [the thing]?" on a scale, often 1 to 5, and report the percentage who answered in the top one or two boxes. It's transactional by design: it captures how a single, recent interaction felt while it's fresh.
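The top-box arithmetic is easy to get subtly wrong, so here is a minimal sketch of it. It assumes a 1 to 5 scale and treats the top two points as "satisfied"; the function name `csat_top_box` and the sample ratings are hypothetical, made up for illustration.

```python
def csat_top_box(ratings, scale_max=5, top_boxes=2):
    """Percentage of ratings that fall in the top `top_boxes` points of a 1..scale_max scale."""
    if not ratings:
        raise ValueError("no ratings to score")
    # On a 1-5 scale with two top boxes, the threshold is 4.
    threshold = scale_max - top_boxes + 1
    satisfied = sum(1 for r in ratings if r >= threshold)
    return 100 * satisfied / len(ratings)


# Hypothetical post-support survey responses on a 1-5 scale (not real data).
post_support = [5, 4, 4, 3, 5, 2, 5, 4, 1, 5]

print(csat_top_box(post_support))               # top-two-box: 70.0
print(csat_top_box(post_support, top_boxes=1))  # top-box only: 40.0
```

The same responses give 70% or 40% depending on how many boxes count as "satisfied", which is why the rule should be stated whenever a CSAT figure is reported.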

Its strength is also its limit. CSAT is high-resolution about a moment and nearly blind about the relationship. A customer can rate a support call 5/5 and still cancel next month because the product no longer fits. Use it where it earns its keep: wired to specific touchpoints (post-onboarding, post-support, post-delivery) so you can see which moment is dropping. What it can't do is stand in as a verdict on loyalty. A single rolling CSAT average tells you the temperature of a touchpoint, not the health of the bond.

NPS: the relationship loyalty proxy (and its loudest critics)

Net Promoter Score was introduced by Bain & Company's Fred Reichheld in his 2003 Harvard Business Review article "The One Number You Need to Grow." The pitch was bold: after testing a battery of survey questions against actual customer behaviour and company growth, Reichheld argued that one question, "How likely is it that you would recommend us to a friend or colleague?", predicted growth as well as or better than anything else. Answers run 0–10: 9–10 are promoters, 7–8 passives, 0–6 detractors. The score is the percentage of promoters minus the percentage of detractors, so it ranges from −100 to +100.
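As a minimal sketch of that calculation, using the 9–10 and 0–6 cut-offs described above: the function name `nps` and the sample responses are hypothetical and only illustrate the arithmetic.

```python
def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("no responses to score")
    if any(s < 0 or s > 10 for s in scores):
        raise ValueError("scores must be on the 0-10 scale")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    # Passives (7-8) count in the denominator but in neither group.
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical responses (not real data): 4 promoters, 3 passives, 3 detractors.
responses = [10, 9, 9, 8, 7, 7, 6, 5, 10, 3]
print(nps(responses))  # 10.0
```

Note that passives still dilute the score: they add to the number of respondents without adding to either side of the subtraction.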

The appeal is obvious: one comparable number, intuitive logic (loyal customers refer others), board-friendly. Used well, NPS is a relationship-level signal, asked of the whole customer base periodically, and the part that actually earns its keep is the follow-up "why?" comment. The number tells you the trend; the verbatims tell you what to fix.

Here's the honest part. NPS's headline claim, that it is uniquely superior at predicting growth, did not survive independent testing. In a paper that won the 2007 Marketing Science Institute / H. Paul Root Award, Keiningham, Cooil, Andreassen and Aksoy (Journal of Marketing, 2007) reanalysed longitudinal data and found NPS was not a clearly superior predictor of firm revenue growth versus conventional satisfaction measures like the American Customer Satisfaction Index. Reichheld himself later acknowledged a related problem: as NPS spread, it got gamed, with staff begging for 10s and scores reported with no audit trail. His 2021 HBR piece "Net Promoter 3.0" proposed pairing the survey with a hard, accounting-based "earned growth rate" precisely because the self-reported score alone had become unreliable. So the limitation is plain: NPS is a useful directional proxy and a terrible target. The moment a team is paid on the number, the number stops measuring loyalty and starts measuring how hard people asked for a 10.

flowchart TD
    Q(["How likely to recommend? 0–10"]) --> D(["0–6: Detractors"])
    Q --> P(["7–8: Passives"])
    Q --> R(["9–10: Promoters"])
    D --> S(["NPS = %Promoters − %Detractors"])
    R --> S
    S --> W(["The 'why?' comment is where the action is"])
					
How NPS is built, and where the value actually sits: the open comment, not the headline score. Leaders Loop

CES: measuring effort, because friction drives people away

Customer Effort Score came out of a counter-intuitive finding. In their 2010 HBR article "Stop Trying to Delight Your Customers," Matthew Dixon, Karen Freeman and Nicholas Toman of CEB (now Gartner) reported a study of more than 75,000 customer interactions and concluded that delighting customers barely moved loyalty, but reducing effort strongly reduced disloyalty. People don't reward you much for going above and beyond on a service issue; they punish you for making them work. So CES asks, after a service interaction, how easy the company made it to get their issue resolved.

The metric was later sharpened. The original "how much effort did you have to put in?" framing was replaced by CES 2.0: an agree/disagree statement, "The company made it easy for me to handle my issue," on a 7-point scale (strongly disagree to strongly agree). CEB argued the company-responsibility wording held up better across industries and languages. Put it where the friction lives (support tickets, returns, cancellations, sign-up) and read a low score as a queue of things to remove, not feelings to soothe. Where NPS asks "do you love us?", CES asks "did we get out of your way?" For a service journey, the second question is the one you can act on this week.
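Here is a rough sketch of how CES 2.0 responses might be summarised. It makes two assumptions that are not in the text above: answers are coded 1 (strongly disagree) to 7 (strongly agree), and anything from 5 up counts as "agree". Reporting conventions vary between teams, and both the function name `ces_summary` and the sample data are hypothetical.

```python
from statistics import mean


def ces_summary(responses):
    """Summarise CES 2.0 answers coded 1 (strongly disagree) to 7 (strongly agree)."""
    if not responses:
        raise ValueError("no responses to score")
    if any(r < 1 or r > 7 for r in responses):
        raise ValueError("responses must be on the 1-7 scale")
    # Assumption: 5, 6 and 7 are the "agree" side of the scale; 4 is neutral.
    agree = sum(1 for r in responses if r >= 5)
    return {
        "mean": round(mean(responses), 2),
        "pct_agree": 100 * agree / len(responses),
    }


# Hypothetical answers collected after closed support tickets (not real data).
ticket_responses = [2, 3, 5, 1, 4, 3, 2, 6, 3, 2]
print(ces_summary(ticket_responses))  # {'mean': 3.1, 'pct_agree': 20.0}
```

Reporting the share who agree alongside the mean helps, because a mean near the midpoint can hide a split between customers who found it easy and customers who did not.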

"All customers really want is a simple, quick solution to their problem." (Dixon, Freeman & Toman, HBR, 2010)

The limitation that ties all three together: each is a single-item survey of people who chose to respond, which makes them vulnerable to non-response bias (the indifferent rarely answer) and to gaming the instant they become incentives. They are proxies, best triangulated against hard behaviour (renewals, repeat purchase, churn) rather than trusted in isolation. This is also why a serious voice-of-customer program never rests on one number: the score flags where to look, and the verbatim comments and follow-up research tell you what is actually wrong.
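One way to triangulate a survey score against behaviour is to check whether the survey groups actually behave differently. The sketch below assumes you can join each account's latest NPS answer to whether it renewed. The `(score, renewed)` tuple shape, the function names and the data are all hypothetical.

```python
from collections import defaultdict


def nps_group(score):
    """Map a 0-10 answer to its NPS group."""
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"


def renewal_rate_by_group(accounts):
    """Renewal rate (%) per NPS group, from (score, renewed) pairs."""
    totals = defaultdict(int)
    renewed = defaultdict(int)
    for score, did_renew in accounts:
        group = nps_group(score)
        totals[group] += 1
        renewed[group] += int(did_renew)
    return {group: 100 * renewed[group] / totals[group] for group in totals}


# Hypothetical accounts: (latest NPS answer, renewed this year?). Not real data.
accounts = [
    (10, True), (9, True), (9, True),
    (8, True), (7, False), (8, True), (7, True),
    (4, False), (6, True), (2, False), (5, False),
]
print(renewal_rate_by_group(accounts))
# {'promoter': 100.0, 'passive': 75.0, 'detractor': 25.0}
```

If the groups renew at similar rates, the survey is not telling you much about loyalty for your business, whatever the headline score says.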

A worked example

Imagine a mid-market B2B software company whose board sees one slide: "NPS: 42, up 3 points." Everyone nods. Renewals, meanwhile, are quietly softening. The new head of customer insight refuses the single number and splits the metrics by the question each is built to answer. (Figures below are illustrative, to show the reasoning, not real data.)

flowchart LR
    A(["Onboarding done"]) --> CS(["CSAT: 4.6 / 5 ✓"])
    B(["Support ticket closed"]) --> CE(["CES 2.0: 3.1 / 7 ✗ high effort"])
    C(["Whole relationship, quarterly"]) --> NP(["NPS: 42, trending down in mid-tier"])
					
Three metrics, three questions. The healthy CSAT hid a painful support journey that the relationship NPS was slowly absorbing.

The picture changes immediately. CSAT on onboarding is excellent (4.6/5): the first impression is fine, so the leak isn't there. CES on support, though, is poor: on average, customers lean towards disagreeing that the company made resolving issues easy (3.1 on the 7-point CES 2.0 scale, below the neutral midpoint of 4). And when she segments NPS rather than averaging it, the headline 42 is propped up by enthusiastic enterprise accounts, while mid-tier customers have slid into detractor territory. These are the same customers filing the high-effort support tickets.
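To see how a healthy headline can hide a struggling segment, here is a small sketch of segmenting NPS rather than averaging it. The response counts are invented so that the blended score comes out at 42, matching the illustrative slide; the segment sizes and individual answers are assumptions, not data from the example company.

```python
def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical responses per segment (illustrative only).
segments = {
    # 50 promoters, 8 passives, 2 detractors
    "enterprise": [10] * 50 + [8] * 8 + [3] * 2,
    # 8 promoters, 18 passives, 14 detractors
    "mid-tier": [9] * 8 + [7] * 18 + [4] * 14,
}

all_scores = [s for scores in segments.values() for s in scores]
headline = nps(all_scores)
by_segment = {name: nps(scores) for name, scores in segments.items()}

print(headline)                  # 42.0
print(by_segment["enterprise"])  # 80.0
print(by_segment["mid-tier"])    # -15.0
```

With these made-up counts the enterprise segment carries the average, so the board slide can keep reading 42 while the mid-tier score is already negative.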

Now the metrics point somewhere. The story isn't "satisfaction is great" (the CSAT trap) or "loyalty is up" (the NPS-average trap). It's that a high-effort support experience is eroding the mid-tier relationship, and the rolled-up NPS was hiding it. The next step writes itself: read the CES verbatims for the top three effort drivers (probably repeat contacts and channel-switching, the exact patterns Dixon's team named), fix those journeys, and watch mid-tier CES and NPS together over the next two quarters, against renewals, the behaviour that actually pays the bills. One number told a comforting story. Three questions told the true one.

Frequently asked questions

Which metric should I actually use?

Match the metric to the decision. Use CSAT to monitor specific touchpoints (post-onboarding, post-delivery). Use CES to find and remove friction in service and self-service journeys. Use NPS as a periodic, relationship-level loyalty trend for the whole base. Many teams run CSAT and CES transactionally and NPS relationally; they answer different questions, so they coexist rather than compete.

Is NPS still worth measuring given the criticism?

Yes, with eyes open. The research (Keiningham et al., 2007) debunked only the claim that NPS is uniquely superior at predicting growth, not the idea that it is useful at all. As a simple, comparable, directional signal with a strong follow-up comment, it earns its place. The failure mode is turning it into a bonus target, which invites the gaming that Reichheld's own "Net Promoter 3.0" (2021) tries to fix by pairing it with hard accounting data.

What's a "good" score?

Benchmarks vary so much by industry, culture and survey method that chasing a universal target is a mistake; NPS in particular skews by country and category. The honest answer is to benchmark against your own trend and your nearest competitors, and to care more about the direction and the reasons than the absolute figure. A rising trend you understand beats a high number you can't explain.

Why does CES predict loyalty better than satisfaction in service?

Because Dixon, Freeman and Toman's 75,000-interaction study (HBR, 2010) found that exceeding expectations on a service issue barely lifted loyalty, while making customers work hard reliably destroyed it. Loyalty in service is mostly about avoiding a bad, high-effort experience, so a metric built on effort tracks the thing that actually moves people out the door.

How do I stop these scores being gamed?

Don't pay people directly on the survey number. Once a score is an individual's bonus target, you measure the begging, not the loyalty. Keep the metric as a diagnostic, audit how surveys are sent, and triangulate against hard behaviour (renewals, repeat purchase, churn), which is far harder to fake than a single survey response.
