Designing for AI: transparency, explainability and human override

A loan officer is shown a model's recommendation to decline an applicant, with a confidence score of 91%. She doesn't know what drove the number, she can't see what the model missed, and the queue behind her is forty people deep. What does she do? In almost every case, she clicks decline. The interface didn't help her decide; it told her what to think, hid its reasoning, and made disagreeing feel like extra work. That is a design failure, not a model failure, and it is the failure most teams ship.

The quick version

Designing for AI is mostly about calibrated trust: helping users know when to rely on the system and when to use their own judgement, not maximising trust, and not minimising it.
Transparency, explainability and override are the three controls that calibrate it: show what the system is and how sure it is; explain enough to act on; and always leave a real, low-friction way for a human to take back the wheel.
The two failure modes are over-reliance (rubber-stamping the machine) and aversion (ignoring a system that's usually right). Good design steers between them.
Explanations have limits: many models can't be fully explained, and a bad explanation can manufacture false confidence. Honest beats complete.

The idea in depth

Start with the goal, because almost everyone gets it wrong. The job is not to make users trust the AI. Google's People + AI Guidebook, produced by its PAIR research group, puts it bluntly: “the user shouldn't trust the system completely. Rather, based on system explanations, the user should know when to trust the system's predictions and when to apply their own judgement.” The target is calibrated trust, a user whose reliance rises and falls with the system's actual reliability. The Guidebook breaks trust into three ingredients worth designing for separately: ability (is it competent at this task?), reliability (is it consistent?), and benevolence (is it acting in my interest?).

The evidence for this predates the current wave of AI. In their 1997 review Humans and Automation: Use, Misuse, Disuse, Abuse (Parasuraman & Riley, Human Factors), Raja Parasuraman and Victor Riley named the two ways trust goes wrong. Misuse is over-reliance: the operator stops monitoring and defers to the automation even when it's failing, a pattern later called automation bias. Disuse is the opposite: a system that cries wolf gets switched off, so a tool that's right 95% of the time is treated as if it were never right. Neither is fixed by a better model; both are fixed by an interface that keeps the human appropriately in or out of the loop. So change the question. Stop asking “how do we get people to trust this?” and start asking where it will be over-trusted, where it will be ignored, and what in the design pulls reliance back toward the system's real accuracy.

flowchart LR
    A(["System reliability
(how often it's right)"]) --> C(["Calibrated trust
rely when right, doubt when not"])
    B(["What the design shows
state · confidence · explanation · override"]) --> C
    C --> D(["Over-reliance
(rubber-stamp the machine)"])
    C --> E(["Aversion / disuse
(ignore a useful system)"])
    D -.->|design pulls back| C
    E -.->|design pulls back| C

Calibrated trust is the target between two failure modes; the interface is the lever that pulls reliance back toward real reliability. Leaders Loop

The most practical codification of how to do this is Microsoft's Guidelines for Human-AI Interaction (Amershi et al., CHI 2019), eighteen design rules synthesised from two decades of research and validated through a structured user study. Their power is the staging: the guidelines are organised by when in the experience they apply. Initially, make clear what the system can do and how well it does it, so users don't form a fantasy of its competence. During interaction, show contextually relevant information and convey why the system did what it did. When wrong, support efficient correction, dismissal and appeal. Over time, learn from behaviour and notify users when you update, so their mental model doesn't silently go stale. Audit your feature against those four moments: first run, normal use, the moment it errs, and the moment it changes. Most products pour effort into the happy path and almost nothing into the other three, which is exactly where trust is won or lost.

An honest limitation belongs here, because explainability is oversold. For many modern models the genuine reasoning behind a single output is, as the People + AI Guidebook concedes, “unknown or too complex to be summarized into a simple sentence.” What ships as an “explanation” is often a plausible-sounding rationalisation, and a confident-looking one can increase misplaced trust rather than calibrate it. The fix isn't more explanation; it's the right explanation, tested. Prefer partial explanations that say what the system actually used and leave out what it can't honestly account for. An explanation you can't stand behind is worse than admitting you don't fully know.

The three controls, and where each one earns its place

Transparency is the cheapest and most neglected. It means being plain that the user is dealing with an AI, what it's for, what it isn't for, and how confident it is right now. In high-stakes settings this isn't only good practice, it's becoming law. The EU AI Act requires high-risk systems to be designed so their operation is “sufficiently transparent to enable deployers to interpret a system's output and use it appropriately” (Article 13). The design read-through: confidence isn't decoration. A 60% suggestion and a 95% suggestion should look and behave differently, and a low-confidence output should invite scrutiny rather than a reflexive click.

The job isn't to make people trust the AI. It's to make them right about when to.

Explainability is transparency's harder sibling: not just that the system is unsure, but what drove this particular output, in terms the user can act on. The test is brutally practical: does the explanation change what a competent user does next? “Declined because debt-to-income exceeds the threshold and the last two payments were late” lets the loan officer sanity-check against facts she can see. “Declined (confidence 91%)” lets her do nothing but comply. This is the same reasoning that runs through algorithmic bias, explainability & model risk: an output you can't interrogate is one you can't govern. Write the explanation as the sentence the user would need to defend or overturn the decision to a colleague. If it doesn't hand them that, it isn't an explanation; it's a label.
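One way to keep an explanation honest is to build it only from factors the system actually used and the user can check. The sketch below is illustrative rather than a reference implementation: `Factor` and `render_explanation` are hypothetical names, and it assumes the model (or a rules layer around it) can supply a ranked list of factors in plain language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    """One input the system actually used, phrased so a user can check it."""
    description: str  # e.g. "debt-to-income exceeds the threshold"
    verifiable: bool  # can the user see the underlying fact on screen?


def render_explanation(verdict: str, factors: list[Factor], max_factors: int = 3) -> str:
    """Build a sentence the user could defend or overturn to a colleague.

    Only verifiable factors are shown. If there are none, say so plainly
    instead of inventing a rationale.
    """
    usable = [f.description for f in factors if f.verifiable][:max_factors]
    if not usable:
        return f"{verdict}: no checkable reason available; review manually."
    return f"{verdict} because " + " and ".join(usable) + "."


factors = [
    Factor("debt-to-income exceeds the threshold", True),
    Factor("the last two payments were late", True),
]
print(render_explanation("Declined", factors))
# Declined because debt-to-income exceeds the threshold and the last two payments were late.
```

The fallback branch is the design choice that matters: when nothing checkable is available, the interface admits it and asks for a manual review rather than showing a bare label.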

Human override is the one teams quietly hollow out under deadline pressure. A real override is more than a greyed-out “are you sure?” The EU AI Act's Article 14 on human oversight sets a useful bar even outside its legal scope: the people assigned to oversee a system must be able to properly understand its capacities and limitations, correctly interpret its output, decide not to use it in a given case, and intervene or halt it. The catch researchers keep finding is that override has to be easy to be real: if disagreeing with the machine is slower or riskier for the user than going along with it, automation bias quietly wins and your “human in the loop” becomes a human rubber stamp. So the bar is concrete: make disagreement at least as fast as agreement. One click to override, a quick reason capture, and no penalty for the override turning out to be the right call.

flowchart TD
    O(["AI produces an output"]) --> C{"Confidence high
and stakes low?"}
    C -->|yes| A(["Auto-apply
but log it + allow undo"])
    C -->|no| H(["Surface to a human
with a plain explanation"])
    H --> D{"Does the human
agree?"}
    D -->|agree| P(["Proceed
(decision recorded)"])
    D -->|override| R(["Act on the human's call
capture reason → feed back"])
    R -.->|improves the model| O

A workable pattern: automate the easy, confident cases; route the rest to a human with explanation and a frictionless override.
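The routing pattern in the diagram can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Route` and `Decision` types, the function names, and especially the 0.9 threshold, which a real team would set from its own error data and risk appetite.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(Enum):
    AUTO_APPLY = "auto_apply"      # applied, logged, and undoable
    HUMAN_REVIEW = "human_review"  # shown to a person with an explanation


@dataclass
class Decision:
    model_verdict: str
    final_verdict: str
    route: Route
    overridden: bool = False
    override_reason: Optional[str] = None


def route_output(confidence: float, high_stakes: bool, threshold: float = 0.9) -> Route:
    """Auto-apply only when confidence is high AND stakes are low."""
    if confidence >= threshold and not high_stakes:
        return Route.AUTO_APPLY
    return Route.HUMAN_REVIEW


def record_human_call(model_verdict: str, human_verdict: str,
                      reason: Optional[str] = None) -> Decision:
    """Act on the human's call and keep the reason so it can be fed back."""
    overridden = human_verdict != model_verdict
    return Decision(
        model_verdict=model_verdict,
        final_verdict=human_verdict,
        route=Route.HUMAN_REVIEW,
        overridden=overridden,
        override_reason=reason if overridden else None,
    )
```

Note that agreeing and overriding go through the same function with the same number of arguments, which is one small way of keeping disagreement as cheap as agreement.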

A worked example

Picture a mid-sized health insurer (illustrative, not a real company) rolling out an AI triage assistant that flags claims as likely-approve, likely-decline, or needs-review. Version one shows assessors a single verdict and a confidence percentage, nothing else. Within a month, one healthy-looking metric hides two problems: 94% of the model's “likely-decline” flags are confirmed by assessors. Leadership reads that as the model being excellent. That reading is incomplete.

A spot review (illustrative figures) finds two things. First, on the cases where the model was actually wrong, assessors confirmed the wrong call roughly two times in three. That is textbook over-reliance: the interface gave them nothing to push back with and the queue rewarded speed. Second, the new joiners with the worst error rates leaned hardest on the verdict, while veterans who quietly distrusted the tool were rubber-stamping it just to clear their queue. Over-reliance and aversion, in the same room, masked by one flattering number.

Version two changes the design, not the model. Each flag now shows the two or three factors that drove it, against data the assessor can verify. Confidence is banded (high, medium, low), and low-confidence claims route straight to needs-review rather than offering a tempting verdict. Override is one click with a reason, fed back to the model team weekly. The model's raw accuracy barely moves. But the assessors' combined accuracy climbs, because they now catch the model's mistakes instead of co-signing them, and the veterans start using it again, because a system that shows its reasoning earned back the trust a black box had spent. The through-line of the whole topic: you didn't need a better model. You needed a design that let a human be right about when to trust it.

Frequently asked questions

Isn't more transparency always better?

No. Dumping the full feature weights or a wall of model internals on a user is the same as showing nothing: it doesn't change what they do. The People + AI Guidebook argues for partial explanations that surface what's useful and honestly omit what isn't. The bar is usefulness, not completeness: show the minimum that lets the user make a better call.

What's the difference between transparency and explainability?

Transparency is disclosure about the system: that it's AI, what it's for, how confident it is. Explainability is rationale about a specific output: why this result. You can be transparent (“this is an AI, 80% confident”) without being explainable (no idea why), and an explanation is what turns a confidence number into something the user can act on.

If we add a human override, are we covered legally and ethically?

Only if the override is real. An override the user can't practically use (too slow, too risky, or quietly discouraged) is decorative, and both the research on automation bias and the EU AI Act's Article 14 framing treat genuine ability to intervene as the test. Measure how often humans actually override and whether their overrides are right; if the override rate is near zero, you have a rubber stamp, not oversight. Treat anything in a regulated domain as needing qualified legal review for your jurisdiction.

Won't showing confidence scores just confuse people?

Raw percentages often do; the Guidebook warns that confidence displays can be misinterpreted. The fix is to design the signal, not just expose the number: band it (high/medium/low), tie it to a behaviour (low confidence routes to review), and test that it actually improves decisions rather than assuming it will.
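As a rough illustration of designing the signal rather than exposing the number, the sketch below bands a raw score and ties each band to a behaviour. The cutoffs (0.6 and 0.85) and both function names are assumptions made for the example; real cutoffs should come from testing whether the bands actually improve decisions.

```python
def confidence_band(score: float, low_cutoff: float = 0.6, high_cutoff: float = 0.85) -> str:
    """Map a raw model score in [0, 1] to a band the user can read at a glance."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= high_cutoff:
        return "high"
    if score >= low_cutoff:
        return "medium"
    return "low"


def presentation_for(score: float) -> dict:
    """Tie the band to a behaviour: low confidence hides the verdict and routes to review."""
    band = confidence_band(score)
    return {
        "band": band,
        "show_verdict": band != "low",
        "route_to_review": band == "low",
    }


print(presentation_for(0.4))
# {'band': 'low', 'show_verdict': False, 'route_to_review': True}
```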

How do we know if our trust calibration is working?

Stop tracking agreement rate on its own: a high one can mean either a great model or a passive human, and you can't tell which. Track override rate, the accuracy of overrides, and the human-plus-AI accuracy versus each alone. If the team is no better than the model by itself, the human is a bottleneck, not a safeguard. See metric design for choosing measures that don't flatter you.
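These measures are simple to compute once a sample of decisions has been audited. The sketch below assumes a hypothetical `Case` record holding the model's verdict, the human's final verdict, and a ground-truth verdict from a later audit. It does not compute human-alone accuracy, which needs a separate baseline of decisions made without the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    model_verdict: str
    human_verdict: str    # final decision, made after seeing the model's output
    correct_verdict: str  # ground truth from a later audit (assumed available)


def calibration_metrics(cases: list[Case]) -> dict:
    """Summarise how humans and the model perform together on audited cases."""
    if not cases:
        raise ValueError("need at least one audited case")
    n = len(cases)
    overrides = [c for c in cases if c.human_verdict != c.model_verdict]
    good_overrides = [c for c in overrides if c.human_verdict == c.correct_verdict]
    model_wrong = [c for c in cases if c.model_verdict != c.correct_verdict]
    confirmed_errors = [c for c in model_wrong if c.human_verdict == c.model_verdict]
    return {
        "agreement_rate": (n - len(overrides)) / n,
        "override_rate": len(overrides) / n,
        "override_accuracy": len(good_overrides) / len(overrides) if overrides else None,
        "model_accuracy": sum(c.model_verdict == c.correct_verdict for c in cases) / n,
        "team_accuracy": sum(c.human_verdict == c.correct_verdict for c in cases) / n,
        # Share of the model's mistakes that the human co-signed.
        "errors_confirmed_rate": len(confirmed_errors) / len(model_wrong) if model_wrong else None,
    }
```

The last figure is the one an agreement rate hides: a team can agree with the model most of the time and still confirm most of its errors. Comparing `team_accuracy` with `model_accuracy` shows whether the human is adding anything.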

Related in the Toolkit

Human-centred design & empathy: calibrated trust starts with the user's real context: the queue, the stakes, the cost of being wrong.
Design thinking & the double diamond: the diverge/converge habit that keeps you designing the AI experience, not just the model.
Ideation & co-creation techniques (design studios, affinity mapping, card sorting, crazy-8s): for exploring explanation and override patterns before you build one.
Design sprints: a fast way to test whether an explanation actually changes what users do.
Information architecture: structuring confidence, rationale and controls so users find the override when it matters.
Customer needs identification & latent needs: the unspoken need behind any AI feature is “tell me when to doubt you.”
Design systems & style guides: codify your confidence, explanation and override patterns so every team ships them consistently.
Sales process & pipeline management: where AI scoring meets a human judgement call, the override design decides whether reps trust the lead scores.

Where to go next

Explainability + Trust, People + AI Guidebook (Google PAIR): the most practical, example-rich treatment of calibrated trust, partial explanations and confidence displays. Start here.
Guidelines for Human-AI Interaction, Amershi et al., CHI 2019: the eighteen guidelines, staged across first use, normal use, errors and over time; the checklist to audit your feature against.
Guidelines for Human-AI Interaction, Dr Saleema Amershi (talk): the lead author walking through the why and how; the best single thing to share with a team before you design.
Humans and Automation: Use, Misuse, Disuse, Abuse, Parasuraman & Riley (1997): the foundational research on over-reliance and disuse; old, but the mechanism hasn't changed.
EU AI Act, Article 14: Human Oversight: a clear, free articulation of what “real human override” has to include, useful well beyond its legal scope.