Ask a team why their onboarding is slow and you'll get a tidy story: not enough headcount, a clunky tool, a process nobody owns. Watch the same team do an onboarding and you'll see what the story left out, the workaround spreadsheet, the Slack message that does the real coordinating, the step everyone quietly skips. Your research keeps disappointing you not because you asked the wrong people, but because you trusted the story instead of the scene.
The quick version
- Interviews surface what people think and feel; ethnography (watching them in their real setting) surfaces what they actually do. You usually need both.
- People can't reliably report their own behaviour, the documented "say–do gap." So ask open grand-tour questions ("walk me through your last…"), then ladder with "why does that matter?" to reach the real motive.
- Contextual inquiry, observe-plus-interview in the person's own environment, gets you the richest signal for the least overhead.
- You're looking for the gap between the official process and the real one. That gap is where most leadership decisions actually get made.
The idea in depth: ask, but don't believe the answer
The foundational move in qualitative interviewing is to stop steering. James Spradley's The Ethnographic Interview (1979) built a whole grammar around this. His workhorse is the grand-tour question, "Could you walk me through a typical day?" or "Tell me everything that happened the last time you onboarded someone." The phrasing deliberately rambles, because a longer, looser question gives the other person room to think and tends to produce a longer, richer answer. You follow it with mini-tour questions ("you mentioned the handover, talk me through just that part") and example questions ("can you give me a specific time that went wrong?"). Spradley's point is that the interviewer's job is to make the respondent the expert and get out of the way.
So the move is: replace your leading questions with grand-tour ones. Not "Is the handover the bottleneck?" (you just handed them your answer) but "Walk me through the last handover, start to finish." Then shut up. Silence is a tool; most people fill it with the detail you actually needed.
Steinar Kvale, in InterViews (1996), offers a useful self-check with two metaphors. The miner treats knowledge as buried metal, dug out of the subject's head intact. The traveller treats the interview as a journey where meaning is built in the conversation. Kvale's warning: most of us interview like miners while pretending the nugget we carry home is untouched ore. It isn't, the way you asked helped make the answer.
flowchart LR
A("Grand-tour<br/>'Walk me through a typical…'") --> B("Mini-tour<br/>'Just the handover part…'")
B --> C("Example<br/>'A specific time it broke…'")
C --> D("Ladder<br/>'Why does that matter?'")
D --> E(["Motive / value<br/>the real 'so what'"])
Laddering: getting from a complaint to a cause
Open questions get you breadth. To get depth, to move from "the dashboard is clunky" to the thing that actually drives the person, there is a specific, well-evidenced technique called laddering. It grew out of means–end chain theory and was codified by Thomas Reynolds and Jonathan Gutman in their 1988 Journal of Advertising Research paper, "Laddering Theory, Method, Analysis, and Interpretation." The mechanic is almost comically simple: when someone names an attribute ("it's slow"), you ask why that matters. They give a consequence ("I lose the thread"). You ask why that matters. Eventually you hit a value they can't ladder past ("I look unprepared in front of my boss"). Reynolds and Gutman found most chains run three to six links, attribute → consequence → value.
So the move is: when you hit a feature-level complaint, ask "why is that a problem for you?" two or three more times than feels comfortable. The first answer is the symptom. The third or fourth is usually the decision-relevant truth, and it's almost never "the dashboard."
This pairs naturally with knowing what job people are hiring your process to do: laddering is how you climb from a stated feature request to the underlying job and the need beneath it.
An honest limitation. Laddering can manufacture depth that isn't there. Push hard enough and a cooperative person will invent a profound-sounding value to satisfy you, researchers call this the risk of leading the respondent up a ladder of your own making. Treat a laddered answer as a hypothesis to check against behaviour, not as a confession.
Ethnography: watch, because asking has a ceiling
Here's the real reason interviews alone aren't enough. There is a well-documented say–do gap: what people report doing and what they actually do routinely diverge. Not because they lie. Memory is lossy, habit is invisible to the person performing it, and most of us edit, unconsciously, toward the answer we think the interviewer wants. You cannot interview your way around a blind spot the subject shares.
Ethnography is the discipline of watching instead. Its root is anthropologist Clifford Geertz's idea of thick description, from The Interpretation of Cultures (1973): don't just record the behaviour (a person rapidly contracting an eyelid), interpret what it means in context (a wink, a twitch, a parody of a wink). The behaviour is thin; the meaning is thick, and meaning only shows up in the setting. In a workplace, that's the difference between "she copied the figure into a second spreadsheet" and "she doesn't trust the system of record, so she keeps a shadow copy." Same action, very different decision.
"What you most need to know is usually the thing nobody thought to mention, because to them it isn't worth mentioning."
You don't need a six-month field study. The practical, leader-sized version is contextual inquiry, developed by Hugh Beyer and Karen Holtzblatt (see Nielsen Norman Group's working definition): you sit with one person in their real environment, watch them do the actual task, and interview as they go, "what are you doing now? why that way?" It's a master–apprentice posture: they're the master at their own work, you're the apprentice asking the naive questions. A handful of these sessions out-performs a dozen conference-room interviews, because you're anchored to a real instance instead of a remembered average.
So the move is: for any process you're about to change, go watch three people actually do it, in situ, before you redesign anything. Narrate-aloud beats recall every time.
flowchart TD
Q("What you're trying to learn") --> D{"Attitude or<br/>behaviour?"}
D -->|"Beliefs, motives, why"| I("Interview<br/>grand-tour + laddering")
D -->|"What people actually do"| O("Observe<br/>contextual inquiry / fieldwork")
I --> G(["Triangulate:<br/>compare say vs do"])
O --> G
A worked example: the "training problem" that wasn't
A support team is missing its response-time target. The team lead's explanation is confident: agents need more product training. Leadership is one budget approval away from commissioning a training programme.
Instead, a manager runs the cheap version of this toolkit. First, three grand-tour interviews: "Walk me through the last ticket that blew the SLA." Every agent mentions, in passing, "I had to go check with Priya." That's a thread. Laddering: Why check with Priya? "She knows the refunds policy." Why not just look it up? "The wiki's out of date, last time I trusted it I gave a wrong answer." Why does that matter so much? "Because a wrong refund call gets escalated and I get named in the post-mortem." The ladder has moved from training to I don't trust the documentation and I'm scared of being blamed.
Then contextual inquiry: the manager sits beside two agents for an afternoon. The say–do gap appears immediately, nobody opens the training materials; they DM Priya and wait. The bottleneck isn't knowledge in heads, it's a single point of failure named Priya plus a wiki no one trusts. Illustrative figures, invented to make the shape concrete: roughly a third of breached tickets route through one person, costing Priya several hours a week re-answering the same questions. The fix isn't a training course. It's a trustworthy, owned policy reference plus a blame-free way to escalate. That's a different decision drawing on a different budget, and a much better one, and whether to commit to it is itself a question of how reversible the change is.
Frequently asked questions
How many people do I need to interview?
For qualitative discovery, fewer than you'd think, you're hunting for patterns and surprises, not statistical estimates. Practitioners commonly find that five to eight contextual sessions per distinct user type surface most of the recurring themes; you stop when new sessions stop telling you new things. If you need numbers you can defend to a board, that's a different tool entirely, see survey & sampling design.
Isn't watching people over their shoulder just creepy and biased?
Observation does change behaviour somewhat (the classic observer effect), and consent plus a low-key presence matters. But the bias in a remembered, edited interview answer is usually larger than the bias of being watched doing a real task. Be transparent, get permission, sit slightly to the side, and let people narrate. The trade is worth it.
What's the difference between a focus group and this?
A focus group is several people talking about a topic, efficient for reactions, but prone to groupthink and to the loudest voice, and still entirely in say-land. Ethnographic and contextual methods are one person, in their real setting, doing the real thing. Different question, different tool; don't substitute one for the other.
How do I stop leading the witness?
Audit your questions for hidden answers. "Don't you find the new tool faster?" contains its own reply. Convert each to a grand-tour or example form ("How does the new tool compare to the old one for you?"), embrace silence, and ask people to show you rather than tell you. When in doubt, ask "why?" and then wait.
When is this the wrong method?
When your question is "how many" or "how much," not "what" or "why." Qualitative methods tell you what's going on and generate hypotheses; they can't tell you how common it is. Pair them with quantitative work, that pairing is exactly the subject of qualitative vs quantitative vs mixed methods.
Related in the Toolkit
- Qualitative vs quantitative vs mixed methods, where interviewing and ethnography sit, and when to pair them with numbers.
- Survey & sampling design, the tool for "how many," once interviews have told you "what."
- Experiment design (RCTs, A/B testing, quasi-experiments), how to test the fix your interviews surfaced, instead of just believing it.
- Validity, reliability & bias in research, the say–do gap, leading questions, and observer effects, treated systematically.
- Jobs-to-be-Done & needs research, what laddering is climbing toward: the underlying job and need.
- First principles vs heuristics vs analogical reasoning, how to reason from the evidence once you have it.
- Reversible vs irreversible decisions, how much research a decision actually warrants.
- Descriptive statistics (mean, median, mode, variance, SD), for summarising any counts your fieldwork happens to throw off.
Where to go next
- Clifford Geertz, The Interpretation of Cultures (1973), the source of "thick description"; read the first essay to understand why meaning lives in context, not in the bare behaviour.
- Nielsen Norman Group, "Contextual Inquiry", the clearest short, practical primer on observe-plus-interview fieldwork for non-anthropologists.
- Reynolds & Gutman, "Laddering Theory, Method, Analysis, and Interpretation" (1988), the primary source for the laddering technique and the attribute–consequence–value chain.
- Tricia Wang, "The human insights missing from big data" (TED), a sharp talk on why "thick data" from real people catches what dashboards miss; her Nokia story is the say–do gap at corporate scale.
- Steinar Kvale, InterViews, the miner-versus-traveller framing and the seven-stage anatomy of a research interview, for anyone going deeper.