Personality models: the Big Five, and the caveats of MBTI

Hand a manager a four-letter personality type and they will quietly file every person on the team into a box. The trouble is that the most popular box-maker, the one printed on the lanyard at the offsite, is the one personality scientists trust least. There is a better model, and it is not much harder to use.

The quick version

The Big Five (remembered as OCEAN: Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism) is the model researchers actually use. It scores each trait on a spectrum rather than sorting people into categories.
MBTI (the 16 four-letter types) is engaging and popular, but it splits people into either/or boxes and a large share get a different result when they re-take it weeks later.
Personality does predict outcomes at work: conscientiousness, in particular, relates to job performance across almost every role. The effects are real but modest, not destiny.
Use any model as a conversation starter about how people prefer to work, never as a label that decides who gets which job, project or promotion.

The idea in depth: why the Big Five won

The Big Five did not arrive as one person's bright idea. It was assembled, slowly and stubbornly, from the words people use to describe each other. The starting point is the lexical hypothesis: if a personality difference matters enough, language will have invented a word for it. In 1936 Gordon Allport and Henry Odbert combed an English dictionary and pulled out thousands of trait words. Decades of statistical sifting kept collapsing that vocabulary into the same five clusters: Tupes and Christal recovered five recurring factors in US Air Force data in 1961, Warren Norman replicated the structure in 1963, and in the 1980s Lewis Goldberg coined the term "Big Five" and reproduced it across languages (see Goldberg's overview on the Big Five). Paul Costa and Robert McCrae then built a measurement instrument that made the model widely usable in research: the NEO-PI (1985) and its revision, the NEO-PI-R (1992), which break each of the five traits into six narrower facets.

The crucial design choice is that the Big Five treats each trait as a dimension you sit somewhere along, not a type you either are or aren't. Nobody is "an extravert", full stop; you are more or less extraverted than the next person, by degree. The practical habit that follows: when you talk about a colleague's style, describe the dial, not the box. "She runs high on conscientiousness; she'll want the plan written down" is a usable, falsifiable observation. "She's a Type J" is a costume you've put on her.

flowchart TD
  A(["Everyday words
for describing people"]) --> B(["Allport & Odbert 1936
~18,000 trait words"])
  B --> C(["Statistical sifting
Tupes & Christal, Norman, Goldberg"])
  C --> D(["Five recurring factors
O · C · E · A · N"])
  D --> E(["Costa & McCrae
NEO-PI-R measures each on a scale"])

The Big Five was discovered, not invented: five clusters kept falling out of the language people use to describe each other. Leaders Loop

An honest limitation: the Big Five is descriptive, not explanatory. It reliably describes what a person tends to do but says far less about why. It can also flatten culture and context: the same five factors recur across many languages, yet researchers keep proposing a sixth dimension (Honesty-Humility, in the related HEXACO model) and culture-specific traits, so "five" is a strong consensus rather than a closed case. Treat it as the best-evidenced map we have, not the territory itself.

The idea in depth: where MBTI falls down

The Myers-Briggs Type Indicator is the friendliest personality tool ever made, and that is part of the problem. It sorts you on four either/or switches (Introversion/Extraversion, Sensing/iNtuition, Thinking/Feeling, Judging/Perceiving) and hands back one of 16 tidy types, such as INTJ or ESFP. People love it because a clean label feels like self-knowledge. Two specific, evidenced flaws should make a leader cautious about leaning on it for decisions.

First, reliability. David Pittenger's review "Cautionary Comments Regarding the Myers-Briggs Type Indicator" (Consulting Psychology Journal, 2005) reports that over a five-week re-test window, around half of people received a different classification on at least one of the four scales. A measuring tape that gives you a different answer for the same wall a month later is not measuring the wall. (The Myers-Briggs publisher reports stronger figures and disputes this framing; note that it owns the instrument, so weigh the source accordingly.)

Second, false dichotomies. MBTI forces a cut down the middle of traits that are actually bell-curved. Most people sit near the centre on, say, Thinking-versus-Feeling, and for them a point or two either side of the cut flips the whole letter, and therefore the whole type. As organisational psychologist Adam Grant put it bluntly in "Goodbye to MBTI, the Fad That Won't Die" (2013), when it comes to accuracy, "if you put a horoscope on one end and a heart monitor on the other, the MBTI falls about halfway in between." Here is the practical line to hold: enjoy MBTI as an icebreaker if your team likes it, but never let a four-letter type gate a hiring, staffing or promotion call. The thing you would be deciding on might not survive the person taking the test again.

A measuring tool that gives a different answer for the same person a month later isn't measuring the person.

flowchart LR
  A(["A real trait
(e.g. Thinking ↔ Feeling)"]) --> B{"How most
models read it"}
  B -->|"Big Five: a spectrum"| C(["Score: where you
sit on the dial"])
  B -->|"MBTI: a cut at the middle"| D(["Letter T or F,
flips if you're near centre"])
  D --> E(["Re-take weeks later:
letter may change"])

Same trait, two readings. A spectrum survives small wobbles; an either/or cut amplifies them into a new "type."
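To see how much an either/or cut can amplify small wobbles, here is a minimal Python sketch under stated assumptions. It assumes each MBTI-style scale is really a normally distributed score cut at its midpoint, that a person's two sittings correlate at a hypothetical 0.8 (an illustrative value, not a figure from Pittenger or the publisher), and that the four scales behave independently. Under those assumptions, the standard orthant formula for two correlated normal scores (often credited to Sheppard) gives the chance of landing on the same side of the cut twice.

```python
import math


def same_side_probability(r: float) -> float:
    """Chance two standard-normal scores with correlation r fall on the same side of 0.

    Orthant formula: P(X > 0, Y > 0) = 1/4 + asin(r) / (2 * pi).
    Doubling it adds the both-below-zero case, giving 1/2 + asin(r) / pi.
    """
    return 0.5 + math.asin(r) / math.pi


def flip_probability(retest_r: float) -> float:
    """Chance that one either/or letter changes between two sittings."""
    return 1.0 - same_side_probability(retest_r)


def any_letter_flips(retest_r: float, scales: int = 4) -> float:
    """Chance that at least one of `scales` letters changes.

    Assumes (hypothetically) that the scales flip independently of each other.
    """
    return 1.0 - (1.0 - flip_probability(retest_r)) ** scales


# Hypothetical re-test correlation of 0.8 for the underlying score on every scale.
r = 0.8
print(f"one letter flips:             {flip_probability(r):.0%}")
print(f"at least one of four letters: {any_letter_flips(r):.0%}")
```

With these assumed inputs, one letter flips roughly 20% of the time and at least one of four letters flips roughly 60% of the time, even though the underlying number stays fairly consistent. The exact figures depend entirely on the assumptions; the point is the mechanism: cutting a continuous score into letters turns small measurement noise into a different "type."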

What personality actually predicts at work

It would be a mistake to swing from "MBTI is unreliable" to "personality is useless." The evidence says the opposite, within limits. The landmark study is Murray Barrick and Michael Mount's meta-analysis "The Big Five Personality Dimensions and Job Performance" (Personnel Psychology, 1991). Pooling many studies across five occupational groups (professionals, police, managers, sales and skilled workers), they found that conscientiousness related consistently to performance in every group: the closest thing to a universal signal personality research has. Extraversion predicted performance specifically in jobs built on social interaction, such as sales and management; openness and extraversion both predicted success in training.

The honest framing matters as much as the headline. These are real but modest relationships, useful at the level of a hundred hires, not a guarantee for any single person in front of you. Personality nudges the odds; it does not set the outcome. So, in practice: if you use a trait measure in hiring, use a Big-Five-based, validated instrument, treat conscientiousness as a mild positive signal alongside skills and a real work sample, and never use it to screen people out on its own. A modest edge across many decisions is worth having; a verdict on one human being is not what the data can buy you.
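The "nudges the odds" point can be made concrete with a short Python sketch. It assumes trait scores and performance are both normally distributed with a correlation r, and uses r = 0.2 as a hypothetical stand-in for a "modest" relationship (not a figure taken from Barrick and Mount). It asks: of people who score above the median on the trait, what share also perform above the median?

```python
import math


def hit_rate_above_median(r: float) -> float:
    """Share of above-median trait scorers who also perform above median.

    Assumes trait and performance are bivariate normal with correlation r.
    Orthant formula: P(both above median) = 1/4 + asin(r) / (2 * pi);
    dividing by P(trait above median) = 1/2 gives the conditional share.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return 0.5 + math.asin(r) / math.pi


# r = 0.0: no relationship; r = 0.2: an illustrative "modest" link (assumed value).
for r in (0.0, 0.2):
    share = hit_rate_above_median(r)
    print(f"r = {r:.1f}: about {share * 100:.0f} of every 100 high scorers perform above median")
```

With these assumed inputs, selecting on the trait lifts the hit rate from 50 to about 56 in every 100: a real edge across many hires, yet roughly 44 of those 100 high scorers still perform below the median. That gap is the difference between a useful signal and a verdict on one person.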

A worked example

Take a team lead (call her Priya) who runs a four-person analytics squad. (Illustrative scenario; not real people or scores.) At an offsite, everyone does an online MBTI-style quiz. Priya comes out "ENTJ, the Commander," and her quietest analyst, Sam, comes out "ISFP." Priya's instinct is to hand Sam the heads-down data-cleaning and keep the client-facing work for the "E" types.

Run that through what we now know. The ENTJ/ISFP labels are exactly the either/or cuts Pittenger warns about: Sam may well sit just over the line on introversion and come out as an "E" if he re-takes the quiz next month. Worse, Priya is about to make a real staffing decision on a label that may flip. The Big Five reframe is more useful and less risky. Instead of a type, she notices that Sam runs high on conscientiousness (his detailed work is flawless and on time) and lower on extraversion (big group settings drain him; one-to-ones don't). That is a description of preferences, on dials, that Sam can confirm or correct.

flowchart TD
  A(["Sam scores 'ISFP'
on an MBTI-style quiz"]) --> B{"Decide staffing
on the label?"}
  B -->|"Type-thinking"| C(["Box him as 'not client-facing',
a label that may flip on re-test"])
  B -->|"Trait + conversation"| D(["High conscientiousness,
lower extraversion, on dials"])
  D --> E(["Ask Sam: small client demos OK?
Big rooms drain you?"])
  E --> F(["Pair him on a demo,
grow the range deliberately"])

The model's job is to start a better conversation with Sam, not to decide his role before he's spoken.

The payoff is in what Priya does next. Rather than typecasting Sam, she asks him: would a small client demo suit you better than a big workshop? He says yes, and over the following quarter she pairs him up on client work in low-pressure settings, playing to his conscientiousness while widening his range. The model earned its keep as a prompt for a conversation, not a sorting hat. That is the whole game: personality data is a hypothesis to test with the person, never a sentence to pass on them.

Frequently asked questions

Is the Big Five definitely "better" than MBTI?

For making decisions and for research, yes: it is built on stronger evidence, rates traits on a spectrum, and gives more stable results when re-taken. MBTI isn't worthless as a self-reflection prompt or a team icebreaker if people enjoy it; the error is using its four-letter types to decide who does what. Match the tool to the stakes: for low-stakes fun, fine; for consequential people decisions, use the better-evidenced model.

Can someone's personality change?

Traits are fairly stable in adulthood but not fixed. On average people tend to drift toward more conscientiousness and emotional stability with age, and deliberate effort and big life changes can move the dials over time. The practical takeaway: don't treat a score as a life sentence, for yourself or anyone you manage. A "low" today is a starting point, not a ceiling.

Should I use a personality test to hire?

Cautiously, and never alone. The evidence (Barrick & Mount, 1991, onward) supports a validated, Big-Five-based measure as a modest signal (conscientiousness is the most reliable), used alongside skills assessments and a real work sample, not as a screen-out filter. Avoid MBTI-style typing for hiring entirely, and check your local employment rules: in some jurisdictions personality testing in selection carries legal constraints, so confirm with a qualified professional.

Why is MBTI still everywhere if scientists doubt it?

Because it feels good and reads easily. A clean four-letter type flatters you, gives a team a shared language fast, and asks nothing uncomfortable: every type is framed as a strength. That is excellent product design and weak measurement. Popularity is evidence of appeal, not of accuracy.

Aren't all these labels just astrology for the office?

That critique lands hardest on type systems, less on the Big Five. The difference is testability: Big Five scores predict real outcomes (modestly) and hold up on re-test, whereas a system that hands you a new "type" next month and frames every result as flattering is closer to a horoscope. The honest position is in the middle: personality is real and partly measurable, but no single result should carry the weight people want to give it.

Related in the Toolkit

Personality sits alongside the rest of how people actually think and decide. The caution that stops you typecasting a colleague is really a guard against the mental shortcuts that turn a quick read of someone into a fixed verdict, and getting an honest fix on your own traits is the heart of reflective practice.

Motivation theory (Maslow, Herzberg, Self-Determination, intrinsic vs extrinsic): what actually moves people once you stop reducing them to a type.
Cognitive biases (confirmation, availability, anchoring, halo, priming): the halo effect is exactly how a single label hardens into a person's whole reputation.
Dual-process thinking (System 1 / System 2): typing someone is a fast System-1 read; treating it as a hypothesis is the slower System-2 check.
Behavioural levers (FOMO, loss aversion, nudges, defaults): why a flattering, easy result like a "type" spreads so fast through a team.
Social psychology (conformity, authority, reciprocity, social proof): labels stick partly because everyone around you accepts them.
Self-awareness & reflective practice: the productive use of any personality model is on yourself, with honesty.
Conflict resolution & management styles (Thomas-Kilmann): trait differences explain a lot of why two people clash, and how to bridge the gap.
Managing up, down & across: adapting to how different people prefer to work is the everyday payoff of reading traits well.

Where to go next

"The Big Five Personality Dimensions and Job Performance", Barrick & Mount (1991): the meta-analysis that established conscientiousness as a cross-occupational predictor of performance, and the evidence base for using traits at work.
"Cautionary Comments Regarding the Myers-Briggs Type Indicator", Pittenger (2005): a readable academic case against leaning on Myers-Briggs for decisions, including the re-test reliability problem.
"Goodbye to MBTI, the Fad That Won't Die", Adam Grant (2013): a leading organisational psychologist's plain-language critique, and his case for the Big Five instead.
"Big Five personality traits", overview with primary citations: a well-referenced map of the model's history (Allport & Odbert, Tupes & Christal, Goldberg, Costa & McCrae) and the open debates around it.
"Measuring Personality", CrashCourse Psychology #22 (YouTube): a clear ten-minute primer on how personality is measured, including the Big Five and the limits of type tests.