A team lead pastes a dashboard into a board deck: "Average customer satisfaction: 4.1 out of 5, up from 3.8." It looks like progress. It might be. But "satisfaction on a 1–5 scale" isn't the same kind of number as revenue or headcount, and the move that's safe for one is quietly misleading for the other. The fix isn't more maths. It's knowing what type of data you're holding before you do anything to it.
The quick version
- Categorical data names things (region, team, yes/no). You can count it, not average it.
- Ordinal data ranks things (satisfaction levels, seniority bands). Order is real; the gaps between ranks are not.
- Numeric data measures things, and splits two ways: discrete (counts such as tickets and sign-ups) and continuous (anything on a sliding scale, such as time, weight or money).
- The type decides which summary, chart and test is honest. Pick the wrong one and the chart still renders; it's just wrong.
The idea in depth: four types, two questions
The modern habit of sorting data into levels traces to one short paper. In 1946, Harvard psychologist S. S. Stevens published "On the Theory of Scales of Measurement" in Science, proposing four levels (nominal, ordinal, interval and ratio), each defined by what you're allowed to do to the numbers without distorting their meaning (Stevens, Science, vol. 103, 7 June 1946, pp. 677–680). Almost every "types of data" diagram you'll ever see is a repackaging of Stevens.
In everyday work the four collapse into a cleaner pair of questions. First: is the value a label or a measurement? A label is categorical (Stevens' nominal): "EMEA", "churned", "Tier 2". You can tally how many fall in each bucket; you cannot meaningfully average them. The "average region" is a category error, literally. Second, if it ranks: are the gaps equal? When categories have a real order but the spacing between them is unknown ("dissatisfied / neutral / satisfied", "junior / mid / senior"), that's ordinal. You know satisfied beats neutral; you do not know whether it beats it by the same amount that neutral beats dissatisfied.
So the move is: before you summarise a column, say out loud what one value means. If it's a name, count it and show proportions. If it's a rank, report the median or the distribution, and resist the average. (For a refresher on which summary fits which data, see descriptive statistics.)
flowchart TD
A(["A value in your column"]) --> B{"Is it a label or a measurement?"}
B -->|"Label / name"| C(["Categorical, count it, show proportions"])
B -->|"Measurement / rank"| D{"Are the gaps between values equal?"}
D -->|"No, only the order is real"| E(["Ordinal, use the median, not the mean"])
D -->|"Yes, equal, true numbers"| F{"Can it take any value in between?"}
F -->|"Only whole counts"| G(["Discrete, counts of things"])
F -->|"Any value on a scale"| H(["Continuous, measured on a continuum"])
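The first two branches of the flowchart can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The function names `proportions` and `ordinal_median` and the sample values are assumptions made up for the example, not part of any particular tool.

```python
from collections import Counter
from statistics import median_low


def proportions(labels):
    """Categorical: count each label and return its share of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def ordinal_median(responses, ordered_levels):
    """Ordinal: return the middle response, using only the order of the levels."""
    rank = {level: position for position, level in enumerate(ordered_levels)}
    ranks = [rank[response] for response in responses]
    # median_low always returns a value that occurs in the data,
    # so it never invents a level "between" two ranks.
    return ordered_levels[median_low(ranks)]


# Hypothetical sample data
regions = ["EMEA", "EMEA", "APAC", "AMER"]
answers = ["satisfied", "neutral", "dissatisfied", "satisfied", "neutral"]
levels = ["dissatisfied", "neutral", "satisfied"]

region_shares = proportions(regions)              # share of rows per region
typical_answer = ordinal_median(answers, levels)  # the middle-ranked response
```

Neither function assigns a numeric value to a label, so neither can smuggle in an assumption about the gaps between levels.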
The idea in depth: discrete vs continuous, and why it changes the chart
Once you're in genuine number territory (Stevens' interval and ratio levels), one more split matters operationally. Discrete data is counted and lands on whole values: number of support tickets, sign-ups, defects, people in a room. You can't log 3.7 tickets. Continuous data is measured and can sit anywhere on a scale: time-to-resolution, revenue, page-load latency, weight. A delivery can take 67.5 minutes as easily as 67. (Stevens' four levels, and the academics who've since picked holes in them, are mapped at Wikipedia's level-of-measurement overview.)
This isn't pedantry; it decides how you should see the data. Discrete counts belong in bar charts and are summarised by totals and counts; continuous measures belong in histograms or box plots and live comfortably with means, medians and standard deviations. Force a continuous variable into a few coarse bars and you smear away the shape that would have told you something. Treat a count as continuous and you'll quote "0.4 of a defect", a figure no one can act on.
The chart still renders when you pick the wrong data type. It's just quietly wrong, and confident.
flowchart LR
A(["Categorical"]) --> A1(["Mode · counts · bar chart"])
B(["Ordinal"]) --> B1(["Median · distribution · ranked bars"])
C(["Discrete numeric"]) --> C1(["Counts · totals · bar / line"])
D(["Continuous numeric"]) --> D1(["Mean · median · SD · histogram"])
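As a rough sketch of the two numeric rows above, the following Python prepares a discrete column for a bar chart (one bar per whole value) and a continuous column for a histogram (equal-width bins) alongside its mean, median and standard deviation. The helper names, the bin width and the sample figures are all hypothetical.

```python
from collections import Counter
from statistics import mean, median, stdev


def discrete_summary(counts):
    """Discrete: report the total and how often each whole value occurs."""
    frequency = dict(sorted(Counter(counts).items()))
    return {"total": sum(counts), "frequency": frequency}


def continuous_summary(values, bin_width):
    """Continuous: report centre and spread, plus equal-width histogram bins."""
    # Each value is assigned to the lower edge of the bin it falls in.
    bins = Counter(int(value // bin_width) * bin_width for value in values)
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),
        "histogram": dict(sorted(bins.items())),
    }


# Hypothetical sample data
tickets_per_day = [0, 1, 1, 3, 2, 1]
delivery_minutes = [67.5, 67.0, 12.0, 45.2]

ticket_view = discrete_summary(tickets_per_day)
delivery_view = continuous_summary(delivery_minutes, bin_width=10)
```

The bin width is a judgement call: too wide hides the shape, too narrow turns the histogram into noise.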
The idea in depth: where the tidy story breaks down
Here's the honest limitation, and it's the part most explainers skip. Stevens' four-level scheme is a useful lens, not a law, and serious statisticians have argued it is sometimes actively misleading. In 1993, Paul Velleman and Leland Wilkinson published "Nominal, Ordinal, Interval, and Ratio Typologies Are Misleading" in The American Statistician (vol. 47, no. 1, pp. 65–72), arguing that letting the "level" dictate which statistic you may use is too rigid: the same numbers can play different roles depending on the question.
The sharpest everyday case is the survey scale. A 1–5 satisfaction rating is, strictly, ordinal, yet teams average it constantly. Purists call that meaningless because the gap from 4 to 5 may not equal the gap from 2 to 3. Pragmatists, including usability researcher Jeff Sauro, counter that averaging ordinal ratings is widely useful and rarely changes the practical conclusion, provided you don't then make interval claims ("twice as satisfied") the data can't support (MeasuringU, 2016). Both camps are partly right. That tension is the whole reason this is worth knowing.
So the move is: treat the data type as a default, not a cage. Average a Likert scale if it helps you compare, provided you report the spread alongside it, but never let the average imply equal gaps, and never say "4.1 is 8% happier than 3.8". Run a forced ranking and you're holding ordinal data: a median and the distribution will tell a truer story than a mean. When the stakes are high, this is also where correlation-versus-causation mistakes creep in: a misread data type and a misread relationship compound each other.
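One way to see why the "8% happier" claim fails is to relabel the scale. The sketch below assumes the same five answers are coded 0–4 instead of 1–5, which changes nothing about how satisfied anyone is, and recomputes the percentage. The function name `percent_change` is just an illustrative helper.

```python
def percent_change(old, new):
    """Relative change from old to new, as a percentage."""
    return (new - old) / old * 100


old_mean, new_mean = 3.8, 4.1  # the averages quoted on the dashboard

# Same responses, two equally arbitrary codings of the five levels.
on_1_to_5 = percent_change(old_mean, new_mean)          # roughly 7.9
on_0_to_4 = percent_change(old_mean - 1, new_mean - 1)  # roughly 10.7
```

The 0.3-point difference survives the relabelling; the percentage does not. A ratio claim needs a true zero, and a 1–5 rating doesn't have one.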
A worked example
A regional operations manager wants to know whether a new onboarding flow is working. She has four columns (figures below are illustrative):
- Region: "North", "South", "Coastal". Categorical. She counts customers per region and shows proportions; there is no "average region".
- Onboarding rating: 1 (poor) to 5 (excellent). Ordinal. The old flow's median is 3; the new flow's is 4, a real, reportable shift. She also notes that ratings of 1–2 dropped from 30% to 12%, which is more persuasive than any single average.
- Support tickets in week one: 0, 1, 2, 3… Discrete. Average tickets per new customer fell from 1.8 to 1.1. A count can have a meaningful average even though no one files 1.1 tickets.
- Time to first value (hours): 0.5, 11.2, 27.75… Continuous. Median time dropped from 26 hours to 9. She uses the median, not the mean, because a handful of stalled accounts drag the average upward.
She doesn't tell her director "satisfaction is up 0.3". She says: ratings improved (median 3 → 4), the unhappy tail shrank, support load fell, and customers reach value in roughly a third of the time (median 26 hours → 9). Each claim leans on the summary its data type actually permits. That's why no one in the room can poke a hole in it.
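Two of her choices, the median for a skewed time column and the share of low ratings for an ordinal one, can be sketched as follows. The numbers are invented purely to show the effect and are not the manager's data; `share_at_or_below` is a hypothetical helper.

```python
from statistics import mean, median


def share_at_or_below(ratings, threshold):
    """Fraction of ratings at or below a threshold (e.g. the unhappy 1-2 tail)."""
    return sum(rating <= threshold for rating in ratings) / len(ratings)


# Invented hours-to-first-value: most accounts are quick, two have stalled.
hours = [4.0, 6.5, 8.0, 9.0, 9.5, 11.2, 12.0, 160.0, 240.0]
# Invented 1-5 onboarding ratings.
ratings = [1, 2, 3, 3, 4, 4, 4, 4, 5, 5]

typical_hours = median(hours)  # the middle account, unmoved by the stalled ones
average_hours = mean(hours)    # pulled far upward by the two stalled accounts
unhappy_share = share_at_or_below(ratings, 2)
```

In this made-up sample the mean lands above 50 hours even though seven of the nine accounts reached value in 12 hours or less, which is the distortion the median avoids.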
Frequently asked questions
Can I ever take the average of a 1–5 rating?
You can, and most teams do, but know you're bending a rule. Strictly the scale is ordinal, so the mean assumes gaps it can't guarantee. Average it to compare groups if you like, but report the median or distribution alongside it, and never make "twice as happy" claims. (See Sauro, 2016.)
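The purists' worry can be made concrete with a small, deliberately contrived sketch. It recodes the five levels with a different spacing that preserves their order, then checks which group each summary prefers. The two groups and the `stretched` mapping are invented for illustration.

```python
from statistics import mean, median


def recode(ratings, mapping):
    """Replace each rating with its value under another order-preserving coding."""
    return [mapping[rating] for rating in ratings]


group_a = [3, 3, 3, 3, 3]
group_b = [1, 1, 4, 4, 4]

# Same order of levels, different (and equally unverifiable) spacing.
stretched = {1: 1, 2: 2, 3: 3, 4: 10, 5: 11}

mean_prefers_a = mean(group_a) > mean(group_b)
mean_prefers_a_stretched = (
    mean(recode(group_a, stretched)) > mean(recode(group_b, stretched))
)
median_prefers_b = median(group_b) > median(group_a)
median_prefers_b_stretched = (
    median(recode(group_b, stretched)) > median(recode(group_a, stretched))
)
```

Here the mean ranks group A first under the usual 1–5 coding and group B first under the stretched one, while the median ranks B first both times. Real survey data rarely flips this cleanly, which is the pragmatists' point; the sketch only shows what the mean is quietly assuming.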
What's the quickest way to tell discrete from continuous?
Ask whether a half-value makes sense. Half a support ticket is nonsense (discrete); half a second is fine (continuous). Counts are discrete; measurements on a scale are continuous.
Is ordinal data categorical or numeric?
It sits in between, and that in-between status is what trips people up. It has the labels of categorical data and the order of numeric data, but not the equal gaps that make numbers safe to add and average. Treat the order as real and the spacing as unknown.
Does my BI tool already handle this?
Partly. Tools infer types from the column (text vs number) but can't know that a 1–5 number is really a rank, not a measurement. They'll happily average it. The judgement is still yours.
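A toy version of the problem, and of the usual remedy of declaring the type yourself, might look like this. It is not how any specific BI product works; `infer_kind`, `kind_of`, the `DECLARED` mapping and the column names are all assumptions made for the sketch.

```python
def infer_kind(values):
    """Naive inference in the spirit of 'text vs number'."""
    if all(isinstance(value, (int, float)) for value in values):
        return "numeric"
    return "categorical"


# A hand-maintained declaration of what each column really is (hypothetical).
DECLARED = {"onboarding_rating": "ordinal"}


def kind_of(column, values, declared=DECLARED):
    """Prefer the declared kind; fall back to naive inference."""
    return declared.get(column, infer_kind(values))


rating_values = [1, 5, 3, 4]

guessed = infer_kind(rating_values)                    # sees only numbers
stated = kind_of("onboarding_rating", rating_values)   # uses the declaration
```

The inference step can only see that the column holds numbers; the knowledge that those numbers are ranks has to come from a person.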
Why does any of this matter to a leader, not just an analyst?
Because you sign off on the numbers. Most "the data says…" mistakes that reach a board aren't computation errors; they're type errors: a rank averaged like a measurement, a category treated as a number. Naming the type is the cheapest guardrail you have.
Related in the Toolkit
- Descriptive statistics (mean, median, mode, variance, SD), which summary each data type actually permits.
- Distributions, percentiles & quartiles, the right way to describe continuous data with a skewed tail.
- Correlation vs causation, where misread data types quietly become misread relationships.
- Regression (linear, non-linear, logistic), the model you reach for changes when the outcome is a category, a rank or a continuous number.
- Statistical significance: p-values, t-scores, chi-square, the test you're allowed to run depends on the data type going in.
- First principles vs heuristics vs analogical reasoning, "name the type before you compute" is a first-principles habit.
- Reversible vs irreversible decisions, how much rigour a data read deserves depends on the stakes of the call.
- Jobs-to-be-Done & needs research, most of the ordinal and categorical data you collect comes from customer research.
Where to go next
- S. S. Stevens, "On the Theory of Scales of Measurement" (Science, 1946), the original four-page paper that started the whole nominal/ordinal/interval/ratio scheme. Short and surprisingly readable.
- Velleman & Wilkinson, "Nominal, Ordinal, Interval, and Ratio Typologies Are Misleading" (The American Statistician, 1993), the essential counter-argument: why the levels are a guide, not a rulebook.
- "Level of measurement" (Wikipedia), a well-sourced map of all four levels, the operations each permits, and the academic critics (Luce, Mosteller & Tukey, Chrisman).
- "Nominal, Ordinal, Interval & Ratio Data: Simple Explanation With Examples" (Grad Coach, YouTube), a clear ten-minute walk-through if you'd rather watch than read.