Watch a team run flat-out and you will see something strange: the busier they get, the slower work seems to come out the other end. Tickets pile up, "quick" requests sit for days, and everyone is exhausted. The problem is rarely effort. It is that the team has been planned to nearly 100% utilisation, and at that level, the queue does the damage.

The quick version

  • Capacity planning is matching the work you take on to the work your people and systems can actually do, over a given window, without quietly overloading either.
  • Resource planning is the allocation half: deciding who (or what) does which work, and keeping that assignment realistic rather than aspirational.
  • The counter-intuitive core: running close to 100% utilisation makes delivery slower, not faster. Wait times rise steeply as a resource nears full load, so deliberate spare capacity is a feature, not waste.
  • Total capacity is set by your bottleneck, not your average. Add people away from the constraint and nothing speeds up; you just grow the queue in front of it.

The idea in depth

Three pieces of well-established operations theory explain almost everything you will see in a real team. None of it is new, and all of it is verifiable.

Start with the simplest. Little's Law, formulated by MIT's John Little (who published the proof in 1961), states a relationship so basic it is almost embarrassing: the average amount of work in a system equals the rate work flows through it multiplied by the average time each item spends inside. In plain terms, work-in-progress = throughput × lead time. Rearrange it and you get the part leaders forget: lead time = work-in-progress ÷ throughput. If your throughput is fixed and you pour more work in, every item simply waits longer. You can read the law and an accessible explanation on Wikipedia's Little's Law page.

So the move is: stop measuring a team only by how much it has on its plate. Cap the amount of work in progress. The fastest way to make a backlog move is usually to start fewer things at once, not to start more, because lead time falls the moment work-in-progress does.
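As a rough illustration of that rearrangement, here is a minimal Python sketch. The function names and the team figures are hypothetical, and Little's Law describes long-run averages in a reasonably stable system, so treat the output as a direction rather than a forecast.

```python
def lead_time(wip: float, throughput: float) -> float:
    """Average lead time from Little's Law: lead time = WIP / throughput."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput


def wip_cap(target_lead_time: float, throughput: float) -> float:
    """Average WIP consistent with a target lead time at a given throughput."""
    return target_lead_time * throughput


# Hypothetical team: finishes 20 items a week.
print(lead_time(wip=60, throughput=20))  # 60 items in progress -> 3.0 weeks
print(lead_time(wip=30, throughput=20))  # halve the WIP -> 1.5 weeks
print(wip_cap(target_lead_time=1, throughput=20))  # WIP for a 1-week lead time
```

With throughput held constant, the only lever in the second call is the amount of work started, which is why a work-in-progress cap shortens lead time without anyone working faster.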

The second piece tells you why a fully-loaded team grinds to a halt. Kingman's formula (J.F.C. Kingman, "The single server queue in heavy traffic," 1961) approximates how long work waits before a busy resource. The shape that matters is the utilisation term, ρ/(1−ρ): as utilisation (ρ) climbs toward 1, that fraction does not rise gently; it explodes. At 50% utilisation the term is 1; at 80% it is 4; at 90% it is 9; at 95% it is 19. As ρ approaches 1 the denominator (1−ρ) heads for zero, so the predicted wait grows without bound, which is the behaviour the Kingman's formula entry sets out for a resource nearing saturation.

The same formula carries a second warning: queue time scales with variability as well as load. The more uneven your arrivals and task sizes, the longer the wait at any given utilisation, so a team doing predictable work can safely run hotter than one juggling unpredictable requests.
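To make both effects concrete, the sketch below uses the commonly quoted form of Kingman's approximation: mean wait ≈ ρ/(1−ρ) × (ca² + cs²)/2 × mean service time, where ca and cs are the coefficients of variation of inter-arrival and service times. The function names and the sample coefficients are illustrative assumptions, not measurements from any team.

```python
def utilisation_term(rho: float) -> float:
    """The rho / (1 - rho) factor that dominates wait time near full load."""
    if not 0 <= rho < 1:
        raise ValueError("utilisation must be at least 0 and below 1")
    return rho / (1 - rho)


def kingman_wait(rho: float, ca: float, cs: float, mean_service_time: float) -> float:
    """Kingman's approximation of mean time in queue for a single server.

    ca, cs: coefficients of variation of inter-arrival and service times.
    """
    variability = (ca ** 2 + cs ** 2) / 2
    return utilisation_term(rho) * variability * mean_service_time


# Load effect: the same four utilisation levels quoted in the text.
for rho in (0.5, 0.8, 0.9, 0.95):
    print(f"{rho:.0%} busy -> utilisation term {utilisation_term(rho):.0f}")

# Variability effect at a fixed 80% load (coefficients are made up).
steady = kingman_wait(0.8, ca=0.5, cs=0.5, mean_service_time=1.0)
erratic = kingman_wait(0.8, ca=1.5, cs=1.5, mean_service_time=1.0)
print(f"erratic work waits about {erratic / steady:.0f}x longer than steady work")
```

At identical utilisation, the erratic case waits several times longer purely because of the variability factor, which is the reason a predictable workload tolerates a higher target load.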

flowchart LR
  A(["Low load<br/>~50% busy"]) --> B(["Queue barely grows<br/>work flows out fast"])
  C(["High load<br/>~80% busy"]) --> D(["Wait time ~4× worse<br/>backlog builds"])
  E(["Near-full load<br/>~95% busy"]) --> F(["Wait time ~19× worse<br/>delivery stalls"])
Why "fully utilised" backfires: as a resource nears 100%, wait time rises far faster than load does (Kingman's ρ/(1−ρ) term). Leaders Loop

So the move is: plan to a target utilisation well below 100% for any team whose work is uneven or knowledge-heavy. Practitioners who apply queueing theory to teams often point to a tipping point around the 80% mark, beyond which responsiveness collapses (see this practitioner explainer on queueing, slack and resilience). The exact figure is a judgement call, not a constant, but the principle holds: the slack is the capacity to absorb surprises and keep flow steady.

The third piece tells you where to act. Eliyahu Goldratt's Theory of Constraints, set out in his 1984 business novel The Goal, makes one stubborn claim: every system has a single binding constraint, a bottleneck, and the output of the whole system is governed by it. An hour saved at the bottleneck is an hour gained for the entire flow; an hour saved anywhere else is a mirage. Goldratt's five focusing steps follow from this: identify the constraint, exploit it (wring everything you can from it before spending money), subordinate everything else to it, elevate it (add capacity there), then repeat, because once you relieve one bottleneck, another becomes binding. A clear summary sits at the LeanProduction Theory of Constraints guide.

The move here: before you hire or reassign anyone, find the one step that the work actually piles up in front of. Adding capacity anywhere else is, at best, wasted money, and at worst it feeds the bottleneck faster, lengthening the very queue you were trying to clear.
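A toy model makes the point. Assuming the steps run in series and each has a fixed weekly capacity (a simplification, with hypothetical stage names and numbers), the system can deliver no more than its slowest step:

```python
def system_throughput(stage_capacity: dict[str, float]) -> tuple[str, float]:
    """Return (bottleneck stage, system throughput) for stages in series.

    The stage with the smallest capacity caps what the whole flow delivers.
    """
    bottleneck = min(stage_capacity, key=stage_capacity.get)
    return bottleneck, stage_capacity[bottleneck]


# Hypothetical capacities in items per week.
baseline = {"build": 50, "review": 20, "deploy": 60}
print(system_throughput(baseline))

# Add capacity away from the constraint: output does not move.
more_builders = {**baseline, "build": 80}
print(system_throughput(more_builders))

# Add capacity at the constraint: output rises and a new stage becomes binding.
more_reviewers = {**baseline, "review": 70}
print(system_throughput(more_reviewers))
```

The first two calls report the review step at 20 items a week; only the third changes the result, and it also shows the constraint moving to the next-slowest step, which is why the five focusing steps end with "repeat".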

An honest limitation. These models assume you can see your flow: that work-in-progress, throughput and the bottleneck are measurable. In knowledge work they often are not. Tasks vary enormously in size, "done" is fuzzy, and people split attention across several streams, which quietly multiplies switching costs the formulas don't capture. Kingman's formula is an approximation for a single server with tidy statistical assumptions, not a law of human teams. Treat all of this as a lens that tells you which way to lean (toward less work-in-progress, more slack, and attention on the constraint), not as a calculator that hands you a staffing number.

A worked example

Take a support-engineering team of five; call them the Platform pod. (Illustrative figures throughout; this is a teaching example, not a real team.) Each engineer is, on paper, "100% allocated": every hour of the week is booked to a project or a queue. Leadership is proud of it: nobody is idle. Yet first-response time on incidents has crept from hours to days, and two roadmap features are a month late.

Run it through the three lenses. Little's Law first: the pod closes roughly 40 tickets a week (throughput) but is carrying about 120 open at once (work-in-progress). Lead time = 120 ÷ 40 = three weeks. No amount of encouragement changes that arithmetic; only fewer open items or higher throughput will.

Kingman next: because every engineer is booked to the hilt, any unplanned incident (and incidents are pure variability) has no spare capacity to land on, so it waits behind planned work. The team isn't slow; it's saturated. Pull planned allocation back to around 80%, and that reserve absorbs the spikes instead of queueing them.

Theory of Constraints last: work always stalls in the same place, the single senior engineer who must review every production change. That review step is the bottleneck, so hiring a sixth junior elsewhere would do nothing. Exploiting the constraint (batching reviews, stripping trivial approvals from her plate) and then elevating it (training a second reviewer) is what lifts the whole pod's output.

flowchart TD
  A(["120 tickets open<br/>throughput 40/wk"]) --> B{"Where does work<br/>pile up?"}
  B -->|"At the single reviewer"| C(["Bottleneck found:<br/>change review"])
  C --> D(["Exploit: batch reviews,<br/>strip trivial approvals"])
  C --> E(["Elevate: train a<br/>second reviewer"])
  D --> F(["Cap WIP + plan to ~80%<br/>lead time falls"])
  E --> F
The fix isn't more people everywhere, it's less work-in-progress, deliberate slack, and capacity added at the constraint.

Notice what the plan did not do: it did not ask everyone to try harder, and it did not add headcount at random. It capped work-in-progress, built in slack, and put effort where the constraint actually was. That is capacity planning doing its job.

A team planned to 100% utilisation has no capacity left for the one thing guaranteed to happen: the unexpected.

Frequently asked questions

Isn't planning a team below 100% just paying people to be idle?

No, and this is the most common objection. The slack isn't idleness; it's the buffer that lets the team absorb variation without the queue exploding. Kingman's formula shows wait time climbing ever more steeply as utilisation rises (the ρ/(1−ρ) term is 4 at 80% and 19 at 95%), so the "spare" capacity is what keeps everything else flowing on time. A team with zero slack delivers less usable output, not more, because so much of its capacity is consumed by delay and firefighting.

What's the difference between capacity planning and resource planning?

Capacity planning asks "how much work can we take on?" It sizes the demand against the supply over a window. Resource planning is the allocation that follows: who or what does which piece of that work. You need both: a realistic capacity number stops you over-committing, and sound resource planning makes sure the commitment lands on the right people rather than piling onto whoever is most senior.

How do I actually find my bottleneck?

Look for where work waits, not where people are busy; the two are different. Trace a typical item through your process and note where it sits in a queue the longest; that step is almost always the constraint. In Goldratt's terms, you exploit it first (get more out of what you have) before you spend money elevating it. Then expect a new bottleneck to appear elsewhere, because relieving one always promotes the next.
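If your tracker can export per-stage timings, the tracing step can be sketched in a few lines of Python. The record layout, stage names, and hours below are hypothetical; real tools export different fields, so read this as an outline of the idea rather than a recipe for any particular system.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trace: (item, stage, hours waiting in queue, hours being worked on)
trace = [
    ("T1", "build", 2, 6), ("T1", "review", 30, 1), ("T1", "deploy", 1, 1),
    ("T2", "build", 4, 8), ("T2", "review", 42, 2), ("T2", "deploy", 2, 1),
]


def average_wait_by_stage(records):
    """Average queue time per stage, ignoring how long the work itself took."""
    waits = defaultdict(list)
    for _item, stage, wait_hours, _work_hours in records:
        waits[stage].append(wait_hours)
    return {stage: mean(hours) for stage, hours in waits.items()}


def likely_constraint(records):
    """The stage where items wait longest on average."""
    averages = average_wait_by_stage(records)
    return max(averages, key=averages.get)


print(average_wait_by_stage(trace))
print(likely_constraint(trace))
```

In this made-up trace the build stage absorbs the most working hours, yet items wait far longer in front of review, so review is flagged as the likely constraint: busy and blocked are not the same thing.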

Does any of this apply to knowledge work, or just factories?

The theory came from manufacturing, but the maths is about queues, and knowledge work is full of them: code reviews, approvals, sign-offs, anything one person must do before the next can proceed. The caveat is that knowledge tasks vary wildly in size and "done" is fuzzy, so treat the models as direction rather than precise prediction. Capping work-in-progress and protecting slack help even when you can't measure throughput to a decimal place.

How often should we re-plan capacity?

Often enough that the plan reflects reality, rarely enough that re-planning isn't its own full-time job. Tie it to your delivery rhythm (each sprint, PI, or quarter) and revisit sooner if the bottleneck moves or demand shifts sharply. A capacity plan is a forecast, not a contract; its value is in being updated as you learn, not in being defended once it's set.

Related in the Toolkit

Capacity planning lives inside how you deliver: the cadence you choose (delivery methodologies) decides how often you re-plan, and the rituals where you set the next slice of work (sprint / PI / OKR planning) are where capacity stops being a spreadsheet and becomes commitments.

Where to go next