Most labs ship a family at three rough tiers — a small fast one, a balanced one, and a flagship — so you can match capability to the job and the budget.
Nearly every lab ships a tiered family rather than one model. Anthropic has Haiku / Sonnet / Opus; OpenAI has nano / mini / full; Google has Flash / Pro. The pattern is the same: a small, fast, cheap tier for high volume; a flagship for the hardest reasoning; and a balanced middle that's the right default for most production work.
The small tier is the workhorse most teams underuse. For chat, classification, extraction and routing, a Haiku- or Flash-class model is often 5–20× cheaper and several times faster than the flagship, at quality the task can't actually distinguish. Reach for the flagship only where reasoning genuinely gates the outcome.
A common production pattern is to route by difficulty — cheap small model first, escalate to the flagship only when confidence is low — which captures most of the quality at a fraction of the cost.