AI Engineering · 6 min read · March 12, 2025

Cutting LLM API Costs by 70% Without Hurting Answer Quality

Most LLM cost problems aren't about the model being expensive. They're about paying full price for calls that never needed the full model.

Tanjil Ahmed

Lead Software Engineer · Notionhive

An LLM bill that looks alarming almost always has the same root cause: every request, regardless of complexity, is going to the largest and most expensive model available. Cost optimization that doesn't hurt quality starts with routing, not with switching everything to a cheaper model and hoping.

Route by task complexity: classification and extraction to a small fast model, open-ended reasoning to the frontier model.
Cache identical or near-identical prompts aggressively — a surprising share of production traffic repeats.
Prompt caching (where the provider supports it) turns a long, static system prompt from a full-price cost on every request into a discounted one whenever the request hits the cache.
Trim retrieved context to what's actually relevant — sending ten chunks when three would answer the question is pure waste.
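To make the list above concrete, here is a minimal Python sketch of three of those ideas working together: routing by task type, an exact-match response cache, and trimming retrieved chunks. Everything named here is an assumption for illustration: the model names, the `SIMPLE_TASKS` set, the `CostAwareClient` class, and the injected `call_model` function standing in for whatever provider SDK you use. It is a starting point under those assumptions, not a production router.

```python
from __future__ import annotations

import hashlib
from typing import Callable

SMALL_MODEL = "small-fast-model"    # hypothetical model name
FRONTIER_MODEL = "frontier-model"   # hypothetical model name

# Task types assumed to be simple enough for the small model.
SIMPLE_TASKS = {"classification", "extraction"}


def pick_model(task_type: str) -> str:
    """Send bounded tasks to the small model, everything else to the frontier model."""
    return SMALL_MODEL if task_type in SIMPLE_TASKS else FRONTIER_MODEL


def cache_key(model: str, prompt: str) -> str:
    """Collapse whitespace so trivially different prompts share one key."""
    normalized = " ".join(prompt.split())
    return hashlib.sha256(f"{model}\n{normalized}".encode("utf-8")).hexdigest()


def top_chunks(scored_chunks: list[tuple[float, str]], k: int = 3) -> list[str]:
    """Keep only the k highest-scoring retrieved chunks."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked[:k]]


class CostAwareClient:
    """Routes each request, then serves repeats from an in-memory cache."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        # call_model(model, prompt) -> response text; supplied by the caller.
        self._call_model = call_model
        self._cache: dict[str, str] = {}
        self.calls_made = 0  # number of requests that actually reached a model

    def complete(self, task_type: str, prompt: str) -> str:
        model = pick_model(task_type)
        key = cache_key(model, prompt)
        if key in self._cache:
            return self._cache[key]  # repeat request: no API call, no cost
        self.calls_made += 1
        response = self._call_model(model, prompt)
        self._cache[key] = response
        return response
```

The cache here only catches prompts that match after whitespace normalization. Matching genuinely near-identical prompts would need something like embedding similarity, which brings its own risk of returning a wrong cached answer, so it is left out of the sketch. A real deployment would also want cache expiry and a shared store rather than a per-process dictionary.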

The 70% reduction I've seen on real projects came almost entirely from routing and caching, not from degrading the model used for the hard 20% of requests that actually need it. Quality held because the expensive model kept doing the job only it could do.
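The arithmetic behind a number like that is easy to check for your own traffic. The sketch below computes the cost after routing as a fraction of the all-frontier baseline. The 20% hard share comes from the paragraph above; the price ratio of 0.10 is purely an illustrative assumption, not a figure from any provider or from these projects, and the function assumes easy and hard requests use similar token counts.

```python
def blended_cost_ratio(hard_share: float, small_price_ratio: float) -> float:
    """Cost after routing, as a fraction of sending every request to the frontier model.

    hard_share: fraction of requests that stay on the frontier model.
    small_price_ratio: small model's per-request price relative to the frontier model.
    """
    if not 0.0 <= hard_share <= 1.0:
        raise ValueError("hard_share must be between 0 and 1")
    return hard_share + (1.0 - hard_share) * small_price_ratio


# Hypothetical inputs: 20% of traffic stays on the frontier model,
# and the small model costs one tenth as much per request.
ratio = blended_cost_ratio(hard_share=0.20, small_price_ratio=0.10)
savings = 1.0 - ratio
print(f"Savings from routing alone: {savings:.0%}")  # prints "Savings from routing alone: 72%"
```

Under these assumed inputs, routing accounts for most of the reduction before caching removes any calls at all. With a different traffic mix or price gap the result moves accordingly, so treat the output as a way to reason about your own bill, not as a forecast.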

Most LLM cost problems are a routing problem wearing a pricing problem's clothes.