Backend · 5 min read

Rate Limiting APIs Without Punishing Your Best Customers

A flat rate limit treats your biggest customer the same as a scraper. The limiting strategies that actually work are more deliberate than that.

Tanjil Ahmed

Lead Software Engineer · Notionhive

A single global rate limit is the easiest thing to ship and the fastest way to have your highest-value customer open a support ticket about being throttled during their busiest hour. Rate limiting that protects the system without punishing legitimate heavy use has to account for who is calling and what each call costs.

  • Tier limits by customer plan or authenticated identity, not by a single blanket number for every consumer.
  • Token bucket algorithms allow legitimate bursts (a customer importing data) while still capping sustained abuse.
  • Return a `Retry-After` header on every throttled response (typically HTTP 429) and document the limits; a client that knows the rule can back off and retry gracefully.
  • Rate limit by endpoint cost, not just by request count — a heavy report-generation endpoint and a simple GET aren't equivalent.
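The four points above can be combined in one small limiter. What follows is a minimal in-memory sketch, not a production implementation: the plan names, capacities, refill rates, endpoint costs, and the `RateLimiter` and `TokenBucket` classes are all hypothetical, chosen only for illustration. It assumes a single process; a real deployment would more likely keep the buckets in a shared store. `X-RateLimit-Remaining` is a common but non-standard header convention.

```python
import math
import time
from dataclasses import dataclass

# Hypothetical plan tiers: (bucket capacity in tokens, refill rate in tokens/second).
PLAN_LIMITS = {
    "free": (20, 1.0),
    "pro": (200, 10.0),
    "enterprise": (2000, 100.0),
}

# Hypothetical per-endpoint costs in tokens; unlisted endpoints cost 1.
ENDPOINT_COST = {
    "GET /items": 1,
    "POST /reports": 10,
}


@dataclass
class TokenBucket:
    capacity: float
    refill_rate: float
    tokens: float
    updated_at: float

    def try_consume(self, cost, now):
        """Return (allowed, retry_after_seconds) for a request costing `cost` tokens."""
        if cost > self.capacity:
            raise ValueError("cost exceeds bucket capacity; request could never succeed")
        # Refill according to elapsed time, never beyond capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now
        if cost <= self.tokens:
            self.tokens -= cost
            return True, 0
        # Whole seconds until enough tokens will have accumulated.
        missing = cost - self.tokens
        return False, math.ceil(missing / self.refill_rate)


class RateLimiter:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._buckets = {}  # customer_id -> TokenBucket

    def check(self, customer_id, plan, endpoint):
        """Return (http_status, headers) for one incoming request."""
        capacity, refill_rate = PLAN_LIMITS[plan]
        cost = ENDPOINT_COST.get(endpoint, 1)
        now = self._clock()
        bucket = self._buckets.get(customer_id)
        if bucket is None:
            # New customers start with a full bucket so an initial burst is allowed.
            bucket = TokenBucket(capacity, refill_rate, float(capacity), now)
            self._buckets[customer_id] = bucket
        allowed, retry_after = bucket.try_consume(cost, now)
        if allowed:
            return 200, {"X-RateLimit-Remaining": str(int(bucket.tokens))}
        return 429, {"Retry-After": str(retry_after)}
```

With these made-up numbers, a pro customer can burst through 20 report requests back to back, while a free customer is asked to wait 10 seconds after two. The clock is injectable so the refill behavior can be exercised without sleeping.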

The goal of rate limiting isn't to slow everyone down equally — it's to protect the system from abuse while staying invisible to the customers actually using the API as intended. Those are different problems, and treating them as the same one is how good customers get throttled.

A rate limit that treats your biggest customer like a scraper isn't protecting your API. It's taxing your best users.