Rate Limiting

What is Rate Limiting?

Rate limiting is a technique for controlling the amount of traffic moving to or from a network, or the amount of data being queried. For example, if an API becomes very popular, the spike in traffic can significantly slow response times. Rate limiting is often used as a security tool to mitigate denial-of-service (DoS) and distributed denial-of-service (DDoS) attacks, which intentionally overburden a server with requests. It is also important for scaling an API: just as containers and microservices must scale, APIs must be able to handle a growing and evolving user base.

There are different kinds of rate limits for APIs:

  • Backend rate limiting, which controls the transfer of data from server to server, depending on which server handles which data. Sometimes called “server rate limiting,” this type is measured in transactions per second (TPS).
  • Application rate limiting, which typically associates requests with an API key or IP address and throttles the traffic if it exceeds the limit within a given time window. Sometimes called “user rate limiting.”
  • Geographic rate limiting, which improves security for specific regions by lowering the rate limit at times of day when traffic from that region is expected to be very low.
  • Global rate limiting, which controls actions across an entire application instead of a single API.
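
As an illustration of application rate limiting, here is a minimal sketch of a fixed-window counter keyed by API key. The class name `FixedWindowLimiter`, its parameters, and the key strings are hypothetical, chosen only for this example; production gateways usually rely on shared storage and often on other algorithms, such as token bucket or sliding window.

```python
class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each fixed time window.

    Hypothetical example class, not taken from any specific product.
    """

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # Maps key -> (window index, requests counted in that window)
        self._counters: dict[str, tuple[int, int]] = {}

    def allow(self, key: str, now: float) -> bool:
        """Return True if the request is within the limit, False if throttled."""
        window = int(now // self.window_seconds)
        seen_window, count = self._counters.get(key, (window, 0))
        if seen_window != window:
            # A new window has started, so the counter resets.
            seen_window, count = window, 0
        if count >= self.limit:
            self._counters[key] = (seen_window, count)
            return False
        self._counters[key] = (seen_window, count + 1)
        return True


# Usage: two requests per 60-second window for each API key.
# Timestamps are passed in explicitly to keep the example deterministic.
limiter = FixedWindowLimiter(limit=2, window_seconds=60)
print(limiter.allow("key-a", now=0.0))   # True
print(limiter.allow("key-a", now=1.0))   # True
print(limiter.allow("key-a", now=2.0))   # False (limit reached)
print(limiter.allow("key-b", now=2.0))   # True (separate key)
print(limiter.allow("key-a", now=61.0))  # True (new window)
```

A fixed window is the simplest option to reason about, but it permits short bursts around window boundaries, which is one reason other algorithms are often preferred in practice.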

Impact on Today

Rate limiting can prioritize the user interactions that matter most to the business. For example, if an organization has several APIs that handle critical actions, such as a payment service, rate limiting can treat requests to those APIs as high priority. Other requests are deprioritized so that if something goes wrong, the lower-priority requests are dropped first, a practice called “load shedding.”
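
The prioritization described above can be sketched as a simple load-shedding function. This is an illustrative sketch only: the `Request` type, the `shed_load` function, the convention that a lower number means higher priority, and the request names are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    priority: int  # Assumption: lower number means more important


def shed_load(requests: list[Request], capacity: int) -> tuple[list[Request], list[Request]]:
    """Keep the `capacity` most important requests and drop the rest."""
    # sorted() is stable, so requests with equal priority keep arrival order.
    ranked = sorted(requests, key=lambda r: r.priority)
    return ranked[:capacity], ranked[capacity:]


# Usage: capacity for only two requests while four arrive.
incoming = [
    Request("search", 2),
    Request("payment", 0),
    Request("recommendations", 3),
    Request("login", 1),
]
kept, dropped = shed_load(incoming, capacity=2)
print([r.name for r in kept])     # ['payment', 'login']
print([r.name for r in dropped])  # ['search', 'recommendations']
```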

Rate limiting also helps ensure responsiveness: if someone makes a request at noon and then again at dinnertime, the response times should be about the same. This consistency improves the user experience and provides a better quality of service for anyone using the API at any time of day. When an API is scaled to handle a larger number of users, rate limiting adds resilience so that the API does not collapse under load.

Additionally, rate limiting is important for availability and resiliency, because an application must remain available even if some of its components fail. When an application and its APIs are under heavy traffic, rate limits can reduce the load that might otherwise bring the application offline. Even if responses are slower, the API is still available to use.
