How to Implement Effective Rate Limiting in Application Design

Jake Beck
May 17, 2024 | 12 min read
Modern web applications rely on Application Programming Interfaces (APIs) to handle client requests and transfer data between entities. API rate limiting protects web resources and services by capping how often transactions can occur within a given period.

In this article, you’ll learn:

  • What rate limiting is,
  • The benefits it offers in application design,
  • Considerations in designing an efficient rate limiting service, and
  • Various algorithms and options to apply rate limiting in modern applications.

What is Rate Limiting in Application Design?

Users interact with application resources by sending requests. An endpoint that allows unlimited requests is open to exploitation and malicious bot attacks. Unrestricted traffic can also let a single user overload the API resource with requests, preventing other legitimate entities from accessing it.

Rate limiting in application design defines access rules and policies that limit the number of requests an entity, such as a device, an IP address, or an individual user, can make within a given amount of time. This supports application security, stability, and sustainable scalability. A web server’s owner or administrator typically enforces the rate limit by specifying the number of service requests allowed or the amount of data exchanged per request. Once the limit is reached, the system blocks the user from making further requests for the rest of that period.

How Rate Limiting Protects and Optimizes Your Application

While rate limiting is a conventional strategy to limit network traffic, it is vital for improving large-scale web systems' security, performance, and quality. Some reasons to implement rate limiting for modern web applications include:

  • The prevention of resource starvation: All web applications run on finite resources. Rate limiting caps the number of API calls an individual client can make, which protects those resources from being exhausted and helps keep your service available.
  • Security: An attacker can carry out common attacks, such as Distributed Denial of Service (DDoS) or brute-force attacks, by submitting a large number of service requests to the server. These requests consume memory, CPU, and storage capacity, making the application unavailable to legitimate users. Rate limits prevent attackers, and even well-behaved bots, from abusing web servers by limiting how often they can repeat actions, thus protecting legitimate users.
  • Data flow control: In a scalable web system, APIs process and transmit enormous data volumes. Rate limits can control data flow by merging multiple data streams into a single, manageable service. Administrators can also implement rate limits to distribute requests evenly between services, preventing the exhaustion of one server while others remain idle. By spreading transactions evenly across servers, rate limits help optimize data processing.
  • Cost optimization: Web systems rely on resources that generate costs when accessed. API rate limiting helps maximize cost efficiency by preventing budget spikes arising from misconfigured resources and experimental deployments that result in surprise bills.
  • Policy management: Administrators can use rate limits to provide fair and reasonable resource usage for services shared among many clients. Web owners and administrators define rate and allocation limits as quotas, which can be applied to monetization packages or role-based access controls.

Key Features of a Robust Rate Limiting System for Web Applications

Designing a rate limiter requires defining various system features, which fall into two categories: functional and non-functional requirements. Both are explained below.

Functional Requirements: These requirements define how the rate limiter should work. The two main functional requirements for the rate limiter are:

  • Request rates - This defines the threshold for the number of requests allowed per client. A sample rate limit could be 100 requests per second.
  • Error message - The alert that the client gets once they have reached the maximum number of API calls allowed.
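
As a minimal sketch of the error-message requirement, the Python function below builds the response a client might receive after hitting its limit. The function name, the JSON field names, and the choice to include a `Retry-After` header are illustrative assumptions, not part of any specific product.

```python
import json
import math


def too_many_requests(limit: int, window_seconds: int, retry_after: float) -> tuple[int, dict[str, str], str]:
    """Build a (status, headers, body) triple for a client that hit its limit."""
    # Retry-After carries whole seconds, so round up and wait at least 1 second.
    wait = max(1, math.ceil(retry_after))
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(wait),
    }
    # Hypothetical body shape; adapt the field names to your own API conventions.
    body = json.dumps({
        "error": "rate_limit_exceeded",
        "message": f"Limit of {limit} requests per {window_seconds}s reached; retry in {wait}s.",
    })
    return 429, headers, body
```

Telling the client how long to wait lets well-behaved callers back off instead of retrying immediately.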

Non-functional Requirements: These features define how the system should behave. They include:

  • System availability - An effective rate limiter should ensure the service is highly available by responding to all legitimate client requests.
  • System performance - The rate limiting service should not introduce latencies or unnecessary errors when clients request resource access.

Key Factors to Consider When Designing a Rate Limiter for Applications

Rate limiters help improve service quality for load-based applications by eliminating resource starvation and enforcing flow control. By preventing the application from being flooded with numerous service requests, a rate limiting system ensures that the API and application server perform optimally. This section discusses various aspects considered when designing an application’s rate limiter.

Choosing the Right Rate Limit Strategy

Multiple parameters can be used to implement rate limits. These parameters form the basis of rate limiting strategies. Some common types of rate limiting strategies include:

  • User-based rate limiting: Administrators impose a threshold on the number of transactions a user can initiate within a specific period.
  • Concurrency: This strategy limits the number of transactions that can be carried out simultaneously. It is typically defined at the tenant or service level to prevent service locking caused by multiple transactions accessing the same resource at once.
  • Server-level rate limits: Server-level rate limiting supports effective load balancing between shared backend servers by limiting the number of requests from particular IP addresses to a server. When the number of service requests from a user exceeds the maximum for a server, the requests are forwarded automatically to the next server. This helps reduce the impact of DoS attacks on a service.
  • Location/ID: Location-based rate limiting makes effective use of geographically distributed systems by ensuring distant servers also handle requests. While using the server closest to the requesting user reduces network latency, it often leads to uneven server usage. Implementing geographic, time-based rate limits spreads requests more evenly across distributed systems.
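
To make the concurrency strategy concrete, here is a small Python sketch that caps the number of in-flight requests per key. The class name `ConcurrencyLimiter` and the use of a tenant name as the key are assumptions made for illustration. An in-process counter like this only covers a single server, so a multi-server deployment would need shared state.

```python
import threading
from collections import defaultdict


class ConcurrencyLimiter:
    """Caps in-flight requests per key (for example a tenant or service name)."""

    def __init__(self, max_in_flight: int) -> None:
        self.max_in_flight = max_in_flight
        self._in_flight: dict[str, int] = defaultdict(int)
        self._lock = threading.Lock()  # protects the counters across threads

    def try_acquire(self, key: str) -> bool:
        """Reserve a slot for `key`; return False if the key is at its cap."""
        with self._lock:
            if self._in_flight[key] >= self.max_in_flight:
                return False
            self._in_flight[key] += 1
            return True

    def release(self, key: str) -> None:
        """Free a slot once the request has finished."""
        with self._lock:
            if self._in_flight[key] > 0:
                self._in_flight[key] -= 1
```

A request handler would call `try_acquire` before starting work and `release` in a `finally` block, so a slot is returned even when the request fails.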

Rate Limiting Algorithms

Rate limiting algorithms check the user session cache to determine whether a particular client, such as an IP address, should be restricted from making further requests. If the client has reached the maximum number of transactions for a particular time frame, the server can respond with the HTTP 429 (Too Many Requests) status code. Common rate limiting algorithms include:

  • Token bucket: Each user is assigned a bucket of tokens, and administrators define the bucket’s capacity and the rate at which tokens are replenished, which together set how many requests a user can make within a certain duration. Each request takes one token from the bucket. Once the tokens are exhausted, further requests are rejected until the bucket refills.
  • Leaky bucket: User requests are placed in a First In First Out (FIFO) queue, with the queue acting as a bucket that holds a fixed number of requests. New requests join the end of the queue, and queued requests are processed at a steady rate. If the queue is full, newer requests are rejected.
  • Fixed window counter: This algorithm uses fixed time periods (windows) to track incoming requests. Web administrators set the maximum number of requests for each window. Each incoming service request increments the counter for the current window. Once the counter reaches the set limit, subsequent requests in that window are rejected.
  • Sliding log: Each consumer request is recorded as a timestamped log entry. These entries are stored in a hash table or set, sorted by time. After each new request, the request rate is calculated by counting the entries that fall within the current time window, and older entries are discarded. If this rate exceeds the set threshold, the incoming request is rejected.
  • Sliding window: This strategy combines the fixed window and sliding log algorithms. The algorithm tracks a counter for each fixed window and adds a weighted share of the previous window’s count to the current window’s count, with the weight based on how far the current request’s timestamp falls into the current window. This minimizes the number of data points tracked per entity, making it suitable for large-scale deployments.
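
Two of the algorithms above can be sketched in a few lines of Python. This is an illustrative, single-process sketch under stated assumptions: the class names, the injectable `clock` parameter (added so the behavior can be tested without waiting), and the refill-on-demand approach for the token bucket are choices made here, not a reference implementation.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at `rate` per second."""

    def __init__(self, capacity: int, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend one token on this request
            return True
        return False


class SlidingWindowCounter:
    """Sliding window: current count plus a weighted share of the previous window."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.index = int(clock() // window)  # which fixed window we are in
        self.current = 0
        self.previous = 0

    def allow(self) -> bool:
        now = self.clock()
        index = int(now // self.window)
        if index != self.index:
            # Roll over; the old count only matters if it is the adjacent window.
            self.previous = self.current if index == self.index + 1 else 0
            self.current = 0
            self.index = index
        elapsed = (now % self.window) / self.window  # fraction of current window used
        estimate = self.current + self.previous * (1 - elapsed)
        if estimate < self.limit:
            self.current += 1
            return True
        return False
```

In practice one limiter instance would be kept per client key (user, IP address, or API key). The sliding window estimate needs only two counters per key, which is the memory saving over a full sliding log.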

Other Rate Limiting Options

  • Throttling: Throttling allows administrators to control the type and amount of data clients can access through the API.
  • Data sharding and caching: Requests from recently active sessions are cached at the database server level. The application then checks cache storage for a counter value before fetching the data from the service. If the counter shows that the user has reached their maximum number of requests, the rate limiter can either reject the request or read the data without the backend update (reading from the shard partition).
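
The cache-first check described above can be sketched as follows. A plain Python dict stands in for the cache, and `handle_request`, `counter_cache`, and the `fetch` callback are hypothetical names used only for illustration. The sketch shows the "reject" branch and uses a fixed window counter per user.

```python
import time
from typing import Callable, Optional

# Plain dict standing in for a shared cache; a real deployment would use an external store.
counter_cache: dict[tuple[str, int], int] = {}


def handle_request(
    user_id: str,
    fetch: Callable[[], str],
    limit: int = 3,
    window: int = 60,
    now: Optional[float] = None,
) -> tuple[int, str]:
    """Check the cached counter first and call the backend only if under the limit."""
    now = time.time() if now is None else now
    key = (user_id, int(now // window))  # one counter per user per fixed window
    count = counter_cache.get(key, 0)
    if count >= limit:
        return 429, "Too Many Requests"  # rejected without touching the backend
    counter_cache[key] = count + 1
    return 200, fetch()
```

This sketch never evicts old counters and is not safe across processes. A shared cache with key expiry and atomic increments would be needed for production use.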

API Rate Limiting with Edge Stack API Gateway

Apart from the conventional approaches to applying rate limiting, Edge Stack API Gateway ships with an inbuilt Rate Limit Service (RLS) that enforces a decentralized configuration model for the independent management of rate limits by multiple teams. The advanced rate limit service also allows global rate limits that can be enforced for every request going through the Edge Stack.

Edge Stack API Gateway minimizes the manual overhead of implementing an organization’s rate limiting service from scratch. Its rate limiting is composed of two parts:

  • Request Labels: Basic metadata that the service uses to determine which limits apply to a request.
  • RateLimitService: A gRPC service that instructs Edge Stack on the services to be utilized for rate limiting.

With request labels, administrators have enhanced control over traffic shedding as they can prioritize specific request types over others. The service allows groups of labels to be assigned to domains for separate namespaces. This allows for the independent management and control of the rate limiting service by assigning individual domains to respective teams.

Edge Stack API Gateway also natively supports integration with popular service meshes for service discovery and edge routing.
