Part 3: Implementing a Java Rate Limiting Service for Edge Stack API Gateway

May 17, 2018 | 14 min read

The rate limiting functionality offered by the Kubernetes-native Edge Stack API Gateway is fully customizable, allowing any service that implements a gRPC endpoint to decide whether a request should be limited or not. In this article, which builds on the previous articles in this series, you will learn how to build and deploy a simple Java-based rate limiting service for Edge Stack and explore how rate limiting works.

Getting Set Up: The Docker Java Shop

In my previous tutorial, “Deploying Java Apps with Kubernetes and the Edge Stack API Gateway,” I added the open source Edge Stack API gateway to an existing series of Java (Dropwizard and Spring Boot) based services that were deployed into Kubernetes. If you haven’t seen this, I would recommend going through this tutorial and the others in the series to familiarize yourself with the fundamentals. The rest of this article assumes you’re comfortable building Java-based microservices and deploying them to Kubernetes, and you also have all of the prerequisites installed (I’m using Docker for Mac Edge, with built-in Kubernetes support, but the principles should be similar if you are using minikube or a remote cluster).


You will need to have these installed locally:

  • Docker for Desktop: I am using the edge community edition (18.04.0-ce), which has built-in support for a local Kubernetes cluster. I have also increased the memory available to Docker to 8 GB, as the Java services can be a little memory-hungry at times
  • An editor of your choice, such as Atom or VS Code, or IntelliJ for the Java code

You can grab the latest version of the “Docker Java Shop” source code here:

You can clone the repo via SSH like so:

$ git clone

The initial version of the service architecture and deployment looked as follows:

You can see from the diagram that the Docker Java Shopping application consists primarily of three simple services, and in the previous tutorial, you added the Edge Stack API Gateway as the “front door” of the system. It is worth noting that the Edge Stack API Gateway will be running on port 80, the standard unencrypted HTTP port, so you will need to make sure that nothing else is running locally on the same port.

Rate Limiting 101 with the Edge Stack API Gateway

I have added a new folder, “kubernetes-ambassador-ratelimit”, to the repo containing the Kubernetes config for this tutorial, so go ahead and navigate to this directory via the command line. Listing that directory will show the following files:

(master *) oreilly-docker-java-shopping $ cd kubernetes-ambassador-ratelimit/
(master *) kubernetes-ambassador-ratelimit $ ll
total 48
0 drwxr-xr-x 8 danielbryant staff 256 23 Apr 09:27 .
0 drwxr-xr-x 19 danielbryant staff 608 23 Apr 09:27 ..
8 -rw-r--r-- 1 danielbryant staff 2033 23 Apr 09:27 ambassador-no-rbac.yaml
8 -rw-r--r-- 1 danielbryant staff 698 23 Apr 10:30 ambassador-rate-limiter.yaml
8 -rw-r--r-- 1 danielbryant staff 476 23 Apr 10:30 ambassador-service.yaml
8 -rw-r--r-- 1 danielbryant staff 711 23 Apr 09:27 productcatalogue-service.yaml
8 -rw-r--r-- 1 danielbryant staff 659 23 Apr 10:02 shopfront-service.yaml
8 -rw-r--r-- 1 danielbryant staff 678 23 Apr 09:27 stockmanager-service.yaml

You can apply these Kubernetes config files with this command:

$ kubectl apply -f .

Doing so will deploy the following service architecture, with the primary difference from the previous architecture being the addition of the “ratelimiter” service. This service is written in Java, without a web/microservices framework, and it exposes a gRPC endpoint that Ambassador can use for rate limiting. This allows for customization and flexibility regarding the rate limiting algorithm you can implement (for more details on the benefits of this, check out my earlier article).

Exploring the Rate Limiter Kubernetes Service

The ratelimiter service is deployed into Kubernetes just like any other service, and could be horizontally scaled as appropriate. Here are the contents of the ambassador-rate-limiter.yaml Kubernetes config file:

---
apiVersion: v1
kind: Service
metadata:
  name: ratelimiter
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v0
      kind: RateLimitService
      name: ratelimiter_svc
      service: "ratelimiter:50051"
  labels:
    app: ratelimiter
spec:
  type: ClusterIP
  selector:
    app: ratelimiter
  ports:
  - protocol: TCP
    port: 50051
    name: http
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: ratelimiter
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: ratelimiter
    spec:
      containers:
      - name: ratelimiter
        image: danielbryantuk/ratelimiter:0.3
        ports:
        - containerPort: 50051

You will explore the contents of the underlying “danielbryantuk/ratelimiter:0.3” Docker image later in the article, but for now all you need to know is that this service is running within the cluster, and exposes port 50051.

In the ambassador-service.yaml config file, I have also updated the Edge Stack Kubernetes annotations config to ensure that requests to the shopfront service are rate limited simply by including the “rate_limits” property. I have also added some additional metadata “- descriptor: Example descriptor”, which I will explain in more detail in the next article. For now, I’ll say that this is a good way to pass additional metadata into the rate limiting service.

---
apiVersion: v1
kind: Service
metadata:
  labels:
    service: ambassador
  name: ambassador
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v0
      kind: Mapping
      name: shopfront_stable
      prefix: /shopfront/
      service: shopfront:8010
      rate_limits:
        - descriptor: Example descriptor

Check that the deployment has succeeded using kubectl:

(master *) kubernetes-ambassador-ratelimit $ kubectl get svc
ambassador LoadBalancer localhost 80:30051/TCP 1d
ambassador-admin NodePort <none> 8877:30637/TCP 1d
kubernetes ClusterIP <none> 443/TCP 16d
productcatalogue ClusterIP <none> 8020/TCP 1d
ratelimiter ClusterIP <none> 50051/TCP 1d
shopfront ClusterIP <none> 8010/TCP 1d
stockmanager ClusterIP <none> 8030/TCP 1d

All six of our services look good to go (plus the Kubernetes service) — that’s three Java services, two Ambassador services, and the rate limiter service.

You can test the deployment by making a curl request to the shopfront endpoint, which (as shown above) should be running on the EXTERNAL-IP of localhost on port 80:

(master *) kubernetes-ambassador-ratelimit $ curl localhost/shopfront/
<!DOCTYPE html>
<html lang="en" xmlns="">
<meta charset="utf-8" />
<!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
<script src=""></script>
<!-- Include all compiled plugins (below), or include individual files as needed -->
<script src="js/bootstrap.min.js"></script>
</html>(master *) kubernetes-ambassador-ratelimit $

You will notice that this produces a lot of HTML, which is simply the frontpage of the Docker Java Shop, and can be more easily viewed within a browser pointed at http://localhost/shopfront/. However, it will be easier to use curl for our rate limiting experiments.

Testing the Rate Limiting

For this demonstration rate limiting service, I have decided to rate limit simply against the service itself (i.e. when the rate limit service calculates whether or not to limit a request, the only metric I consider is the number of requests made against a specific backend service within a time period). The rate limiting code uses the token-bucket algorithm with a maximum bucket size of 20 and a refill rate of 10 tokens per second.

Because the rate limiting is currently associated with any request, you can make 10 requests against the API per second without any issues. You can also burst above this temporarily because the bucket initially contains 20 tokens. However, as soon as the initial “burst” tokens have been used and you attempt to make more than 10 requests per second, then you will receive an HTTP 429 “Too Many Requests” status code. At this point the Edge Stack API gateway is not forwarding the requests to the backend service.
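To make these numbers concrete, here is a minimal sketch of a token bucket in plain Java. It is not the bucket4j implementation that the service uses; the class name TokenBucketSketch, the allowedInBurst helper, and the explicit clock parameter are all assumptions I have introduced so that the behaviour can be reproduced deterministically without sleeping.

```java
/** Illustrative token bucket driven by an explicit clock (in nanoseconds). Not the bucket4j implementation. */
class TokenBucketSketch {
    private final long capacity;
    private final long refillPerSecond;
    private double tokens;
    private long lastRefillNanos;

    TokenBucketSketch(long capacity, long refillPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity; // the bucket starts full, which is what permits the initial burst
        this.lastRefillNanos = nowNanos;
    }

    /** Refills according to the time elapsed since the last call, then tries to take one token. */
    synchronized boolean tryConsume(long nowNanos) {
        double elapsedSeconds = (nowNanos - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillNanos = nowNanos;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;  // request allowed
        }
        return false;     // request should be rate limited
    }

    /** Counts how many of the given number of requests, all arriving at the same instant, are allowed. */
    static int allowedInBurst(TokenBucketSketch bucket, int requests, long nowNanos) {
        int allowed = 0;
        for (int i = 0; i < requests; i++) {
            if (bucket.tryConsume(nowNanos)) {
                allowed++;
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        long start = 0L;
        TokenBucketSketch bucket = new TokenBucketSketch(20, 10, start);
        // 30 requests at once: only the 20 initial tokens are available.
        System.out.println(allowedInBurst(bucket, 30, start));
        // One second later, 10 tokens have been refilled.
        System.out.println(allowedInBurst(bucket, 30, start + 1_000_000_000L));
    }
}
```

Running main prints 20 and then 10: the full bucket absorbs the initial burst, after which throughput settles at the refill rate of 10 requests per second.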

Let’s see if you can simulate this by issuing many requests via curl. You’ll want to suppress the HTML payload being displayed (--output /dev/null) and also the curl progress output (--silent), but you still want to see the non-OK HTTP response status codes (--show-error --fail). You can put all of these curl options together with a simple bash loop and date output (to show what time you are making requests) to create a very crude load generator (and get ready to CTRL-C to terminate the loop!):

$ while true; do curl --silent --output /dev/null --show-error --fail http://localhost/shopfront/; echo -e $(date);done
(master *) kubernetes-ambassador-ratelimit $ while true; do curl --silent --output /dev/null --show-error --fail http://localhost/shopfront/; echo -e $(date);done
Tue 24 Apr 2018 14:16:31 BST
Tue 24 Apr 2018 14:16:31 BST
Tue 24 Apr 2018 14:16:31 BST
Tue 24 Apr 2018 14:16:31 BST
Tue 24 Apr 2018 14:16:35 BST
curl: (22) The requested URL returned error: 429 Too Many Requests
Tue 24 Apr 2018 14:16:35 BST
curl: (22) The requested URL returned error: 429 Too Many Requests
Tue 24 Apr 2018 14:16:35 BST
Tue 24 Apr 2018 14:16:35 BST
curl: (22) The requested URL returned error: 429 Too Many Requests
Tue 24 Apr 2018 14:16:35 BST
curl: (22) The requested URL returned error: 429 Too Many Requests
Tue 24 Apr 2018 14:16:35 BST

As you can see, the first several requests are served fine: each date is printed with no accompanying error. Quickly (at least on my Mac), the loop exceeds 10 requests per second, and I start receiving HTTP 429 response codes.

As an aside, I would normally use the Apache Benchmarking “ab” load generating tool for this simple experiment, but ab appeared to have an issue with calling localhost (or my Docker config was causing some problems).

Examine the Rate Limiting Service

The code for the Ambassador Java rate limiting service can be found in the repo ambassador-java-rate-limiter on my GitHub account. In this repo you will find the code and the Dockerfile I have used to build the container image that I pushed to DockerHub. Using this Dockerfile as a template, you can modify the code and then build and push your own image to DockerHub. You can then modify the ambassador-rate-limiter.yaml file in the main Docker Java Shopping repo to use your service for rate limiting.

Exploring the Java Code

If you now dive into the actual Java code, the main class of interest is RateLimitServer, which implements the rate limiting gRPC interface defined by the Envoy proxy that is used within the Ambassador API gateway. I’ve created a local copy of the ratelimit.proto interface that is used by the gRPC Java build tooling defined in the Maven pom.xml. There are three primary points of interest in the code: implementing the gRPC interface, running the gRPC server, and implementing the actual rate limiting code. Let’s now look at these in turn.

Implementing the Rate Limiting gRPC Interface

If you look into the inner class within RateLimitServer, named “RateLimiterImpl”, which extends RateLimitServiceGrpc.RateLimitServiceImplBase, you can see that I have overridden a method from this abstract class:

public void shouldRateLimit(Ratelimit.RateLimitRequest rateLimitRequest, StreamObserver<Ratelimit.RateLimitResponse> responseStreamObserver)

A lot of the naming conventions used here come from the Java gRPC libraries, and for more information, you can consult the gRPC Java documentation. Having said this, you can clearly see the root of many of the names if you look into the ratelimit.proto file, which defines the rate limiting interface expected by the Envoy proxy used behind the scenes of Ambassador. For example, you can see that the core service defined in this file is named RateLimitService (line 9), and there is a single RPC method defined within the service, “rpc ShouldRateLimit (RateLimitRequest) returns (RateLimitResponse) {}” (line 11), which is implemented in Java through the method signature shown above for “shouldRateLimit”.

If you are interested, a lot of the Java gRPC code generation magic is conducted by the “protobuf-maven-plugin” (line 99 of the pom.xml).

Running the gRPC server

Once you have implemented the gRPC interface defined with ratelimit.proto, the next thing to do is to create a gRPC server that can listen and reply to requests made to it. If you look into the content of the RateLimitServer, you can follow the chain of processing from the main method. In a nutshell, the main method creates an instance of the RateLimitServer class, calls the start() method, and then calls the blockUntilShutdown() method. This starts an instance of the class, exposes the gRPC interface on the defined port, and listens for requests.

Implementing Java Rate Limiting Code

The actual Java code responsible for the rate limiting process is contained within the shouldRateLimit() (line 75) method of the RateLimiterImpl inner class. Rather than implementing my own rate limiting algorithm, I’m using the popular bucket4j Java rate limiting library that is based on the token-bucket algorithm. As I limit the number of requests made to each service, each bucket will be identified (or keyed) with the service name. Every request to each service will remove a token from the associated bucket. In this example, I am not storing the buckets in an external database and instead have opted to use an in-memory ConcurrentHashMap.
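The per-service keying described above can be sketched with nothing more than the JDK. This is an illustration of the idea rather than the code in the repo: SimpleBucket is a deliberately naive stand-in for a bucket4j Bucket (a fixed number of tokens with no refill), and the names ServiceBucketsSketch, bucketFor, and allow are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative only: one in-memory bucket per destination service name. */
class ServiceBucketsSketch {

    /** Naive stand-in for a bucket4j Bucket: a fixed number of tokens, never refilled. */
    static final class SimpleBucket {
        private final AtomicLong tokens;

        SimpleBucket(long initialTokens) {
            this.tokens = new AtomicLong(initialTokens);
        }

        boolean tryConsume() {
            // Atomically take one token if any remain.
            while (true) {
                long current = tokens.get();
                if (current <= 0) {
                    return false;
                }
                if (tokens.compareAndSet(current, current - 1)) {
                    return true;
                }
            }
        }
    }

    private final ConcurrentMap<String, SimpleBucket> buckets = new ConcurrentHashMap<>();
    private final long tokensPerBucket;

    ServiceBucketsSketch(long tokensPerBucket) {
        this.tokensPerBucket = tokensPerBucket;
    }

    /** Returns the bucket for a service, creating it on first use; computeIfAbsent is atomic per key. */
    SimpleBucket bucketFor(String serviceName) {
        return buckets.computeIfAbsent(serviceName, name -> new SimpleBucket(tokensPerBucket));
    }

    boolean allow(String serviceName) {
        return bucketFor(serviceName).tryConsume();
    }

    public static void main(String[] args) {
        ServiceBucketsSketch limiter = new ServiceBucketsSketch(2);
        System.out.println(limiter.allow("shopfront"));        // true
        System.out.println(limiter.allow("shopfront"));        // true
        System.out.println(limiter.allow("shopfront"));        // false: shopfront's bucket is empty
        System.out.println(limiter.allow("productcatalogue")); // true: a separate bucket
    }
}
```

Using computeIfAbsent means that two concurrent first requests for the same service cannot create two different buckets, which is the main reason to prefer a ConcurrentHashMap over a plain HashMap here.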

If I were implementing this service for a production use case, I would typically use an external persistence store to enable horizontal scalability, probably something like Redis. However, for now, you will have to bear in mind that each instance of the rate limit service keeps its own in-memory buckets. If you horizontally scale the rate limit service without changing each service’s bucket limits, then you will be increasing the number of allowable (non-rate limited) requests in direct proportion to the number of rate limiter instances.
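A rough simulation shows the size of this effect. The sketch below assumes that requests are spread round-robin across the rate limiter replicas (the real distribution depends on how the cluster load balances connections) and ignores refill entirely; the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Illustrative only: how independent in-memory buckets multiply the effective limit. */
class ScaledLimiterSketch {

    /**
     * Simulates a burst of requests spread round-robin over replicas that each
     * hold their own bucket, and returns how many requests are allowed in total.
     */
    static int allowedAcrossReplicas(int replicas, int capacityPerReplica, int requests) {
        int[] tokens = new int[replicas];
        Arrays.fill(tokens, capacityPerReplica);
        int allowed = 0;
        for (int i = 0; i < requests; i++) {
            int replica = i % replicas; // assumed round-robin distribution
            if (tokens[replica] > 0) {
                tokens[replica]--;
                allowed++;
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        System.out.println(allowedAcrossReplicas(1, 20, 100)); // 20
        System.out.println(allowedAcrossReplicas(3, 20, 100)); // 60
    }
}
```

With one replica, a burst of 100 requests lets 20 through; with three replicas holding unshared buckets, 60 get through, which is why a shared store is the usual choice once the service is scaled out.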

An excerpt of the RateLimiterImpl code that creates the bucket4j bucket can be seen below:

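private Bucket createNewBucket() {
    // Maximum bucket size: allows a burst of up to 20 requests
    long overdraft = 20;
    // Refill smoothly at a rate of 10 tokens per second
    Refill refill = Refill.smooth(10, Duration.ofSeconds(1));
    Bandwidth limit = Bandwidth.classic(overdraft, refill);
    return Bucket4j.builder().addLimit(limit).build();
}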

The shouldRateLimit method code can be seen below. It simply calls tryConsume(1) to try to consume one token from the bucket, and then returns an appropriate response code.

public void shouldRateLimit(Ratelimit.RateLimitRequest rateLimitRequest, StreamObserver<Ratelimit.RateLimitResponse> responseStreamObserver) {
    // Each destination service has its own bucket, keyed by service name
    String destServiceName = extractDestServiceNameFrom(rateLimitRequest);
    Bucket bucket = getServiceBucketFor(destServiceName);
    Ratelimit.RateLimitResponse.Code code;
    if (bucket.tryConsume(1)) {
        code = Ratelimit.RateLimitResponse.Code.OK;
    } else {
        code = Ratelimit.RateLimitResponse.Code.OVER_LIMIT;
    }
    Ratelimit.RateLimitResponse rateLimitResponse = generateRateLimitResponse(code);
    // Send the single response and complete the unary gRPC call
    responseStreamObserver.onNext(rateLimitResponse);
    responseStreamObserver.onCompleted();
}

The code should be relatively easy to understand. The primary responsibility of this method is to return either Ratelimit.RateLimitResponse.Code.OK, if no rate limiting is required on the current request, or Ratelimit.RateLimitResponse.Code.OVER_LIMIT, if this request should be denied due to rate limiting. Depending on the response from this gRPC service, the Ambassador API gateway will either pass the request through to the backend service or short-circuit this trip and simply return a 429 “Too Many Requests” HTTP status code without calling the backend service.

This simple example protects against one service becoming overwhelmed, but hopefully, this also demonstrates the core rate limiting concepts and could be relatively easily adapted to rate limit based on request metadata, such as user ID or something similar.
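As a hint of what such an adaptation might look like, the sketch below widens the bucket key from the service name alone to the service name plus a user ID. It assumes the user ID has already been extracted from the request metadata, which this article does not cover; the names PerUserLimiterSketch and bucketKey are hypothetical, and the quota is a simple countdown with no refill rather than a real token bucket.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: rate limiting keyed by service name plus an optional user ID. */
class PerUserLimiterSketch {
    // Remaining quota per key; a value below zero means the quota is exhausted.
    private final Map<String, Integer> remaining = new ConcurrentHashMap<>();
    private final int quota;

    PerUserLimiterSketch(int quota) {
        this.quota = quota;
    }

    /** Falls back to the service name alone when no user ID is available. */
    static String bucketKey(String serviceName, String userId) {
        return (userId == null || userId.isEmpty()) ? serviceName : serviceName + "|" + userId;
    }

    boolean allow(String serviceName, String userId) {
        String key = bucketKey(serviceName, userId);
        // compute() runs atomically for a given key in ConcurrentHashMap.
        int after = remaining.compute(key, (k, v) -> (v == null ? quota : v) - 1);
        return after >= 0;
    }

    public static void main(String[] args) {
        PerUserLimiterSketch limiter = new PerUserLimiterSketch(2);
        System.out.println(limiter.allow("shopfront", "alice")); // true
        System.out.println(limiter.allow("shopfront", "alice")); // true
        System.out.println(limiter.allow("shopfront", "alice")); // false: alice has used her quota
        System.out.println(limiter.allow("shopfront", "bob"));   // true: bob has his own quota
    }
}
```

The only structural change from per-service limiting is the key, so the same idea should carry over to a map of bucket4j buckets.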

Until the Next Time…

This article has demonstrated how you can create a rate limiting service in Java that can easily be integrated into the Ambassador Labs API gateway and fully customized with any rate limiting logic you require. In the next and final article of the series you will explore the Envoy rate limiting API in more depth, to learn more about designing a rate limiting service.
