Securing Your API Gateway Against the Risk of a Single Point of Failure (SPOF)

Discover how Edge Stack API Gateway and these redundancy strategies will help you prevent the risk of a single point of failure (SPOF).

Cindy Mullins
March 8, 2024 | 13 min read

Today, many developers recognize the need for an API Gateway to manage traffic as a single entry point for clients to access their backend services. The benefits of using an API Gateway are clear: it provides a secure and consistent entry point for requests from external clients, authenticates incoming requests and authorizes access to services based on user credentials, enforces rate limiting, and improves scalability and efficiency, to name a few. Another advantage is that, by design, running the gateway on containers and an orchestration platform protects against traditional hardware failures, because your API code is no longer tied to a single physical or virtual machine.

But, as the gatekeeper between external client requests and your backend services, there’s also a potential risk posed by an API gateway as a single point of failure for your production system. If the Gateway is down, users lose access to the data and services you’re providing, which negatively impacts user satisfaction and business continuity and can even affect profitability for monetized services.

Fortunately, there are several safeguards you can put in place to prevent your API Gateway from becoming a single point of failure in your microservices architecture. The first step is to choose a proven, production-ready API Gateway like Edge Stack. Next, it’s important to build in redundancy, failover mechanisms, and monitoring to maintain high availability.

In this article, we’ll first look at some built-in characteristics of Edge Stack that provide availability at the API Gateway level. Then, we’ll examine some redundancy strategies that could also include your cloud provider or on-prem infrastructure to avoid potential SPOFs.

Why Edge Stack? Kubernetes-native, Built on Envoy

As a robust, production-ready API Gateway, Edge Stack is natively equipped to run in high availability mode, providing consistency for your application APIs and ensuring responsiveness. Its simplified approach to traffic management supports that resilience. It’s designed to handle millions of incoming HTTP/HTTPS requests and can be configured to route high volumes of HTTP/2, gRPC, and WebSocket traffic efficiently as well.

Edge Stack is built on Envoy and runs as a Kubernetes deployment, which means that if a node in the cluster fails, another will take over, continuing to route requests to the relevant backend APIs and consolidating the responses as needed. Combined with a scalable microservice architecture, a high-availability API gateway cluster ensures your application can handle large volumes of traffic and react to unexpected spikes while being resilient to hardware failures.

6 Key Advantages of Edge Stack to Prevent Single Points of Failure (SPOF) and Boost System Reliability

Kubernetes-native availability and efficiency

Edge Stack relies on Kubernetes for scaling, high availability, and persistence. An API gateway usually requires a data store to hold the configuration details and other persistent data. In the case of Edge Stack, all configuration is stored directly in Kubernetes; there is no database. This eliminates the need for things like replication and synchronization of the data store across new nodes and across multiple regions and provides, by design, efficiency at scale that other API Gateways do not.

Edge Stack is packaged as a single container that contains both the control plane and an Envoy Proxy instance. By default, Edge Stack API Gateway is deployed as a Kubernetes deployment and can be scaled and managed like any other Kubernetes deployment. Since Edge Stack’s scaling is based on Kubernetes, its scaling capabilities are available whether your applications are hosted on a cloud provider or on-prem.

Horizontal and Vertical Pod Scaling (HPA and VPA)

While timeouts and rate limiting assist with managing consistently high traffic volumes, you still need to ensure the availability of your applications in case of traffic spikes and pod failure. To provide that seamless consistency, Edge Stack relies on Kubernetes-native automatic horizontal and vertical pod scaling and allows for manual pod scaling as well. Which one to choose depends mainly on the needs of your application.

Horizontal scaling works at two levels. Node-level autoscaling increases or decreases the number of nodes running in a cluster, which is useful for handling changes in the overall cluster workload. Pod-level autoscaling modifies the number of pod replicas, which is ideal for application-specific load changes.
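
To make pod-level horizontal scaling concrete, here is a minimal sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: the desired replica count is the current count multiplied by the ratio of the observed metric to the target metric, rounded up. The function name, the replica bounds, and the sample numbers are assumptions for illustration; the real autoscaler also applies stabilization windows and readiness checks that are not modeled here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Replica count suggested by the HPA rule: ceil(current * observed / target)."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (hypothetical defaults here).
    return max(min_replicas, min(max_replicas, wanted))


# Three gateway pods averaging 90% CPU against a 60% target.
print(desired_replicas(3, 90, 60))  # 5
```

In this example the gateway would grow from three to five replicas, while a small deviation from the target (say 62% against 60%) would leave the count alone.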

Vertical pod autoscaling adjusts the CPU/memory resources allocated to the pods in order to right-size for the required capacity. It’s essential for applications with varying resource demands. It can be useful to optimize capacity by geographical region, for example, to increase capacity during business hours in high-volume locations.

As a general rule, horizontal scaling responds faster than vertical scaling: because vertical scaling alters the resource specification of the pod, the affected pods have to restart. For more on horizontal scaling and vertical pod scaling, check out our recent tech talk covering these topics.

Built-In High-Performance Rate Limiting

Rate limiting is a powerful technique to improve the availability and resilience of your services. In Edge Stack, each request can have one or more labels. These labels are exposed to a third-party service via a gRPC API. The third-party service can then rate limit requests based on the request labels.

In Edge Stack, each engineer (or team) can be assigned its own domain (a separate namespace for labels). By creating individual domains, each team can assign its own labels to a given request and independently set the rate limits based on its own labels. Edge Stack requires a gRPC RateLimitService as defined in Envoy's v3/rls.proto interface. While you can implement your own rate limit service, Edge Stack provides an integrated high-performance rate limiting service.
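
The idea of per-domain labels with independent limits can be sketched in a few lines. This is not Edge Stack's rate limit service or the Envoy gRPC interface; it is a simplified, in-memory fixed-window limiter, and the class name, the domain names, and the label values are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

# A request's labels, e.g. (("plan", "free"),). Hypothetical shape.
Labels = Tuple[Tuple[str, str], ...]
Key = Tuple[str, Labels]


class LabelRateLimiter:
    """Fixed-window limiter keyed by (domain, labels). Illustrative only."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._limits: Dict[Key, Tuple[int, float]] = {}
        self._windows: Dict[Key, Tuple[float, int]] = {}

    def set_limit(self, domain: str, labels: Labels,
                  max_requests: int, per_seconds: float) -> None:
        self._limits[(domain, labels)] = (max_requests, per_seconds)

    def allow(self, domain: str, labels: Labels) -> bool:
        key = (domain, labels)
        if key not in self._limits:
            return True  # no rule for these labels, so do not limit
        max_requests, per_seconds = self._limits[key]
        now = self._clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= per_seconds:
            start, count = now, 0  # the previous window has expired
        if count >= max_requests:
            self._windows[key] = (start, count)
            return False
        self._windows[key] = (start, count + 1)
        return True


limiter = LabelRateLimiter()
free_plan: Labels = (("plan", "free"),)
limiter.set_limit("team-a", free_plan, max_requests=2, per_seconds=60.0)
print([limiter.allow("team-a", free_plan) for _ in range(3)])
```

Because the key includes the domain, a second team can attach the same label to a request and set a different limit without affecting the first team's rule.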

Circuit Breakers and automatic retries

Circuit breakers are another useful way to improve resilience. By preventing additional connections or requests to an overloaded service, circuit breakers limit the "blast radius" of an overloaded service. By design, Edge Stack circuit breakers are distributed, i.e., different Edge Stack instances do not coordinate circuit breaker information.

Circuit breakers can be applied to an auth service, to an individual Mapping (a route), or globally through the Module. When your service is under heavy load and starting to time out, automatic retries can exacerbate the problem, increasing the total request volume by 2x or more. By circuit breaking aggressively, you can mitigate failure in this scenario.
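
The interaction between retries and circuit breaking can be illustrated with a small sketch. It assumes a breaker that caps in-flight requests and in-flight retries, loosely modeled on threshold-style circuit breaking; the class, its field names, and the numbers are hypothetical and do not reproduce Edge Stack's configuration schema.

```python
from dataclasses import dataclass


@dataclass
class CircuitBreaker:
    """Caps concurrent requests and concurrent retries to one upstream."""
    max_requests: int
    max_retries: int
    active_requests: int = 0
    active_retries: int = 0

    def try_acquire(self, is_retry: bool = False) -> bool:
        if self.active_requests >= self.max_requests:
            return False  # shed load instead of queueing on a struggling service
        if is_retry and self.active_retries >= self.max_retries:
            return False  # retry budget exhausted
        self.active_requests += 1
        if is_retry:
            self.active_retries += 1
        return True

    def release(self, is_retry: bool = False) -> None:
        self.active_requests -= 1
        if is_retry:
            self.active_retries -= 1


def worst_case_attempts(requests: int, num_retries: int) -> int:
    """Upstream attempts when every try fails and each request is retried."""
    return requests * (1 + num_retries)


# One retry per request doubles the load on an already failing service.
print(worst_case_attempts(1000, 1))  # 2000
```

Each instance keeps its own counters, which matches the distributed behavior described above: two gateway replicas with the same thresholds would each admit up to their own limit.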

Configurable Timeouts, Drain Time, and Fast Reconfiguration

Edge Stack routes traffic based on Mappings, which are flexible and highly configurable Kubernetes custom resources. With Mappings, you can employ a variety of timeout settings to cover network latency and connectivity issues, as well as specs like drain_time, which control how long old configuration persists while new configurations are spinning up. Edge Stack is also designed for fast reconfiguration by default. While these settings don’t directly mitigate the risk of a single point of failure, using them effectively helps optimize the availability of your services within your allocated CPU and memory specifications.


Observability and Monitoring

Edge Stack’s observability features align with the Four Golden Signals of monitoring: latency (the time it takes to service requests), traffic (a measure of how much demand is being placed on your system), errors (the number of failing requests), and saturation (the degree to which total system capacity is being utilized for request-response). Edge Stack provides these must-have metrics for your user-facing deployment and collects many other statistics internally, making it easy to output data to the monitoring tool of your choice.

The :8877/metrics endpoint can be polled for aggregated statistics for both Envoy metrics and Edge Stack control plane metrics collected in a Prometheus-compatible format. Edge Stack can also push Envoy statistics over the StatsD or DogStatsD protocol.
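
As a rough illustration of consuming that endpoint, the sketch below parses a response body in the Prometheus text exposition format and derives an error ratio, one of the golden signals. The sample body and the metric names in it are assumptions chosen for illustration rather than a guaranteed list of what Edge Stack exports, and the parser assumes sample lines carry no trailing timestamp.

```python
from typing import Dict

# Hypothetical excerpt of a /metrics response body.
SAMPLE = """\
# HELP envoy_http_downstream_rq_total Total requests
# TYPE envoy_http_downstream_rq_total counter
envoy_http_downstream_rq_total 1200
envoy_http_downstream_rq_xx{envoy_response_code_class="5"} 30
"""


def parse_prometheus_text(body: str) -> Dict[str, float]:
    """Map each 'name{labels}' string to its sample value."""
    samples: Dict[str, float] = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Without a timestamp, the value is the last space-separated field.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


metrics = parse_prometheus_text(SAMPLE)
errors = metrics['envoy_http_downstream_rq_xx{envoy_response_code_class="5"}']
total = metrics["envoy_http_downstream_rq_total"]
print(f"5xx ratio: {errors / total:.3f}")
```

In practice a Prometheus server would scrape the endpoint and compute such ratios over time; the sketch only shows what the raw samples look like to a consumer.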

Edge Stack makes it easy to output Envoy-generated statistics to Prometheus. Prometheus is an open-source monitoring and alerting system. When used along with Grafana, you can create a dynamic dashboard to monitor ingress into your Kubernetes cluster.

Redundancy Strategies for High Availability in Your Infrastructure

Multiple Domains

A common redundancy strategy is to deploy multiple domains. This strategy specifically addresses certificate expiry and other problems that may cause downtime on a single domain. A single Edge Stack instance can serve multiple domains, so this strategy can be employed without rolling out multiple gateways.

Multiple Gateways

Another common strategy is to deploy multiple API Gateways to do things like traffic splitting between them, regionalization, or assigning one gateway as the primary and another as a secondary. The downside is that this incurs overhead and extra maintenance, and the gateways also have to be upgraded to newer versions in sync.

When using multiple API Gateways, you can split the API calls using a load balancer and an Elastic IP address. An API Gateway also needs to be able to load balance across your upstream targets. Edge Stack performs simple round-robin balancing via Kubernetes services and advanced load balancing when configured with the Kubernetes Endpoint Resolver or Consul Resolver.
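
Round-robin balancing itself is simple to sketch. The example below rotates through a fixed list of targets and skips any that have been marked unhealthy; the class name and the target names are hypothetical, and real balancers add health probing, weights, and connection awareness that are left out here.

```python
from typing import Dict, List


class RoundRobinBalancer:
    """Cycles through targets in order, skipping unhealthy ones."""

    def __init__(self, targets: List[str]) -> None:
        if not targets:
            raise ValueError("at least one target is required")
        self._targets = list(targets)
        self._healthy: Dict[str, bool] = {t: True for t in targets}
        self._next = 0

    def mark(self, target: str, healthy: bool) -> None:
        self._healthy[target] = healthy

    def pick(self) -> str:
        # Try each target at most once per call, starting where we left off.
        for _ in range(len(self._targets)):
            target = self._targets[self._next]
            self._next = (self._next + 1) % len(self._targets)
            if self._healthy[target]:
                return target
        raise RuntimeError("no healthy targets available")


lb = RoundRobinBalancer(["gateway-a", "gateway-b", "gateway-c"])
print([lb.pick() for _ in range(4)])
lb.mark("gateway-b", False)  # simulate a failed gateway
print([lb.pick() for _ in range(3)])
```

The same rotation applies whether the targets are upstream service endpoints or a set of gateways behind an external load balancer.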

Edge Stack also works with the load balancers of AWS, GKE, Azure, and the other major cloud providers, and provides health check monitoring and performance logs for early alerts of capacity and networking issues.

Another option is to chain Edge Stack instances so that the first gateway handles authentication of external traffic entering the system, while a second, internal Edge Stack instance focuses on routing to services. This can boost performance for high traffic volumes, as each gateway has a distinct focus within the flow of serving requests and responses. This, however, is more of an efficiency strategy than one that addresses SPOF.


Multi-Region Deployment

Whether you’re hosting servers on-prem or using a cloud provider, your physical servers will generally be located in the same geographic region as your users, to cover usage peaks during prime business hours. As part of a diversification strategy, it’s also possible to locate your Edge Stack deployments in multiple regions, either with your own hardware or via a cloud provider. This helps ensure uptime in the event of physical disruption or hardware failure. Naturally, this strategy increases maintenance effort and cost.

Multi-region deployment helps reduce request latency perceived by geographically distributed API consumers and improves service availability if one region goes offline. When running on a cloud provider, the provider determines the degree of services available at each paid tier. For example, with Microsoft Azure, multi-region deployment is only available in the premium service tier, and only the gateway component of your API management instance is replicated in multiple regions. The instance's management plane and developer portal remain hosted only in the primary region, the region where you originally deployed the service.
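
The latency and availability benefits of multi-region deployment can be pictured with a small routing sketch: send each client to the healthy region with the lowest measured latency, and fall back to the next best when a region goes offline. The function, the region names, and the latency figures are hypothetical; in practice this decision is usually made by DNS or a global load balancer rather than application code.

```python
from typing import Dict, Optional


def choose_region(latency_ms: Dict[str, float],
                  healthy: Dict[str, bool]) -> Optional[str]:
    """Return the lowest-latency healthy region, or None if all are down."""
    # Regions with no health record are treated as unhealthy.
    candidates = [region for region in latency_ms if healthy.get(region, False)]
    if not candidates:
        return None
    return min(candidates, key=lambda region: latency_ms[region])


latency = {"eu-west": 25.0, "us-east": 95.0, "ap-south": 180.0}
health = {"eu-west": True, "us-east": True, "ap-south": True}
print(choose_region(latency, health))  # eu-west

health["eu-west"] = False  # simulate a regional outage
print(choose_region(latency, health))  # us-east
```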

While this is not an exhaustive list, these are some of the features you can use to secure your Edge Stack instance against SPOF. They should also provide some food for thought on redundancy strategies you can implement to ensure business continuity and resilience. Contact Us or Join Our Slack Community to learn how Edge Stack can be a game changer for your organization.