Understanding Multi-Cluster Kubernetes
Organizations are increasingly deploying multiple Kubernetes clusters to improve availability, isolation, and scalability. Learn about the benefits of building multi-cluster Kubernetes applications, how to architect them, and the strategies available for implementing them.
What is Multi-Cluster?
Multi-cluster is a strategy for deploying an application on or across multiple Kubernetes clusters with the goal of improving availability, isolation, and scalability. Multi-cluster can be important to ensure compliance with different and conflicting regulations, as individual clusters can be adapted to comply with geographic- or certification-specific regulations. The speed and safety of software delivery can also be increased, with individual development teams deploying applications to isolated clusters and selectively exposing which services are available for testing and release.
Today, organizations are increasingly deploying many more Kubernetes clusters, and treating these clusters as disposable. Several organizations have talked at KubeCon about “treating clusters as cattle, not pets.” This approach results in several benefits.
Improved Operational Readiness
By standardizing cluster creation, the associated operational runbooks, troubleshooting, and tools are simplified. This eliminates common sources of operational error while also reducing the cognitive load for support engineers and SREs, which ultimately leads to improved overall response time to issues.
Isolation and Multi-Tenancy
Strong isolation guarantees simplify key operational processes such as cluster and application upgrades. Isolation also limits the blast radius of a cluster outage. Organizations with strong tenancy isolation requirements can route each tenant to its own dedicated cluster.
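Routing each tenant to its own cluster comes down to a lookup from tenant to cluster that refuses to fall back to a shared cluster. The following is a minimal sketch under stated assumptions: the tenant IDs, cluster endpoints, and the `cluster_for_tenant` helper are all hypothetical, and a real system would load the mapping from configuration and enforce it in its routing layer.

```python
"""Minimal sketch: map each tenant to a dedicated cluster.

Assumptions (hypothetical, for illustration only): tenant IDs and cluster
endpoints are made up; a real system would load this mapping from
configuration and enforce it in its ingress or routing layer.
"""

# Hypothetical tenant -> dedicated cluster API endpoint mapping.
TENANT_CLUSTERS = {
    "acme": "https://acme.clusters.example.com",
    "globex": "https://globex.clusters.example.com",
}


class UnknownTenantError(KeyError):
    """Raised when a request arrives for a tenant with no assigned cluster."""


def cluster_for_tenant(tenant_id: str) -> str:
    """Return the endpoint of the cluster dedicated to ``tenant_id``.

    Failing loudly for unknown tenants (instead of falling back to a shared
    cluster) preserves the one-tenant-per-cluster isolation guarantee.
    """
    try:
        return TENANT_CLUSTERS[tenant_id]
    except KeyError:
        raise UnknownTenantError(tenant_id) from None


if __name__ == "__main__":
    print(cluster_for_tenant("acme"))  # https://acme.clusters.example.com
```

The deliberate choice here is to raise an error for unknown tenants: a silent default cluster would quietly reintroduce shared tenancy.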
Increased Availability and Performance
Multi-cluster enables applications to be deployed in or across multiple availability zones and regions, improving application availability and regional performance for global applications.
Simplified Compliance
Cloud applications today have to comply with a myriad of regulations and policies. A single cluster is unlikely to be able to comply with every one of them. A multi-cluster strategy reduces the scope of compliance for each individual cluster.
Eliminate Vendor Lock-In
A multi-cluster strategy enables your organization to shift workloads between Kubernetes vendors to take advantage of the capabilities and pricing each one offers.
Multi-Cluster Application Architecture
Multi-cluster applications can be architected in two fundamental ways:
Replicated Architecture
In this model, each cluster runs a full copy of the application. This simple but powerful approach enables an application to scale globally, as the application can be replicated into multiple availability zones or data centers and user traffic routed to the closest or most appropriate cluster. Coupled with a health-aware global load balancer, this architecture also enables failover: if one cluster stops functioning or becomes unresponsive, user traffic is routed to another cluster.
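The routing decision a health-aware global load balancer makes for a fully replicated application can be sketched in a few lines. This is a simplified model, not any particular load balancer's algorithm: the cluster names, latency figures, and health flags are hypothetical, and a real balancer would obtain health from active probes and proximity from measurements or geo-IP data.

```python
"""Minimal sketch of health-aware routing for a replicated multi-cluster app.

Assumptions (hypothetical): cluster names, latencies, and health flags are
illustrative. A real global load balancer would obtain health from active
probes and latency from measurements or geo-IP, not from a static list.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    latency_ms: float  # estimated latency from the user to this cluster
    healthy: bool      # result of the most recent health check


def pick_cluster(clusters: list[Cluster]) -> Cluster:
    """Route to the lowest-latency healthy cluster.

    Because every cluster runs a full copy of the application, any healthy
    cluster can serve the request; unhealthy clusters are skipped (failover).
    """
    healthy = [c for c in clusters if c.healthy]
    if not healthy:
        raise RuntimeError("no healthy cluster available")
    return min(healthy, key=lambda c: c.latency_ms)


if __name__ == "__main__":
    clusters = [
        Cluster("us-east", latency_ms=20, healthy=True),
        Cluster("eu-west", latency_ms=90, healthy=True),
        Cluster("ap-south", latency_ms=180, healthy=True),
    ]
    print(pick_cluster(clusters).name)  # us-east: closest healthy cluster

    # Simulate an outage in the closest cluster: traffic fails over.
    clusters[0] = Cluster("us-east", latency_ms=20, healthy=False)
    print(pick_cluster(clusters).name)  # eu-west
```

Failover falls out of the same rule that provides locality: an unhealthy cluster is simply removed from the candidate set before the closest one is chosen.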
Split-by-Service Architecture
In this model, the services of a single application or system are divided across multiple clusters. This approach provides stronger isolation between parts of the application at the expense of greater complexity. This pattern is often used to ease compliance with regulatory requirements. For example, PCI DSS-compliant services and supporting infrastructure can be localized into a single cluster, and the remaining application clusters can be operated outside of this scope. This pattern also improves the speed and safety of application development and delivery, as individual development teams can deploy their specific services into their own cluster without impacting other teams.
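The PCI DSS example can be expressed as a service-to-cluster placement map plus a check that the compliance boundary stays tight. This is an illustrative sketch only: the service names, cluster names, and the `PCI_SERVICES` set are hypothetical, and deciding what is actually in PCI scope is a matter for your auditors, not a script.

```python
"""Minimal sketch of a split-by-service placement map.

Assumptions (hypothetical): the service names, cluster names, and the set
of PCI-scoped services are illustrative. The idea is that only clusters
hosting PCI DSS-scoped services fall inside the compliance boundary.
"""
from __future__ import annotations

# Hypothetical service -> cluster placement for one application.
PLACEMENT = {
    "payments": "pci-cluster",
    "card-vault": "pci-cluster",
    "catalog": "apps-cluster",
    "checkout-ui": "apps-cluster",
    "recommendations": "ml-cluster",
}

# Hypothetical set of services that handle cardholder data.
PCI_SERVICES = {"payments", "card-vault"}


def cluster_for(service: str) -> str:
    """Look up which cluster hosts a given service."""
    return PLACEMENT[service]


def clusters_in_pci_scope() -> set[str]:
    """Return the clusters that host at least one PCI-scoped service."""
    return {PLACEMENT[s] for s in PCI_SERVICES}


def placement_is_isolated() -> bool:
    """Check that no non-PCI service shares a cluster with PCI services.

    If this holds, every other cluster can be operated outside the PCI scope.
    """
    pci_clusters = clusters_in_pci_scope()
    return all(
        cluster not in pci_clusters
        for service, cluster in PLACEMENT.items()
        if service not in PCI_SERVICES
    )


if __name__ == "__main__":
    print(cluster_for("catalog"))   # apps-cluster
    print(clusters_in_pci_scope())  # {'pci-cluster'}
    print(placement_is_isolated())  # True
```

A check like `placement_is_isolated` could run in CI so that moving a non-PCI service into the PCI cluster, which would widen the audit scope, is caught before deployment.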
Configuring Multi-Cluster Kubernetes
Multi-cluster Kubernetes has a broad scope, with a multitude of challenges and approaches. The general approaches can be loosely grouped into two categories:
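"Kubernetes-centric" approaches extend the core Kubernetes primitives to multi-cluster use cases, providing a centralized management plane for multiple clusters. The Kubernetes Cluster Federation project, managed by the Kubernetes Multicluster Special Interest Group, takes this approach, as does Google’s Anthos project (via environs).
"Network-centric" approaches focus on creating network connectivity between clusters so that applications running in different clusters can communicate with each other. Service meshes such as Istio, Linkerd, and Consul Connect each provide multi-cluster support along these lines.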
Real-World Multi-Cluster Kubernetes
Which multi-cluster strategy should you choose? As of mid-2020, most organizations adopting multi-cluster are evaluating the network-centric approaches. The primary reasons for this trend are the lack of maturity in the Federation project and the fact that a GitOps approach to configuration management has become de rigueur for Kubernetes users. A GitOps approach, coupled with some basic automation, lends itself easily to managing multiple clusters, because each cluster can be created from a standardized configuration. As a result, a centralized management plane does not reduce management overhead enough to justify the complexity it introduces.
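Creating every cluster from a standardized configuration usually means keeping one shared base plus small per-cluster overlays in Git and rendering the final configuration automatically. The sketch below models that idea with a plain recursive merge. The configuration keys, cluster names, and overlay contents are hypothetical; tools such as Kustomize implement a similar base/overlay model, but this code does not use or reproduce their APIs.

```python
"""Minimal sketch: render per-cluster configuration from one standard base.

Assumptions (hypothetical): the config keys and cluster overlays are made up.
In a real GitOps workflow the base and overlays would be files in a Git
repository, and automation would apply the rendered result to each cluster.
"""
import copy
import json

# Standard configuration shared by every cluster.
BASE = {
    "node_pool": {"machine_type": "standard-4", "min_nodes": 3, "max_nodes": 10},
    "addons": {"ingress": True, "monitoring": True},
    "labels": {"managed-by": "gitops"},
}

# Small, reviewable per-cluster differences.
OVERLAYS = {
    "us-east": {"node_pool": {"max_nodes": 20}},
    "eu-west": {"labels": {"data-residency": "eu"}},
}


def deep_merge(base: dict, overlay: dict) -> dict:
    """Return a new dict with ``overlay`` recursively applied on top of ``base``."""
    result = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def render_all() -> dict:
    """Render the full configuration for every cluster from BASE plus its overlay."""
    return {name: deep_merge(BASE, overlay) for name, overlay in OVERLAYS.items()}


if __name__ == "__main__":
    print(json.dumps(render_all(), indent=2, sort_keys=True))
```

Because each overlay only records how a cluster differs from the base, adding a new cluster is a one-entry change, which is why a central management plane adds comparatively little on top of this workflow.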
All of the network-centric approaches require adopting a service mesh. Choosing among them therefore means evaluating service meshes in general, as well as the specific multi-cluster capabilities of each mesh. A brief summary of each approach follows:
Istio has two different strategies for multi-cluster support: replicated control plane and shared control plane. In general, a replicated control plane results in greater system availability and resilience. Istio provides powerful primitives for multi-cluster communication at the expense of complexity. In practice, application and deployment workflow changes are needed to take full advantage of Istio multi-cluster.
Linkerd service mirroring is a simple but powerful approach that requires no application changes. Moreover, Linkerd supports using Ambassador Edge Stack for connecting traffic between clusters, enabling resilient application-level connectivity over the Internet.
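The reason service mirroring needs no application changes is that exported remote services show up as ordinary local names. The simulation below models only that idea and does not call Linkerd or the Kubernetes API. The export label key and the `<service>-<cluster>` mirror naming follow what the Linkerd multicluster documentation describes, but treat both as assumptions to verify against your Linkerd version; the `Service` class and `mirror_services` helper are hypothetical.

```python
"""Conceptual simulation of multi-cluster service mirroring.

This does not call Linkerd or the Kubernetes API. It only models the idea:
services in a remote cluster that carry an export label get a local mirror
entry, so local clients can address them by name without code changes.
Assumption: the label key and "<service>-<cluster>" naming follow the
Linkerd multicluster docs; verify both against your Linkerd version.
"""
from __future__ import annotations

from dataclasses import dataclass, field

EXPORT_LABEL = "mirror.linkerd.io/exported"  # assumption: verify for your version


@dataclass
class Service:
    name: str
    namespace: str
    labels: dict[str, str] = field(default_factory=dict)


def mirror_services(remote_cluster: str, remote_services: list[Service]) -> dict[str, str]:
    """Map local mirror names ("namespace/name-cluster") to their remote targets.

    Services without the export label are ignored, so the remote cluster
    decides which of its services are visible to other clusters.
    """
    mirrors = {}
    for svc in remote_services:
        if svc.labels.get(EXPORT_LABEL) == "true":
            local_name = f"{svc.namespace}/{svc.name}-{remote_cluster}"
            mirrors[local_name] = f"{remote_cluster}:{svc.namespace}/{svc.name}"
    return mirrors


if __name__ == "__main__":
    west_services = [
        Service("web-svc", "test", {EXPORT_LABEL: "true"}),
        Service("internal-db", "test"),  # not exported, so not mirrored
    ]
    print(mirror_services("west", west_services))
    # {'test/web-svc-west': 'west:test/web-svc'}
```

Suffixing the mirror with the cluster name keeps local and remote copies of the same service distinct, so clients (or traffic-splitting rules) can choose explicitly which cluster to call.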
Consul Connect uses a VPN-like approach built around Consul Mesh Gateways to connect disparate clusters. This approach requires configuring Consul for data center federation so that the separate Consul data centers can discover and communicate with each other over a WAN.