Multi-Cluster Kubernetes Management With Operators

At Kubermatic, we have been helping our customers deliver Kubernetes clusters and other cloud native solutions since before they were buzzwords. We helped customers build clusters using Ansible, Terraform, and a variety of other tools that were not cloud native…and we helped them rebuild those clusters when the tools reached their limits. In these early days, two things quickly became clear to us: 1) Kubernetes is not a single-large-cluster solution; running it well requires a larger number of smaller clusters, and 2) Kubernetes multi-cluster management needs cloud native tools built for a declarative, API-driven world. Since then, these ideas have largely been validated by a variety of organizations around the world, including the CNCF, Twitter, USA Today, Zalando, and Alibaba. Knowing that every company running Kubernetes at scale would need an effective way to manage many clusters, we created the open source Kubermatic Kubernetes Platform. This blog post covers why you need multi-cluster management, how Kubermatic Kubernetes Platform leverages Kubernetes Operators to automate cluster life cycle management across multiple clusters, clouds, and regions, and how you can get started with it today.

Why You Need Multi-Cluster Management

Kubernetes lacks hard multi-tenancy: it offers users, organizations, and operators no built-in way to let untrusted tenants safely share infrastructure resources, or to strictly separate different pieces of software. This is both a security and an operational problem. When operators want to separate workloads by type (sensitive vs. non-sensitive data processing), or even just production from non-production, a single cluster gives them no way to enforce that boundary, which creates a security nightmare. On the operational side, deploying too many applications into the same cluster can result in version conflicts, configuration conflicts, and problems with software life cycle management. Finally, without proper isolation there is an increased risk of cascading failures.

Without hard multi-tenancy within a cluster, separate clusters must be used to provide adequate separation for workloads with different security requirements. Having multiple clusters to deploy applications into also allows operators to deploy similar applications together while segregating those with different life cycles from each other. Applications deployed into the same cluster can be upgraded together to reduce the operational load while applications that require different versions, configurations, and dependencies can run in separate clusters and be upgraded on their own. 
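As an illustration of this kind of separation, the sketch below assigns workloads to clusters by the attributes that force isolation: data sensitivity, environment, and the Kubernetes version an application is tested against. The `Workload` type, its fields, and the cluster-naming scheme are hypothetical and exist only for this example; it is not how Kubermatic Kubernetes Platform plans clusters.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """A hypothetical application description used only for this sketch."""
    name: str
    sensitive: bool      # does it process sensitive data?
    environment: str     # e.g. "prod" or "dev"
    k8s_version: str     # Kubernetes minor version the app is tested against


def plan_clusters(workloads):
    """Group workloads so that each cluster only hosts workloads sharing a
    security class, an environment, and a life cycle (Kubernetes version)."""
    clusters = defaultdict(list)
    for w in workloads:
        tier = "sensitive" if w.sensitive else "general"
        key = f"{w.environment}-{tier}-v{w.k8s_version}"  # illustrative name
        clusters[key].append(w.name)
    return dict(clusters)


apps = [
    Workload("billing", True, "prod", "1.18"),
    Workload("payments", True, "prod", "1.18"),
    Workload("blog", False, "prod", "1.18"),
    Workload("ml-batch", False, "prod", "1.17"),
    Workload("billing-staging", True, "dev", "1.18"),
]

for cluster, names in plan_clusters(apps).items():
    print(cluster, names)
```

Workloads that share all three attributes end up in the same cluster and can be upgraded together; changing any one attribute yields a separate cluster with its own upgrade schedule.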

If running multiple clusters is the only solution to meeting these workload and infrastructure requirements, the operational burden of this model must also be considered. Running a multitude of clusters is a massive operational challenge if done manually. For this reason, any operator considering running Kubernetes at scale should carefully evaluate their multi-cluster management strategy. At Kubermatic, we have chosen to do multi-cluster management with Kubernetes Operators. 

What Is a Kubernetes Operator?

An Operator is a piece of software that knows how to run another piece of software and helps operate it. More technically, in the words of CoreOS, which introduced Kubernetes Operators in 2016: “An Operator is a method of packaging, deploying and managing a Kubernetes application. A Kubernetes application is an application that is both deployed on Kubernetes and managed using the Kubernetes APIs and kubectl tooling.” Each Operator runs its own custom controller, which watches custom resources defined specifically for the application it manages.
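A controller usually does not act on each event directly. In Go, client-go informers deliver add, update, and delete notifications, and controllers commonly reduce each one to an object key (`namespace/name`) placed in a de-duplicating work queue. The Python sketch below mimics that pattern with plain data; the `Database` kind, the event dictionaries, and the `WorkQueue` class are assumptions for illustration, not a real Kubernetes API.

```python
from collections import deque


class WorkQueue:
    """A minimal de-duplicating FIFO queue of object keys, loosely modelled
    on the work queues Kubernetes controllers use."""

    def __init__(self):
        self._queue = deque()
        self._pending = set()

    def add(self, key):
        # A key already waiting is not queued twice: the controller only
        # needs to know *that* the object changed, not how many times.
        if key not in self._pending:
            self._pending.add(key)
            self._queue.append(key)

    def get(self):
        key = self._queue.popleft()
        self._pending.discard(key)
        return key

    def __len__(self):
        return len(self._queue)


WATCHED_KIND = "Database"  # hypothetical custom resource kind


def handle_event(queue, event):
    """Event handler: ignore kinds this controller does not own and
    enqueue a 'namespace/name' key for the ones it does."""
    obj = event["object"]
    if obj["kind"] != WATCHED_KIND:
        return
    meta = obj["metadata"]
    queue.add(f"{meta['namespace']}/{meta['name']}")


events = [
    {"type": "ADDED", "object": {"kind": "Database", "metadata": {"namespace": "shop", "name": "orders"}}},
    {"type": "MODIFIED", "object": {"kind": "Database", "metadata": {"namespace": "shop", "name": "orders"}}},
    {"type": "ADDED", "object": {"kind": "ConfigMap", "metadata": {"namespace": "shop", "name": "settings"}}},
    {"type": "ADDED", "object": {"kind": "Database", "metadata": {"namespace": "crm", "name": "contacts"}}},
]

q = WorkQueue()
for e in events:
    handle_event(q, e)

while len(q):
    print("reconcile", q.get())
```

The two events for `shop/orders` collapse into a single reconcile, and the `ConfigMap` event is ignored because this controller does not own that kind.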

This allows developers to codify life cycle management knowledge for applications that need to maintain state, and thereby automate much of the ongoing management, including deployments, backups, upgrades, logging, and alerting, simply by watching events and leveraging the reconciliation loops built into Kubernetes. In short, a well-built Operator covers the complete life cycle of a piece of containerized software.
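The reconciliation idea fits in a few lines: the controller compares the desired state declared in a custom resource with the state it observes and emits only the actions needed to close the gap. The resource shape, field names, and action strings below are hypothetical; a real Operator would call the Kubernetes API instead of returning strings.

```python
from dataclasses import dataclass


@dataclass
class DatabaseSpec:
    """Desired state, as a user would declare it in a custom resource."""
    replicas: int
    version: str
    backups_enabled: bool


@dataclass
class ObservedState:
    """What the controller currently sees running in the cluster."""
    replicas: int = 0
    version: str = ""        # empty means nothing is deployed yet
    backup_job: bool = False


def reconcile(spec, observed):
    """Return the actions needed to move `observed` towards `spec`.

    Reconciliation is level-based: it looks only at the current difference,
    so running it again after the actions succeed yields no actions.
    """
    actions = []
    if observed.version and observed.version != spec.version:
        actions.append(f"upgrade {observed.version} -> {spec.version}")
    if observed.replicas < spec.replicas:
        actions.append(f"scale up to {spec.replicas}")
    elif observed.replicas > spec.replicas:
        actions.append(f"scale down to {spec.replicas}")
    if spec.backups_enabled and not observed.backup_job:
        actions.append("create backup job")
    elif not spec.backups_enabled and observed.backup_job:
        actions.append("delete backup job")
    return actions


def simulate_apply(spec):
    """Pretend every action succeeded: the observed state now matches spec."""
    return ObservedState(spec.replicas, spec.version, spec.backups_enabled)


spec = DatabaseSpec(replicas=3, version="3.4.13", backups_enabled=True)
observed = ObservedState(replicas=1, version="3.4.9")

print(reconcile(spec, observed))
observed = simulate_apply(spec)
print(reconcile(spec, observed))  # second pass: nothing left to do
```

Because the loop is driven by state rather than by individual events, a missed or duplicated event does no harm: the next pass simply recomputes the difference.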

How Do We Use Kubernetes Operators to Do Multi-Cluster Management?

With Kubermatic Kubernetes Platform, we extend the Operator paradigm beyond applications to manage the clusters themselves. Yes, we are using Kubernetes to operate Kubernetes. This model has been proven by multiple organizations, including Alibaba, which uses it to manage tens of thousands of clusters.

On a technical level, the cluster state is declared as custom resources (defined through Custom Resource Definitions) and stored in etcd. A set of controllers and their associated reconciliation loops watch for changes to, or additions of, cluster state and update each cluster as required. All state is stored in a “Master Cluster”. When a new user cluster is defined, its control plane (API server, etcd, scheduler, and controllers) is created as a Deployment of containers within a namespace of the master cluster. The worker nodes of the user cluster are deployed by machine-controller, which implements Cluster API to bring declarative creation, configuration, and management to worker nodes. Operators allow Kubermatic Kubernetes Platform to automate not only the creation of clusters but also their full life cycle management. Updating the control plane is simply a rolling update of a deployment of containers, while the actual nodes in the cluster can also be updated declaratively in a rolling fashion.
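To make the model more concrete, the sketch below derives, from a single user-cluster definition, the control-plane components that would run in that cluster's namespace on the master cluster, and plans a rolling node upgrade that replaces at most a fixed number of worker nodes at a time. The `UserCluster` type, the `cluster-<name>` namespace pattern, and the default batch size are illustrative assumptions, not Kubermatic Kubernetes Platform's actual resource schema, naming, or defaults.

```python
from dataclasses import dataclass

# Control-plane components as listed in the text; illustrative only.
CONTROL_PLANE_COMPONENTS = ("apiserver", "etcd", "scheduler", "controller-manager")


@dataclass
class UserCluster:
    """Hypothetical, simplified view of a user-cluster custom resource."""
    name: str
    version: str  # Kubernetes version of the user cluster


def control_plane_objects(cluster):
    """List the workloads the master cluster should run for `cluster`,
    all inside one namespace dedicated to that user cluster."""
    namespace = f"cluster-{cluster.name}"  # assumed naming scheme
    return [
        # cluster_version: the user-cluster version these components serve
        {"namespace": namespace, "component": c, "cluster_version": cluster.version}
        for c in CONTROL_PLANE_COMPONENTS
    ]


def rolling_node_upgrade(nodes, target_version, max_unavailable=1):
    """Split outdated worker nodes into batches of at most `max_unavailable`,
    so only a few nodes are being replaced at any one time."""
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    outdated = [n for n, v in nodes.items() if v != target_version]
    return [outdated[i:i + max_unavailable]
            for i in range(0, len(outdated), max_unavailable)]


cluster = UserCluster(name="team-a", version="1.18.6")
for obj in control_plane_objects(cluster):
    print(obj)

workers = {"worker-1": "1.17.9", "worker-2": "1.17.9",
           "worker-3": "1.18.6", "worker-4": "1.17.9"}
print(rolling_node_upgrade(workers, "1.18.6", max_unavailable=2))
```

In this sketch, an upgrade starts by changing `version` on the cluster definition; the control-plane objects are recomputed from it, and the nodes that are still on the old version are replaced batch by batch.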

Figure: Cluster Architecture

Leveraging Kubernetes Operators also gives a consistent abstraction across all infrastructure providers. Rather than reinventing the wheel for each provider, we can port the same tooling from one provider to the next, covering hybrid and multi-cloud setups as well as on-premises infrastructure (virtualized and bare metal).
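One way to picture this abstraction is a small provider interface: the reconciliation logic is written once against the interface, and each kind of infrastructure supplies its own implementation. The `Provider` protocol, the `FakeCloud` and `FakeBareMetal` backends, and the machine-naming scheme are all hypothetical; the backends only keep in-memory records instead of provisioning anything.

```python
from typing import Dict, List, Protocol


class Provider(Protocol):
    """Hypothetical interface every infrastructure backend implements."""

    def list_machines(self) -> List[str]: ...
    def create_machine(self, name: str) -> None: ...
    def delete_machine(self, name: str) -> None: ...


class FakeCloud:
    """Stand-in for a public cloud: machines can be created on demand."""

    def __init__(self) -> None:
        self._machines: List[str] = []

    def list_machines(self) -> List[str]:
        return list(self._machines)

    def create_machine(self, name: str) -> None:
        self._machines.append(name)

    def delete_machine(self, name: str) -> None:
        self._machines.remove(name)


class FakeBareMetal:
    """Stand-in for bare metal: machines come from a fixed pool of servers."""

    def __init__(self, servers: List[str]) -> None:
        self._free = list(servers)
        self._assigned: Dict[str, str] = {}  # machine name -> physical server

    def list_machines(self) -> List[str]:
        return list(self._assigned)

    def create_machine(self, name: str) -> None:
        if not self._free:
            raise RuntimeError("no free bare-metal servers left")
        self._assigned[name] = self._free.pop(0)

    def delete_machine(self, name: str) -> None:
        self._free.append(self._assigned.pop(name))


def reconcile_workers(provider: Provider, cluster: str, replicas: int) -> None:
    """Provider-agnostic logic: converge the cluster's machines to `replicas`."""
    prefix = f"{cluster}-worker-"
    desired = {f"{prefix}{i}" for i in range(replicas)}
    existing = {m for m in provider.list_machines() if m.startswith(prefix)}
    for name in sorted(desired - existing):
        provider.create_machine(name)
    for name in sorted(existing - desired):
        provider.delete_machine(name)


cloud = FakeCloud()
metal = FakeBareMetal(["rack1-srv1", "rack1-srv2", "rack1-srv3"])
for backend in (cloud, metal):
    reconcile_workers(backend, "team-a", replicas=2)
    print(type(backend).__name__, backend.list_machines())
```

Only the backends know how machines are actually obtained; `reconcile_workers` is identical for every infrastructure, which is what makes porting to a new provider a matter of writing one more backend.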

Figure: Multi-Cluster Architecture

What Do Operators Allow Our Users to Do?

While creating an elegant solution to a difficult technical problem has been an exciting journey and learning experience, the most gratifying part has been seeing the impact it has on our users every day. As partners on their cloud native journey, we love to see the results of our software speak for themselves.

SysEleven, a managed hosting provider based in Berlin, was our first production user. They wanted to provide Kubernetes-as-a-Service to their customers but knew they couldn’t scale their operations through people. They chose Kubermatic Kubernetes Platform to scale through software instead and have had it in production for almost three years. Because the Kubernetes Operators behind Kubermatic Kubernetes Platform automate many of their operational tasks, including the classic “turn it off and turn it back on again”, they are able to run and manage hundreds of clusters with just one full-time employee (FTE). This has allowed their Kubernetes team to focus on customer demands and deliver the high-quality service they have become known for. You can read about their whole journey with us here.

Beyond the cloud native journey, Operators also allow us to adapt our proven principles and processes to the edge. In the near future, we will provide edge capabilities built on the principles we have just covered.

How to Get Started

Last year, we open sourced Kubermatic Kubernetes Platform to help as many companies as possible accelerate their cloud native journey. You can find the code on GitHub, the documentation on our website, and our community on Slack. We are excited to see you automate your multi-cluster management with Kubernetes Operators!

Sascha Haase

VP Edge