Friday, May 3, 2019

Kubernetes Operators - objects, resources, and controllers (a comic)


Kubernetes - my interpretation

I’ve heard Kubernetes described in a couple of different ways. “A container orchestrator” or “a tool for managing distributed applications” are pretty common descriptions. But I’ve come to view it as an extensible collection of reconciliation engines. It simply drives the current state of the world toward the desired state via controllers (we’ll get to those in a minute). These controllers can manage anything; they don’t have to control container-related objects.
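To make the "reconciliation engine" idea a bit more concrete, here is a minimal Python sketch. It assumes state can be modeled as a map from a hypothetical object name to a count; real controllers compare much richer objects, but the shape is the same: diff the desired state against the current state, then act on the difference.

```python
# Toy reconciliation: compare desired state with current state and compute
# the actions that would close the gap. Object names and the "count" model
# are hypothetical simplifications, not a real Kubernetes API.


def reconcile(desired: dict[str, int], current: dict[str, int]) -> list[str]:
    """Return the actions that would move `current` toward `desired`.

    Both dicts map an object name to how many copies of it exist.
    """
    actions = []
    for name, want in sorted(desired.items()):
        have = current.get(name, 0)
        if have < want:
            actions.append(f"create {want - have} x {name}")
        elif have > want:
            actions.append(f"delete {have - want} x {name}")
    # Anything running that nobody asked for should go away.
    for name, have in sorted(current.items()):
        if name not in desired:
            actions.append(f"delete {have} x {name}")
    return actions


print(reconcile({"web": 3, "worker": 1}, {"web": 1, "cache": 2}))
# -> ['create 2 x web', 'create 1 x worker', 'delete 2 x cache']
```

When current already equals desired, the function returns an empty list. That is the "nothing to do" case a reconciler hits most of the time.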

Our team recently had a discussion about Kubernetes operators and I wanted to explain (graphically) how they work and define some of the shared language surrounding them.

Interacting with Kubernetes

Requests to Kubernetes go through the API server. As users, we generally talk to the API server via the command line (kubectl) and a Kubernetes manifest (YAML or JSON). The manifest is a request for Kubernetes Objects (or, more precisely, for the desired state of Kubernetes Objects).
It’s the job of the API server to:
  1. validate the request (check that it matches a known schema)
  2. persist the request (the desired state) in etcd
Kubernetes documentation calls this entry in etcd a “record of intent”.
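The two API server jobs listed above can be sketched as a toy in Python. This is only an illustration under loud assumptions: `KNOWN_SCHEMAS`, the key layout, and the plain dict standing in for etcd are all hypothetical, and the real API server also authenticates, authorizes, and runs admission control before anything is stored.

```python
# Toy model of the API server's two jobs: validate, then persist.
# KNOWN_SCHEMAS, the key layout, and the dict standing in for etcd are
# hypothetical simplifications for illustration only.


class ValidationError(Exception):
    pass


# Known kinds and the top-level fields each one must have (made-up subset).
KNOWN_SCHEMAS = {
    "Pod": {"required": ["metadata", "spec"]},
    "ConfigMap": {"required": ["metadata", "data"]},
}

etcd: dict[str, dict] = {}  # key -> stored desired state ("record of intent")


def handle_create(manifest: dict) -> str:
    kind = manifest.get("kind")
    schema = KNOWN_SCHEMAS.get(kind)
    if schema is None:
        raise ValidationError(f"unknown kind {kind!r}")
    missing = [f for f in schema["required"] if f not in manifest]
    if missing:
        raise ValidationError(f"{kind} is missing required fields {missing}")
    meta = manifest["metadata"]
    key = f"/{kind.lower()}s/{meta.get('namespace', 'default')}/{meta['name']}"
    etcd[key] = manifest  # only records intent; nothing is started here
    return key


print(handle_create({"kind": "Pod", "metadata": {"name": "web"}, "spec": {}}))
# -> /pods/default/web
try:
    handle_create({"kind": "Pod", "metadata": {"name": "broken"}})
except ValidationError as err:
    print("rejected:", err)  # -> rejected: Pod is missing required fields ['spec']
```

The important point is the last line of `handle_create`: storing the manifest is the whole job. No container starts as a result of this call.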

There are a bunch of native Kubernetes objects available out of the box. These are all centered around containers and the things you would like to do with containers. Examples are ReplicaSets, Pods, Secrets, ConfigMaps, ServiceAccounts… the list goes on. But sometimes, as a service author, you want people to be able to request your own custom kind of object through Kubernetes…

Kubernetes has no idea what fries are (that is, the request doesn’t match any known schema), so your request is rejected. Enter CRDs…

CRD is an acronym for Custom Resource Definition. So far we’ve talked about Kubernetes Objects, which are often confused with Kubernetes Resources. A representation of a specific kind (like ReplicaSet) is an object. The API endpoint used to obtain that object is called a resource. As an example, a specific apps/v1 ReplicaSet object can be obtained from the /apis/apps/v1/namespaces/<namespace-name>/replicasets/<replicaset-name> resource.
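The object/resource distinction is easier to see in how resource paths are assembled. Below is a small Python sketch that follows the URL layout the Kubernetes API uses for namespaced resources. The core group (Pods, Secrets, ConfigMaps, …) is served under `/api/v1`, while every other group, including `apps` and any CRD group, is served under `/apis/<group>/<version>`. The names `default` and `frontend` are placeholders, and cluster-scoped resources (which drop the `namespaces/<namespace>` segment) are not handled.

```python
from typing import Optional


def resource_path(group: str, version: str, plural: str,
                  namespace: str, name: Optional[str] = None) -> str:
    """Build the REST path for a namespaced resource (or one object in it).

    The core group (group == "") is served under /api/<version>; every other
    group, including "apps" and any CRD group, under /apis/<group>/<version>.
    Resource names are the lowercase plural of the kind.
    """
    prefix = f"/api/{version}" if group == "" else f"/apis/{group}/{version}"
    path = f"{prefix}/namespaces/{namespace}/{plural}"
    return path if name is None else f"{path}/{name}"


# The resource: the collection endpoint for every ReplicaSet in a namespace.
print(resource_path("apps", "v1", "replicasets", "default"))
# -> /apis/apps/v1/namespaces/default/replicasets

# One object of kind ReplicaSet, addressed within that resource.
print(resource_path("apps", "v1", "replicasets", "default", "frontend"))
# -> /apis/apps/v1/namespaces/default/replicasets/frontend
```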

When we create a CRD, the API server creates a new RESTful resource path for the custom kind it defines.

And now we can try our order again.
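Here is a rough Python sketch of what registering a CRD changes: before registration the fry order has nowhere to go, and afterwards the same request lands on a brand-new resource path. The `ToyAPIServer` class, the group `food.example.com`, and the `mcdonalds` namespace are hypothetical. The CRD dict is loosely modeled on the `spec.group`, `spec.names`, and `spec.versions` fields of a real CustomResourceDefinition, trimmed down to what the sketch needs.

```python
# Hypothetical toy API server whose set of resources grows when a CRD is
# registered. Not the real API server; just the routing idea.


class ToyAPIServer:
    def __init__(self) -> None:
        # Built-in resources, keyed by (group, version, plural); "" is the core group.
        self.resources = {("", "v1", "pods"), ("apps", "v1", "replicasets")}
        self.store: dict[str, dict] = {}

    def register_crd(self, crd: dict) -> None:
        """Add one new resource per version the CRD declares."""
        spec = crd["spec"]
        for version in spec["versions"]:
            self.resources.add((spec["group"], version["name"], spec["names"]["plural"]))

    def create(self, group: str, version: str, plural: str,
               namespace: str, obj: dict) -> str:
        if (group, version, plural) not in self.resources:
            raise LookupError(f"no resource {plural!r} in {group or 'core'}/{version}")
        prefix = f"/api/{version}" if group == "" else f"/apis/{group}/{version}"
        path = f"{prefix}/namespaces/{namespace}/{plural}/{obj['metadata']['name']}"
        self.store[path] = obj  # still only a record of intent
        return path


server = ToyAPIServer()
fry_order = {"kind": "Fry", "metadata": {"name": "order-1"}, "spec": {"size": "large"}}

try:
    server.create("food.example.com", "v1", "fries", "mcdonalds", fry_order)
except LookupError as err:
    print("rejected:", err)  # Kubernetes has no idea what fries are yet

server.register_crd({
    "kind": "CustomResourceDefinition",
    "spec": {
        "group": "food.example.com",
        "names": {"kind": "Fry", "plural": "fries"},
        "versions": [{"name": "v1"}],
    },
})
print(server.create("food.example.com", "v1", "fries", "mcdonalds", fry_order))
# -> /apis/food.example.com/v1/namespaces/mcdonalds/fries/order-1
```

Note that the namespace is written `mcdonalds`: Kubernetes namespace names must be lowercase.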

We can even add schema validation to this CRD.
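Real CRDs describe their validation rules with an OpenAPI v3 schema (`openAPIV3Schema`). The Python sketch below validates against a tiny subset of those keywords (`type`, `properties`, `required`, `enum`, `minimum`) to show the idea. It is a simplification, not the API server's validator, and the `FRY_SPEC_SCHEMA` with its `size` and `quantity` fields is hypothetical.

```python
# Validator for a tiny subset of OpenAPI v3 schema keywords. Real CRD
# validation supports far more; the Fry schema below is hypothetical.

TYPES = {"object": dict, "string": str, "integer": int}


def validate(value, schema: dict, path: str = "spec") -> list[str]:
    """Return a list of human-readable errors (empty if valid)."""
    expected = TYPES[schema["type"]]
    # bool is a subclass of int in Python, so reject it explicitly for integers.
    if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not in {schema['enum']}")
    if "minimum" in schema and value < schema["minimum"]:
        errors.append(f"{path}: {value} is less than {schema['minimum']}")
    for field in schema.get("required", []):
        if field not in value:
            errors.append(f"{path}.{field}: required")
    # Recurse into declared properties that are present.
    for field, sub in schema.get("properties", {}).items():
        if field in value:
            errors.extend(validate(value[field], sub, f"{path}.{field}"))
    return errors


FRY_SPEC_SCHEMA = {
    "type": "object",
    "required": ["size"],
    "properties": {
        "size": {"type": "string", "enum": ["small", "medium", "large"]},
        "quantity": {"type": "integer", "minimum": 1},
    },
}

print(validate({"size": "large", "quantity": 2}, FRY_SPEC_SCHEMA))  # -> []
print(validate({"size": "jumbo", "quantity": 0}, FRY_SPEC_SCHEMA))
```

With a schema attached, an order for jumbo fries is rejected by the API server itself, before any controller ever sees it.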

This is all fine and dandy, but so far all we’ve done is persist a record in the etcd database. This doesn’t actually create anything. A more complete picture looks like this:

Fries are now on the menu, but no one there knows how to make them. Enter controllers. A controller is a piece of software (a control loop) that watches for orders (the desired state). It then makes the order (tries to bring the current state to the desired state).
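Here is a minimal Python sketch of such a control loop for fries. Everything in it is hypothetical: the `Kitchen` class, the `spec["quantity"]` field, and the assumption that the kitchen can fry only one batch per order per pass. That limit exists just to show that a controller converges over repeated passes rather than in one shot.

```python
# Hypothetical FryController: keep comparing each Fry order (desired state)
# with what the kitchen has actually made (current state), and take a step
# toward the desired state on every pass until they match.

from dataclasses import dataclass, field


@dataclass
class Kitchen:
    made: dict[str, int] = field(default_factory=dict)  # order name -> portions made

    def fry_one_batch(self, order: str) -> None:
        self.made[order] = self.made.get(order, 0) + 1


def reconcile_once(orders: dict[str, dict], kitchen: Kitchen) -> bool:
    """One pass of the loop. Returns True if every order was already satisfied."""
    done = True
    for name, spec in orders.items():
        if kitchen.made.get(name, 0) < spec["quantity"]:
            kitchen.fry_one_batch(name)  # one step toward the desired state
            done = False
    return done


def run_controller(orders: dict[str, dict], kitchen: Kitchen, max_passes: int = 10) -> int:
    """Loop until current state matches desired state; return the passes used."""
    for n in range(1, max_passes + 1):
        if reconcile_once(orders, kitchen):
            return n
    raise RuntimeError("did not converge")


kitchen = Kitchen()
orders = {
    "order-1": {"size": "large", "quantity": 3},
    "order-2": {"size": "small", "quantity": 1},
}
passes = run_controller(orders, kitchen)
print(kitchen.made, passes)  # -> {'order-1': 3, 'order-2': 1} 4
```

Running the loop again on satisfied orders does nothing and returns after one pass. Controllers are meant to be safe to re-run like this, since they may be woken up many times for the same object.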

We need a FryController. We need Alan!

When I first started learning about Kubernetes, I wondered why it used etcd as its key-value store. I’ve used several other popular key-value stores. Surely they are just as stable and scalable as etcd. So why etcd?

It turns out etcd has a feature that is very important for Kubernetes: a watch API. It would be very expensive for multiple controllers to constantly hammer the database with requests just to determine whether the state of the objects they manage has changed. The watch API lets a client ask etcd to notify it when a key changes, rather than constantly polling. (Strictly speaking, controllers don’t talk to etcd directly: they watch the API server, which builds its own watch support on top of etcd’s.)
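The difference between polling and watching can be sketched with a toy key-value store in Python. A client registers interest in a key prefix once and is then handed change events as they happen. This illustrates the idea only: etcd's real watch is a gRPC stream tied to revisions, and the `ADDED`/`MODIFIED` event names are borrowed from Kubernetes watch events. The `WatchableStore` class and the key layout are hypothetical.

```python
# Toy key-value store with a watch API: watchers get change events pushed to
# them instead of re-reading every key in a loop.

from collections import defaultdict
from queue import Queue


class WatchableStore:
    def __init__(self) -> None:
        self.data: dict[str, dict] = {}
        self.watchers: dict[str, list[Queue]] = defaultdict(list)

    def watch(self, prefix: str) -> Queue:
        """Register interest in every key under `prefix`; events arrive on the queue."""
        q: Queue = Queue()
        self.watchers[prefix].append(q)
        return q

    def put(self, key: str, value: dict) -> None:
        event = "MODIFIED" if key in self.data else "ADDED"
        self.data[key] = value
        # Notify only the watchers whose prefix matches this key.
        for prefix, queues in self.watchers.items():
            if key.startswith(prefix):
                for q in queues:
                    q.put((event, key, value))


store = WatchableStore()
fry_events = store.watch("/fries/mcdonalds/")       # the FryController's watch
store.put("/pods/default/web", {"image": "nginx"})  # not a Fry: no event
store.put("/fries/mcdonalds/order-1", {"size": "large"})
store.put("/fries/mcdonalds/order-1", {"size": "medium"})

while not fry_events.empty():
    print(fry_events.get())
# -> ('ADDED', '/fries/mcdonalds/order-1', {'size': 'large'})
# -> ('MODIFIED', '/fries/mcdonalds/order-1', {'size': 'medium'})
```

The controller never looks at Pod keys at all; it only hears about the objects it manages.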

Now, whenever a Fry object is created in the McDonalds namespace, our trusty FryController, Alan, will work to make the fries to the requested specs.

As you can see, to support custom objects in Kubernetes, we need both CRDs and controllers. When the CRD/controller pair is application-specific (e.g. MySQL), we call this the operator pattern. Sometimes we aren’t so precise, though. For instance, we talk about “installing the operator” when we are installing just the controller piece.