How Helm Uses ConfigMaps to Store Data

Mar 23 2017

Helm, the package manager for Kubernetes, uses first-class Kubernetes objects to store its data. Here's how we use ConfigMaps to track Helm releases.

Helm follows the formula "Chart + Values = Release". You start with a Helm chart (a software package), you add your own configuration values, and you install it into your cluster. That makes a release.
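One way to picture the formula is as chart defaults overlaid with the values you supply. The sketch below is only an illustration of that idea, not Helm's code: `merge_values` is a hypothetical helper, the defaults are borrowed from the wordpress output shown later in this post, and Helm's real merging rules are more involved.

```python
from copy import deepcopy


def merge_values(defaults: dict, overrides: dict) -> dict:
    """Return chart defaults overlaid with user-supplied values (recursive)."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_values(merged[key], value)
        else:
            # Otherwise the user-supplied value wins.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical chart defaults and user-supplied values (assumptions for illustration).
chart_defaults = {
    "image": "bitnami/wordpress:4.7.3-r0",
    "imagePullPolicy": "IfNotPresent",
}
user_values = {"foo": "bar"}

release_values = merge_values(chart_defaults, user_values)
print(release_values)
```

The chart's defaults are left untouched, so the same chart can be combined with different values to produce different releases.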

On the command line, we do this with the following command:

$ helm install -f config.yaml stable/wordpress
NAME:   amber-gopher
LAST DEPLOYED: Thu Mar 23 15:57:40 2017
NAMESPACE: default

What you get back is a release. In the case above, our release is named amber-gopher. Each time I upgrade this release, I get a new revision of it.

$ helm upgrade --set "foo=bar"  amber-gopher stable/wordpress
Release "amber-gopher" has been upgraded. Happy Helming!
LAST DEPLOYED: Thu Mar 23 15:59:14 2017
NAMESPACE: default

An upgrade takes an existing release and upgrades it with the given values (--set foo=bar) and chart (again, stable/wordpress).

Now I have two revisions of the same release:

$ helm history amber-gopher
REVISION    UPDATED                     STATUS      CHART           DESCRIPTION
1           Thu Mar 23 15:57:40 2017    SUPERSEDED  wordpress-0.4.3 Install complete
2           Thu Mar 23 15:59:14 2017    DEPLOYED    wordpress-0.4.3 Upgrade complete

How does Helm track those releases?

The Release ConfigMaps

Helm's in-cluster component is called Tiller. By default, it installs into the Kubernetes system namespace (kube-system). It has the following jobs:

  • Answer requests from Helm clients
  • Expand and render charts into a set of Kubernetes resources
  • Manage releases

That last job requires Tiller to maintain a list of all releases. When we run helm list, Tiller shows us all of them, and we can use helm history to see every revision of a given release.

Tiller stores all of this information in Kubernetes ConfigMap objects. And those objects are located in the same namespace as Tiller. We can easily get a list of them:

$ kubectl get configmap -n kube-system -l "OWNER=TILLER"
NAME                     DATA      AGE
amber-gopher.v1          1         7m
amber-gopher.v2          1         6m
foolhardy-alligator.v1   1         2d
voting-otter.v1          1         2d

We can see above that there are three releases, and one of the releases (amber-gopher) has two revisions.
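The listing suggests that each ConfigMap is named `<release>.v<revision>`. Assuming that pattern holds, a short script can group the names into releases and revisions. The function names here are hypothetical, and the input is simply the four names copied from the output above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# ConfigMap names copied from the kubectl output above.
configmap_names = [
    "amber-gopher.v1",
    "amber-gopher.v2",
    "foolhardy-alligator.v1",
    "voting-otter.v1",
]


def split_name(configmap_name: str) -> Tuple[str, int]:
    """Split '<release>.v<revision>' into (release, revision)."""
    release, _, revision = configmap_name.rpartition(".v")
    return release, int(revision)


def group_revisions(names: List[str]) -> Dict[str, List[int]]:
    """Map each release name to its sorted list of revision numbers."""
    revisions = defaultdict(list)
    for name in names:
        release, revision = split_name(name)
        revisions[release].append(revision)
    return {release: sorted(revs) for release, revs in revisions.items()}


releases = group_revisions(configmap_names)
for release, revs in releases.items():
    print(f"{release}: {len(revs)} revision(s), latest v{revs[-1]}")
```

In practice the names would come from kubectl or the Kubernetes API rather than a hard-coded list.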

A Peek at a Release Revision

So let's take a quick look at one of these ConfigMaps with kubectl get configmap -n kube-system -o yaml amber-gopher.v2:

apiVersion: v1
data:
  release: H4sIAAAAAAAC/+x9TYwcS5qQ5tmen5x3... # REALLY LONG STRING REMOVED
kind: ConfigMap
metadata:
  creationTimestamp: 2017-03-23T21:59:15Z
  labels:
    CREATED_AT: "1490306355"
    NAME: amber-gopher
    VERSION: "2"
  name: amber-gopher.v2
  namespace: kube-system
  resourceVersion: "86277"
  selfLink: /api/v1/namespaces/kube-system/configmaps/amber-gopher.v2
  uid: f4dacc7d-1013-11e7-a017-be7592efdc06

As you can see, this is a pretty basic ConfigMap. Tiller uses a robust set of labels to mark up and track revision history. For example, you can grab a particular revision using a label selector: kubectl get configmap -n kube-system -l "NAME=amber-gopher,VERSION=2"
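To make the selector semantics concrete, here is a rough in-memory model of equality-based label matching. It is a sketch, not how kubectl or the API server is implemented: `parse_selector` and `select` are hypothetical helpers, the ConfigMaps are plain dictionaries, and the labels on the releases other than amber-gopher are assumed. Real selectors also support operators such as `!=`, `in`, and `notin`, which are left out here.

```python
from typing import Dict, List


def parse_selector(selector: str) -> Dict[str, str]:
    """Parse an equality-only selector like 'NAME=amber-gopher,VERSION=2'."""
    wanted = {}
    for clause in selector.split(","):
        key, _, value = clause.partition("=")
        wanted[key.strip()] = value.strip()
    return wanted


def select(objects: List[dict], selector: str) -> List[dict]:
    """Return the objects whose labels satisfy every clause of the selector."""
    wanted = parse_selector(selector)
    return [
        obj
        for obj in objects
        if all(obj["labels"].get(key) == value for key, value in wanted.items())
    ]


# In-memory stand-ins for the release ConfigMaps (label values are assumptions).
configmaps = [
    {"name": "amber-gopher.v1",
     "labels": {"OWNER": "TILLER", "NAME": "amber-gopher", "VERSION": "1"}},
    {"name": "amber-gopher.v2",
     "labels": {"OWNER": "TILLER", "NAME": "amber-gopher", "VERSION": "2"}},
    {"name": "foolhardy-alligator.v1",
     "labels": {"OWNER": "TILLER", "NAME": "foolhardy-alligator", "VERSION": "1"}},
    {"name": "voting-otter.v1",
     "labels": {"OWNER": "TILLER", "NAME": "voting-otter", "VERSION": "1"}},
]

matches = select(configmaps, "NAME=amber-gopher,VERSION=2")
print([cm["name"] for cm in matches])
```

Note that label values are strings, which is why VERSION is compared as "2" rather than as a number.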

But right in the middle of the ConfigMap is a giant base-64 encoded blob stored under data.release. In the example above, I have truncated it.

This blob is a base-64 encoded, gzipped archive of the entire release record, which includes the original chart, the values, and some useful state-tracking information. The release record is in a binary protobuf format, which is of little use to most users.
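The two outer layers of that encoding are easy to peel off with standard tools. The sketch below round-trips a placeholder payload through gzip and base-64 to show the order of operations. The payload is made-up bytes, not a real protobuf release record, and `decode_release_blob` is a hypothetical helper name.

```python
import base64
import gzip

# Stand-in for the serialized release record. A real record is protobuf
# binary; these bytes are a placeholder used only for illustration.
fake_record = b"placeholder bytes standing in for a protobuf release record"

# What gets stored: gzip first, then base-64 so the result is a plain string.
stored = base64.b64encode(gzip.compress(fake_record)).decode("ascii")


def decode_release_blob(blob: str) -> bytes:
    """Reverse the encoding: base-64 decode, then gunzip."""
    return gzip.decompress(base64.b64decode(blob))


print(stored[:4])
print(decode_release_blob(stored) == fake_record)
```

The first four characters of the encoded string are H4sI, the same prefix visible in the ConfigMap above. That prefix is simply the gzip header after base-64 encoding. What comes out of the decoding step for a real release is still protobuf binary, so helm get remains the practical way to read it.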

You can see a human-friendly version of this data with the helm get command:

$ helm get amber-gopher
RELEASED: Thu Mar 23 15:59:14 2017
CHART: wordpress-0.4.3
USER-SUPPLIED VALUES:
foo: bar

COMPUTED VALUES:
foo: bar
image: bitnami/wordpress:4.7.3-r0
imagePullPolicy: IfNotPresent
# ....

With access to the ConfigMaps, you can perform a number of operations on your Tiller setup, like querying how many release revisions have been created across the cluster. However, we strongly advise that you do not modify a release record, as it may lead to mismatches between what is inside of the gzipped data, and what is in the YAML wrapper.

Finally, it is worth noting that Tiller does not currently cache any state information about releases. The definitive source of information about each release is the set of ConfigMaps associated with the release.

Frequently Asked Questions

There are a few questions about the ConfigMap setup that get asked occasionally.

How does Helm access the ConfigMaps?

ConfigMaps were designed to be a "general purpose" object for storing configuration data. When ConfigMaps were first introduced, that description sounded exactly like what we wanted.

Kubernetes provides several ways of accessing ConfigMaps, including the most common method of mounting a ConfigMap as a volume within a pod. We did not choose to use ConfigMaps as volumes.

Instead, Tiller queries the Kubernetes API on demand. For example, running helm history amber-gopher causes Tiller to contact the Kubernetes API and ask for all of the ConfigMaps labeled NAME=amber-gopher.

Why aren't all of the revisions stored in the same ConfigMap?

There are a few parts to this answer:

  1. We wanted to make it easy to query revisions via kubectl. We consider kubectl to be the "expert level" tool for diagnosing Helm issues.
  2. ConfigMaps (like all resource types) have a 1 MB upper size limit. This is true of Kubernetes 1.5 and earlier due to a limitation of etcd.

Because of this design, you (as a power user who does not fear voiding warranties and tearing the tag off your mattress) can actually remove unwanted revisions using kubectl delete configmap... commands.

Why didn't you use Third Party Resources (TPRs)?

Simply because they were an alpha-level feature when we wrote Tiller, and still are not (in our opinion) at the level of stability we would like. We've enjoyed the fact that ConfigMaps are very easy to work with, both within Tiller and with other Kubernetes tools.

The Helm core developers have considered moving to TPRs in Helm 3, but we will probably not do it before then because of backward compatibility issues.

Why didn't you use (insert database name here)?

We could have used any number of storage backends for Tiller. But what we wanted was something that required very little additional overhead. Storing releases as native Kubernetes objects meant that our data storage was always as stable and persistent as the cluster, which seemed like the right metric for a package manager.

But we did write the storage driver interface in such a way that one could use another storage mechanism. In fact, Tiller ships with a second storage driver. It stores the release history in memory, and is useful for debugging or learning the storage system.
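To give a feel for what a pluggable storage driver looks like, here is a minimal sketch of an interface with an in-memory implementation. Everything in it is hypothetical: the class names, method names, and key format are chosen for illustration and are not Tiller's actual driver API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ReleaseDriver(ABC):
    """Hypothetical storage interface; names are illustrative, not Tiller's."""

    @abstractmethod
    def create(self, key: str, record: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def list_keys(self) -> List[str]: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class MemoryDriver(ReleaseDriver):
    """Keeps records in a dict; everything is lost when the process exits."""

    def __init__(self) -> None:
        self._records: Dict[str, bytes] = {}

    def create(self, key: str, record: bytes) -> None:
        if key in self._records:
            raise KeyError(f"release revision already exists: {key}")
        self._records[key] = record

    def get(self, key: str) -> bytes:
        return self._records[key]

    def list_keys(self) -> List[str]:
        return sorted(self._records)

    def delete(self, key: str) -> None:
        del self._records[key]


driver = MemoryDriver()
driver.create("amber-gopher.v1", b"record one")
driver.create("amber-gopher.v2", b"record two")
print(driver.list_keys())
```

Any backend that implements the same four operations could be swapped in without the calling code changing, which is the point of keeping the interface small.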

Why do you store the data in base-64 encoded gzipped binaries?

It's not for security (as some people have suggested), nor is it so that we can willfully obscure data. It's simply because we need to store a large chunk of data in a compact format. Again, we are limited to records no larger than 1 MB. Gzipping the protobuf serialized data has been highly effective, and base-64 encoding it matches the data format that Kubernetes expects.
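A quick experiment shows why compression matters under a size ceiling. In this sketch the payload is deliberately repetitive text rather than a real release record, so it compresses far better than real data would. `SIZE_LIMIT` treats the 1M figure as 1 MiB, and `encode_record` is a hypothetical helper, both assumptions made for the example.

```python
import base64
import gzip

SIZE_LIMIT = 1024 * 1024  # the per-record ceiling discussed above, taken as 1 MiB


def encode_record(serialized: bytes) -> str:
    """Gzip the bytes, then base-64 encode them into a plain ASCII string."""
    return base64.b64encode(gzip.compress(serialized)).decode("ascii")


# Hypothetical payload: highly repetitive text, larger than the limit when raw.
raw = b"image: bitnami/wordpress:4.7.3-r0\n" * 50_000
encoded = encode_record(raw)

print("raw bytes:    ", len(raw))
print("encoded bytes:", len(encoded))
print("fits in limit:", len(encoded) <= SIZE_LIMIT)
```

Base-64 adds roughly a third to the size of whatever it encodes, so it is the gzip step that buys the headroom.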

Why didn't you use Secrets?

At the time we wrote Tiller, Secrets were more primitive than ConfigMaps. They have since hit feature parity. But there seems to be no compelling reason for us to switch from one to the other, and doing so would break backward compatibility.

Note that in spite of their name, Secrets are no more (inherently) secure than ConfigMaps. In fact, the encoding for a Secret would be identical to the encoding we use now. At some point, we are hoping we will be able to find a way to store the release data encrypted-at-rest.
