Vault + Swarm Docker secrets plugin (proof of concept)


Background

Secrets have been part of Swarm Mode since its inception, making it trivial to provide generic, static secrets to your distributed services. However, not all secrets are equal, and some use cases call for a more dynamic approach. Docker Engine allows you to install a plugin and use it as a driver when creating secrets, so that the value of the secret is determined at runtime, which enables dynamic use cases. My talk at DockerCon 2019 in San Francisco will cover how to write a secrets plugin that fetches dynamic secret values from HashiCorp Vault, and how to deploy it as a Swarm service.

Static vs. dynamic secrets

Generic, static secrets will only get you so far. Once you have a large enough number of secrets, you’ll need either a very good naming convention or very careful labelling. Even then, they can become cumbersome to manage, which risks either overly broad policies or drowning in bureaucracy. There are other secret management solutions out there, and in this post I will discuss one specific use case with HashiCorp Vault.

Basic example of static secrets

Here’s a basic but complete example (adapted from the official documentation) of using the built-in secrets feature:

First, create passwords for a database:

$ cat /dev/urandom | tr -dc '0-9a-zA-Z!@#$%^&*_+-' | head -c 15 | docker secret create db_password -
$ cat /dev/urandom | tr -dc '0-9a-zA-Z!@#$%^&*_+-' | head -c 15 | docker secret create db_root_password -

Then write a Docker Compose file:

version: "3.7"
services:
  db:
    image: mysql:latest
    command: "--default-authentication-plugin=mysql_native_password" # See https://github.com/docker-library/wordpress/issues/313#issuecomment-400836783
    volumes:
      - db_data:/var/lib/mysql
    environment:
      MYSQL_ROOT_PASSWORD_FILE: /run/secrets/db_root_password
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wordpress
      MYSQL_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_root_password
      - db_password
  wordpress:
    depends_on:
      - db
    image: wordpress:latest
    ports:
      - published: 8000
        target: 80
    environment:
      WORDPRESS_DB_HOST: db:3306
      WORDPRESS_DB_USER: wordpress
      WORDPRESS_DB_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
secrets:
  db_password:
    external: true
  db_root_password:
    external: true
volumes:
  db_data:

Finally, deploy the stack:

$ docker stack deploy --compose-file docker-compose.yml example1

Wait a while for the database and blog app to start up, and you’ll be able to visit http://localhost:8000 and see the working WordPress site, all without you ever seeing the password.

As you can see in the Docker Compose file, both MySQL and WordPress are instructed to use files inside the /run/secrets directory for the database passwords. This is the delivery method selected for secrets.

How static secrets work inside Swarm

When you use Swarm secrets without a plugin, secret data and metadata are saved in the Raft store of the Swarm managers. By design, secret data cannot be updated, and the Docker CLI offers no commands to update secret metadata (i.e. labels). You can, however, update secret labels through the Docker Engine API.

Once you create a service with a secret attached, the secret values are placed as files on a private tmpfs (i.e. an in-memory file system) mounted inside the container, rather than in environment variables, which are too easily leaked to parties that have no business seeing them.

The problem

You can configure Vault with e.g. your database’s sysadmin credentials, and then use a combination of policies and authentication mechanisms to have Vault dynamically create time-limited user accounts with role-based grants. The most basic authentication method Vault offers is based on opaque tokens. Tokens can have policies attached to them, indicating which areas in Vault they give access to. But how do you get those tokens to your containers?

You could of course create a static, long-lived token with the right policies attached, and then type that in as a static secret in Swarm. Then you could attach it to the relevant service, and the service could then communicate directly with Vault to read the secret data, e.g. database credentials. But then you get the problem of rotating the token you typed into Swarm, which either becomes a bureaucratic, repetitive task, or else you risk having to put that secret into your CI/CD system, which can lead to secret sprawl. If you don’t rotate the token, you instead run the risk of the token some day being intercepted, and then you have to rotate it - if you notice, that is.

Ideally, you would give a new token to every instance of your service, and use the use-limit feature of Vault to make sure you can detect interception and ensure a stolen token cannot be reused. Vault calls this concept response wrapping. However, with static Swarm secrets, there is no way of making use of response wrapping. If you typed a response-wrapped token into Swarm and used it in a service, only the first instance would be able to unwrap it, which is by design.
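To make the single-use property concrete, here is a hedged Go sketch of what unwrapping looks like from a task's point of view. The unwrap function posts to Vault's sys/wrapping/unwrap HTTP endpoint with the wrapping token in the X-Vault-Token header. Everything else is an assumption for the sake of a runnable demo: newFakeVault is a local stand-in, not Vault, and it only imitates the "valid exactly once" behaviour; the token and payload are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// unwrap exchanges a response-wrapping token for the wrapped payload,
// authenticating with the wrapping token itself.
func unwrap(client *http.Client, vaultAddr, wrappingToken string) (map[string]interface{}, error) {
	req, err := http.NewRequest(http.MethodPost, vaultAddr+"/v1/sys/wrapping/unwrap", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-Vault-Token", wrappingToken)
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unwrap failed: %s", resp.Status)
	}
	var body struct {
		Data map[string]interface{} `json:"data"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return nil, err
	}
	return body.Data, nil
}

// newFakeVault (hypothetical test double) accepts the given wrapping token
// exactly once and rejects every later attempt.
func newFakeVault(token, payload string) *httptest.Server {
	var mu sync.Mutex
	used := false
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		if r.URL.Path != "/v1/sys/wrapping/unwrap" || r.Header.Get("X-Vault-Token") != token || used {
			http.Error(w, `{"errors":["wrapping token is not valid"]}`, http.StatusBadRequest)
			return
		}
		used = true
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, payload)
	}))
}

func main() {
	srv := newFakeVault("wrapping-token", `{"data":{"password":"s3cr3t"}}`)
	defer srv.Close()

	// Two tasks sharing one wrapping token: only the first can unwrap it.
	for task := 1; task <= 2; task++ {
		data, err := unwrap(srv.Client(), srv.URL, "wrapping-token")
		if err != nil {
			fmt.Printf("task %d: %v\n", task, err)
			continue
		}
		fmt.Printf("task %d: unwrapped %d field(s)\n", task, len(data))
	}
}
```

The second iteration failing is exactly what happens to the second replica when a wrapped token is stored as a static Swarm secret.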

A solution

In order to solve this challenge in a satisfying way, you’ll need to use one of the several extension points of Docker Swarm.

Introduction to the pluggable secrets backend

The pluggable secrets backend allows you to specify a “driver” when creating a secret, e.g. docker secret create --driver <driver_name> <name> <file|->. The plugin must advertise that it implements the "secretprovider" interface, and Docker provides a helpful repository for getting started with writing such a plugin in Go.

When a driver is chosen for a secret, the Swarm manager still looks up the metadata in the Raft store, but requests the data from the plugin with the given driver name. The corresponding plugin must be installed on the managers in the Swarm.

Liron Levin from Twistlock contributed the pluggable secrets backend back in 2017.

A proof-of-concept plugin

My idea was to write a plugin that calls out to Vault to deliver secret values to Swarm service tasks. One of the requirements was that it should support response wrapping. It was not hard to write, given the go-plugins-helpers repo and the excellent official Vault Go client.

The plugin works like this:

  1. Receive a request, including the secret name and labels; the service name, ID and labels; and the task name and ID
  2. Based on the secret labels, create a token on behalf of the service task, with a Vault policy attached that has the same name as the service, and then, optionally:
    1. Use that token to read a generic key/value secret from a specified path, and optionally:
      1. Return a specific field inside that path
      2. JSON-encode the returned value
    2. Use response wrapping to deliver the returned value

The source code for the proof of concept plugin is available on GitLab.
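The label-driven part of that flow can be sketched as a pure function that turns a request into a plan, before any call to Vault is made. This is an illustration, not the plugin's actual code: the request and plan types are made up, and of the label keys only the ".type" and ".wrap" suffixes appear in this post; ".path", ".field" and ".json" are hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// labelPrefix matches the label namespace used in the Compose example
// later in this post.
const labelPrefix = "dk.almbrand.docker.plugin.secretprovider.vault."

// request is a hypothetical, trimmed-down view of what the plugin receives.
type request struct {
	SecretName   string
	SecretLabels map[string]string
	ServiceName  string
	TaskID       string
}

// plan describes what the plugin intends to do for one task.
type plan struct {
	Policy      string // Vault policy for the per-task token (same name as the service)
	ReturnToken bool   // deliver the token itself as the secret value
	Path        string // key/value path to read (hypothetical label)
	Field       string // single field to return from that path (hypothetical label)
	JSONEncode  bool   // JSON-encode the returned value (hypothetical label)
	Wrap        bool   // deliver the value response-wrapped
}

// planFor derives the plan from the secret's labels and the service name.
func planFor(req request) (plan, error) {
	if req.ServiceName == "" {
		return plan{}, errors.New("request carries no service name, so no policy can be chosen")
	}
	l := req.SecretLabels // reading from a nil map is safe in Go
	p := plan{
		Policy:      req.ServiceName,
		ReturnToken: l[labelPrefix+"type"] == "vault_token",
		Path:        l[labelPrefix+"path"],
		Field:       l[labelPrefix+"field"],
		JSONEncode:  l[labelPrefix+"json"] == "true",
		Wrap:        l[labelPrefix+"wrap"] == "true",
	}
	if !p.ReturnToken && p.Path == "" {
		return plan{}, fmt.Errorf("secret %q: neither a token nor a path was requested", req.SecretName)
	}
	return p, nil
}

func main() {
	p, err := planFor(request{
		SecretName:  "vault_token",
		ServiceName: "snitch",
		TaskID:      "task-1",
		SecretLabels: map[string]string{
			labelPrefix + "type": "vault_token",
			labelPrefix + "wrap": "true",
		},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", p)
}
```

Keeping this decision step free of I/O makes it easy to test separately from the calls that create tokens and read secrets in Vault.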

A complication

Now, when I first tried out the plugin, it worked as intended. However, when I scaled up the service to 2 replicas, I noticed two things:

  1. When setting the secret label that indicated that a generic token should be returned, sometimes the two replicas got the same value, and
  2. When setting the label to use response wrapping, sometimes only one of the tasks would succeed, whereas the other would be told by Vault that the response wrapping token had already been used.

When investigating this puzzling finding, I read through some of the code that assigns secrets and other resources to Swarm nodes. The code responsibly caches the values of secrets such that for each node, any given secret is only requested once from either the raft store or the secret plugin. However, that does not help if the plugin returns values that are supposed to be individual for each task, e.g. when using response wrapping.
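The effect is easy to reproduce in miniature. The following Go sketch is not swarmkit code; it models a per-node cache keyed by secret ID, plus one possible remedy in which the provider can mark a value as not reusable. The DoNotReuse field name and all types here are assumptions made for this illustration.

```go
package main

import "fmt"

// response is what the (simulated) secret provider returns. DoNotReuse is a
// hypothetical flag telling the cache that the value is meant for one task.
type response struct {
	Value      string
	DoNotReuse bool
}

// provider fetches a secret value on behalf of a specific task.
type provider func(secretID, taskID string) response

// nodeCache models a per-node cache that requests each secret only once.
type nodeCache struct {
	fetch  provider
	values map[string]string
}

func newNodeCache(fetch provider) *nodeCache {
	return &nodeCache{fetch: fetch, values: map[string]string{}}
}

// get returns the cached value for secretID if there is one; otherwise it
// asks the provider and caches the result unless it is marked DoNotReuse.
func (c *nodeCache) get(secretID, taskID string) string {
	if v, ok := c.values[secretID]; ok {
		return v
	}
	resp := c.fetch(secretID, taskID)
	if !resp.DoNotReuse {
		c.values[secretID] = resp.Value
	}
	return resp.Value
}

func main() {
	for _, doNotReuse := range []bool{false, true} {
		calls := 0
		cache := newNodeCache(func(secretID, taskID string) response {
			calls++ // each call stands for a freshly minted token
			return response{Value: fmt.Sprintf("token-%d", calls), DoNotReuse: doNotReuse}
		})
		a := cache.get("vault_token", "task-1")
		b := cache.get("vault_token", "task-2")
		fmt.Printf("doNotReuse=%v: task-1=%s task-2=%s\n", doNotReuse, a, b)
	}
}
```

With the flag off, both tasks on the node receive the same token, which mirrors what I observed; with it on, the provider is asked once per task and each task gets its own value.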

I set about writing the necessary changes as PRs #2735 and #2735 for docker/swarmkit. They are currently merged/vendored in moby/moby’s master branch, and should release with Docker 19.03.

Limitations

Until Docker 19.03, you cannot use response wrapping with the plugin. It goes without saying that I do not recommend or even suggest using this plugin anywhere near production, or even in daily use. Rather, see it as an example of what can be done with Swarm and its extension points.

Also, the plugin currently relies on getting its own access to Vault (a more privileged token that can perform the plugin’s functions) through suboptimal means. Because plugins in Docker, even if run as containers, are currently very different from regular containers, there are several features you cannot make use of. In particular, even though you can use a special type of Swarm service to install the plugin in your Swarm, there is currently no way to attach Swarm secrets to such a service, static or otherwise. This leaves you with the problem of safely bootstrapping the plugin itself. The only method I could think of, which really is half-baked but works in this POC, is to give the plugin access to the Docker socket of the manager node it is installed on, and use a helper service to hold the bootstrapping token. The plugin then uses the Docker API to find the helper service’s container and reads the bootstrapping token from there.

Usage

See the README in the repo for instructions; here is what it says:

Run ./rebuild.sh and you should get output like the following. Note the different tokens in the two task instances of the snitch service:

$ ./rebuild.sh
...
Success! Uploaded policy: snitch
Key              Value
---              -----
created_time     2018-08-30T02:21:27.980476389Z
deletion_time    n/a
destroyed        false
version          1
Key              Value
---              -----
created_time     2018-08-30T02:21:28.088984314Z
deletion_time    n/a
destroyed        false
version          1
fiqw1xaqjqofvinflvmnzo83t
zj2hvlev230x0s1ei9t25ft9m
overall progress: 1 out of 1 tasks
ij2r01ffy6ak: running   [==================================================>]
verify: Service converged
sirlatrom/docker-secretprovider-plugin-vault
5ioiauam5n9nms9neb6szbwxj
n5j855gu0i460bno1aaw3neq9
t99bbs7y5c1y61tyxxhd3msoj
i9sbmcaqc46v5a44u00fkfxfv
snitch.2.y42sqzj8524y@redacted_host    | secret:              this_was_not_wrapped
snitch.2.y42sqzj8524y@redacted_host    | wrapped_secret:      1afd51f9-c1a2-d4ec-8ceb-8e043b77b53a
snitch.1.gpy8rj3oxz0n@redacted_host    | secret:              this_was_not_wrapped
snitch.1.gpy8rj3oxz0n@redacted_host    | wrapped_secret:      6567b96c-338e-cd3b-e9bc-67c65597fd0f
snitch.2.y42sqzj8524y@redacted_host    | unwrapped_secret:    this_was_once_wrapped
snitch.2.y42sqzj8524y@redacted_host    | generic_vault_token: ddef57f5-a235-923c-4e7c-0a519d307f10
snitch.1.gpy8rj3oxz0n@redacted_host    | unwrapped_secret:    this_was_once_wrapped
snitch.1.gpy8rj3oxz0n@redacted_host    | generic_vault_token: b7b27691-1776-ae52-ffc3-b6a59152d12f
snitch.2.lepelzpcjscj@redacted_host    | secret:              this_was_not_wrapped
snitch.2.lepelzpcjscj@redacted_host    | wrapped_secret:      84df01da-11f9-acba-0373-89bd1f161798
snitch.2.lepelzpcjscj@redacted_host    | unwrapped_secret:    this_was_once_wrapped
snitch.2.lepelzpcjscj@redacted_host    | generic_vault_token: 9214b53f-027c-6552-a0a9-1b18783550d1
snitch.1.edwdvkmvpbke@redacted_host    | secret:              this_was_not_wrapped
snitch.1.edwdvkmvpbke@redacted_host    | wrapped_secret:      df1714e1-a1b6-07eb-10c1-7e1ba4e73022
snitch.1.edwdvkmvpbke@redacted_host    | unwrapped_secret:    this_was_once_wrapped
snitch.1.edwdvkmvpbke@redacted_host    | generic_vault_token: bedf29d5-fbdd-6085-7809-f113078c66b1
...

Future work

Configs can be created with the --template-driver option, allowing you to insert placeholders for secrets in your config file and have them resolved each time a task (container) for a service is created. There will eventually be a template_driver equivalent in the Docker Compose file format (see pull requests docker/cli#1746 and docker/compose#6530). Once that is in place (tentatively set for Compose file format version 3.8), you’ll be able to combine configs, secrets and secrets plugins to build a powerful and expressive config management solution, while keeping the concerns of the systems involved neatly separated.

My dream is to be able to write a docker-compose file like this (obviously made-up and not realistic):

version: "3.8"
services:
  app:
    image: ...
    configs:
      - source: config.yml
        target: /etc/app/config.yml
    secrets:
      - vault_token
configs:
  config.yml:
    template_driver: golang
    file: config.yml.tmpl
secrets:
  vault_token:
    name: vault_token
    driver: sirlatrom/docker-secretprovider-plugin-vault
    labels:
      dk.almbrand.docker.plugin.secretprovider.vault.type: "vault_token" # Secret will contain a Vault token
      dk.almbrand.docker.plugin.secretprovider.vault.wrap: "true"        # Enable response wrapping

and a config.yml.tmpl file like this:

vault_addr: https://vault.example.com:8200
vault_token: {{ secret "vault_token" }}

and the rendered config file would contain a response-wrapped long-lived token that the app could then use.
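To show what "resolved each time a task is created" means in practice, here is a small stand-alone Go sketch using text/template. It does not call into Docker; the render function and its secret lookup are my own approximation of how a placeholder such as {{ secret "..." }} could be filled in, and the token value is a dummy string.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// render (hypothetical helper) executes a config template in which
// {{ secret "name" }} is replaced by the value of the secret attached
// under that name.
func render(tmplText string, secrets map[string]string) (string, error) {
	funcs := template.FuncMap{
		"secret": func(name string) (string, error) {
			v, ok := secrets[name]
			if !ok {
				return "", fmt.Errorf("secret %q is not attached to this service", name)
			}
			return v, nil
		},
	}
	tmpl, err := template.New("config").Funcs(funcs).Parse(tmplText)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, nil); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	const configTmpl = "vault_addr: https://vault.example.com:8200\n" +
		"vault_token: {{ secret \"vault_token\" }}\n"

	// In the scenario above, this value would come from the secrets plugin
	// and differ for every task.
	rendered, err := render(configTmpl, map[string]string{
		"vault_token": "dummy-wrapping-token-for-task-1",
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Print(rendered)
}
```

Referencing a secret that is not attached makes rendering fail, which is the behaviour you would want from a config that depends on it.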

Wouldn’t that be great?
