Declarative Docker Enterprise with Packer, Terraform, Ansible and GitLab - part 2

12 minute read

This is the second part in a series about building and upgrading Docker EE clusters while striving for a declarative approach. See part 1 for more background.

Post updated Aug 22 2018: Added conclusion section.

Creation

The first time a cluster is to be created, things are a little different. There are no existing VMs, and thus no services running on them. This makes things simpler in terms of how we apply the planned changes using Terraform.

Terraform config

In broad terms, the nodes that will make up the cluster are divided into three groups, which is reflected in our Terraform config:

  1. UCP Controllers (named managers)
  2. UCP Workers (named workers)
  3. DTR replicas (named dtrs)

Here’s a list of variables that we define for every VM (using a map from VM name to value):

  • MAC address
  • Deployment stage
  • vSphere Resource Pool
  • vSphere host [1]
  • vCenter folder path
  • Number of vCPUs
  • Memory size in MB
  • Disk size [2]
  • Datacenter name (e.g. dc1 or dc2)
  • Primary network name [3]
  • Storage network name [3][4]
  • Storage IP address [4]
  • Storage policy name [4]

That’s a lot! Additionally, we have to ask the network team to create DHCP reservations and VIP addresses, ask our storage vendor to add the storage IP addresses to the whitelist for the given storage policy, and request TLS certificates for UCP and DTR from our internal PKI. Those are all tasks ripe for automation, but we haven’t got there yet.
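For illustration, a couple of these per-VM maps might look like the following in Terraform 0.11 syntax. The variable names and values here are invented, not taken from our actual config:

```hcl
# Hypothetical excerpt of a cluster's variable definitions;
# names and values are illustrative only.
variable "memory_mb" {
  type = "map"

  default = {
    "ucp1"    = "16384"
    "worker1" = "32768"
  }
}

variable "mac_address" {
  type = "map"

  default = {
    "ucp1"    = "00:50:56:aa:bb:01"
    "worker1" = "00:50:56:aa:bb:02"
  }
}

# Values are then read per VM with lookup(), e.g.:
#   memory = "${lookup(var.memory_mb, var.vm_name)}"
```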

But once all that is defined, we’re good to go and can run our pipeline in GitLab. [5]

GitLab pipeline

The GitLab pipeline is configured in a number of YAML files. Since we’re using GitLab Enterprise, we can include other YAML files from our main .gitlab-ci.yml file, leading to a slightly neater decomposition. However, we specify the same CI jobs for every cluster:

  • Pre*
  • Plan
  • Apply
  • Upgrade
  • Re-run Ansible

* The “Pre” job is a single job that is always run before the plan phase.
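A minimal skeleton of the top-level .gitlab-ci.yml might look like this; the file names, stage names and script are illustrative, not our actual config:

```yaml
# Hypothetical top-level .gitlab-ci.yml skeleton.
# The include keyword pulls in the per-cluster job definitions.
include:
  - '/ci/cluster-dev.yml'
  - '/ci/cluster-prod.yml'

stages:
  - pre
  - plan
  - apply

pre:
  stage: pre
  script:
    # Illustrative script name; saves essential images as an artifact.
    - ./scripts/save-essential-images.sh
  artifacts:
    paths:
      - essential-images.tar.gz
```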

A screenshot of our provisioning pipeline

As can be seen in the above screenshot, we divide our pipeline into three phases (the “Upstream” phase comes from the job being triggered by the previous repo’s CI pipeline), namely:

  • Pre
  • Plan
  • Apply

Preparing essential Docker images

The “Pre” phase has one proper job: pull any essential Docker images from a registry that already exists (we still need to solve this catch-22) and save them in a .tar.gz file for later use.
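A sketch of what that job does, assuming hypothetical image tags and file names:

```shell
#!/bin/sh
# Sketch of the "Pre" job: pull the images the installers need and
# archive them for later pipeline jobs. Image list is illustrative.
set -eu

IMAGES="docker/ucp:3.0.4 docker/dtr:2.5.3"

for image in $IMAGES; do
  docker pull "$image"
done

# docker save can write several images into one tar stream;
# gzip keeps the resulting GitLab artifact small.
docker save $IMAGES | gzip > essential-images.tar.gz
```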

Planning

The “Plan” phase has a job for each cluster. Each job passes the Terraform variables file pertaining to the cluster to the -var-file option of the terraform plan command and saves the output in a file that is archived in GitLab. The job output log shows what actions Terraform will perform if the plan is applied. For a new cluster, this includes a number of VMs in each of the managers, workers and dtrs resource groups, as well as a number of so-called null resources, which are used, among other things, to insert custom scripts at different points in the lifecycle of applying the plan; in particular, this is how we run Ansible after Terraform has created the VMs. Terraform keeps track of the null resources, but they don’t represent an object in any other system.
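A per-cluster plan job boils down to something like this; the variable-file and plan-file names are illustrative:

```shell
#!/bin/sh
# Sketch of a per-cluster "Plan" job. File names are illustrative.
set -eu
CLUSTER=dev

terraform init -input=false
terraform plan -input=false \
  -var-file="clusters/${CLUSTER}.tfvars" \
  -out="${CLUSTER}.plan"

# ${CLUSTER}.plan is archived as a GitLab artifact and consumed
# by the matching "Apply" job via terraform apply.
```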

Applying

Hang on, this is where it gets complicated!

The “Apply” phase’s main job is the “Apply” job. It fetches the plan from the “Plan” job and applies it by running terraform apply <plan.out>. For a new cluster, the following things will happen in sequence:

  1. Terraform creates the UCP Controller VMs (managers) by cloning the Docker + UCP VM template from part 1
  2. Terraform creates an Ansible inventory file only including the manager nodes in a group called ucp (sorry about the naming inconsistencies)
  3. Terraform runs the script in the ansible_managers null resource
    1. Ansible runs a playbook that consists of multiple nested playbooks that contain the tasks for setting up the Docker Swarm cluster and installing UCP
      1. The current Swarm cluster status of each manager node is collected
      2. If there are no managers in any existing cluster (always the case first time around), the first manager will initialize a new Swarm cluster
      3. The UCP configuration file from our repo is copied into the VM and created as a Swarm config using the docker config create command
      4. The TLS certificates that we inject into the pipeline using a GitLab secret variable are copied into the VM and placed in a newly created ucp-controller-server-certs Docker volume (instructions here)
      5. The configured version of UCP is installed using docker run ... docker/ucp:<version> install (see the install docs)
      6. We wait for the ucp-reconcile container on the first manager node to exit successfully (see the UCP architecture page), indicating that UCP is up and running
      7. We get the join token for Swarm managers from the first manager node by running docker swarm join-token manager
      8. We collect a list of remaining manager nodes to join to the cluster
      9. For each of the remaining manager nodes, one at a time, we do the following:
        1. Run docker swarm join --token <token> <first-node-addr>:2377 to join the Swarm cluster
        2. Wait for the ucp-reconcile container on the first manager node to exit successfully (see the UCP architecture page), indicating that UCP is up and running
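Condensed into shell, the bootstrap part of the sequence above roughly amounts to the following. The UCP version and flags are illustrative; see the UCP install docs for the full option list:

```shell
#!/bin/sh
# Rough sketch of what the ansible_managers playbook drives on the
# first manager node. Version and flags are illustrative.
set -eu

# Initialize a new Swarm cluster (only when no cluster exists yet).
docker swarm init

# Install UCP non-interactively; the bootstrapper runs as a container
# with access to the local Docker socket.
docker run --rm --name ucp \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker/ucp:3.0.4 install \
  --admin-username admin --admin-password "$UCP_ADMIN_PASSWORD"

# Fetch the manager join token to hand to the remaining managers,
# which each run: docker swarm join --token <token> <first-node-addr>:2377
docker swarm join-token -q manager
```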

Once the manager nodes have been joined into a working Swarm/UCP cluster, we need to perform additional configuration of UCP for it to suit our needs. This is also done as part of the pipeline. Our Terraform config has a couple of additional variables:

  • Teams (a mapping from team name to LDAP filter)
  • Grants (a mapping from collection to a tuple of a team name and a role)

Using the UCP HTTP API, we create the teams, checking for each one that it does not already exist, and configure its LDAP sync settings. We then create all the collections referenced by the “Grants” variable. Some collections run several levels deep, and for those we add a dummy entry so that their parent collections are created first. Once the collections have been created, we create grants for the teams with the specified roles. All this required some extra Ansible-fu – the import_tasks action comes in handy, as it allows running a sequence of tasks for each element in a given map/dict or list variable.
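As a hedged sketch, the team-creation step could be expressed as an Ansible task like the one below. The endpoint path, variable names and auth header are assumptions for illustration, not a verbatim copy of our playbook or of the UCP API:

```yaml
# Hypothetical task: create each team via the UCP HTTP API unless it
# already exists. Endpoint and variables are illustrative assumptions.
- name: Create UCP team unless it already exists
  uri:
    url: "https://{{ ucp_vip }}/accounts/{{ ucp_org }}/teams"
    method: POST
    headers:
      Authorization: "Bearer {{ ucp_token }}"
    body_format: json
    body:
      name: "{{ item.key }}"
    # Treat "already exists" responses as success to stay idempotent.
    status_code: [201, 409]
  # teams is the Terraform map of team name -> LDAP filter.
  loop: "{{ teams | dict2items }}"
```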

  4. Terraform then creates the Worker and DTR nodes
  5. Terraform creates an Ansible inventory file including the manager nodes in a group called ucp (sorry about the naming inconsistencies), the worker nodes in a group called worker and the DTR nodes in a group called dtr
  6. Terraform then runs the script in the ansible_all null resource

    1. Ansible runs a playbook that consists of multiple nested playbooks that contain the tasks for joining the worker nodes, installing DTR and joining the DTR replicas
      1. We get the join token for Swarm workers from the first manager node by running docker swarm join-token worker
      2. The current Swarm cluster status of each non-manager node is collected
      3. If the node is not already in the cluster, we join it by running docker swarm join --token <token> <first-node-addr>:2377
      4. We add a Swarm node label to every node based on its value from the Terraform config’s deployment stage variable (this is later used for constraining Swarm services to specific stages when the user has access to multiple stages), and one that signifies what datacenter the node is in
      5. We also add a Swarm node label to assign each node to a UCP collection by setting the com.docker.ucp.access.label label, e.g. to prod, which will restrict which UCP users will be able to see and interact with/schedule services on the node
      6. We wait for the ucp-reconcile container on the first manager node to exit successfully (see the UCP architecture page), indicating that the UCP worker components are up and running
      7. When all worker and DTR nodes are correctly configured as plain workers, we need to install DTR:
        1. If there are no existing DTR containers on any DTR node (we check for the presence of a dtr-api-* container), we install DTR using the docker run ... docker/dtr install command, referring to the load-balanced VIP address of our UCP cluster (using the --ucp-url option), and provide the NFS address of our externalized storage (using the --nfs-storage-url option) as well as the TLS certificates (using the --dtr-{ca,cert,key} options)
        2. For each of the remaining DTR nodes, we run the docker run ... docker/dtr join command, which replicates the DTR database to the joining replica nodes and makes them part of the DTR cluster
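The worker-join and labeling steps condense into commands like these; node names and label values are illustrative:

```shell
#!/bin/sh
# Sketch of the per-worker steps. Names and values are illustrative.
set -eu

# On the joining node, using the worker token fetched from the
# first manager:
docker swarm join --token "$WORKER_TOKEN" "$FIRST_MANAGER_ADDR:2377"

# On a manager: label the node for stage/datacenter scheduling
# constraints, and place it in a UCP collection.
docker node update --label-add stage=prod worker1
docker node update --label-add datacenter=dc1 worker1
docker node update --label-add com.docker.ucp.access.label=prod worker1
```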

Phew, that was a lot, and that was even skipping over a few details!

Upgrading

The upgrade starts out from a list of which template was used for each VM currently in the cluster, and then swaps the VMs out one at a time against the most recent template.

E.g., if the template list looked like this:

templates = {
    "ucp1"    = "Ubuntu1604DockerTemplate-20180604-154739-master-50798-push"
    "ucp2"    = "Ubuntu1604DockerTemplate-20180604-154739-master-50798-push"
    "ucp3"    = "Ubuntu1604DockerTemplate-20180604-154739-master-50798-push"
    "worker1" = "Ubuntu1604DockerTemplate-20180604-154739-master-50798-push"
    "worker2" = "Ubuntu1604DockerTemplate-20180604-154739-master-50798-push"
    "worker3" = "Ubuntu1604DockerTemplate-20180604-154739-master-50798-push"
}

… we first look up the deployment stage for each node. We process the nodes in this order: UCP, DTR, Workers (in increasing deployment stage order, i.e. dev -> prod). Then we swap in the new template name for the ucp1 node, after which the template list looks like this instead:

templates = {
    "ucp1"    = "Ubuntu1604DockerTemplate-20180820-022846-master-63776-pipeline"
    "ucp2"    = "Ubuntu1604DockerTemplate-20180604-154739-master-50798-push"
    "ucp3"    = "Ubuntu1604DockerTemplate-20180604-154739-master-50798-push"
    "worker1" = "Ubuntu1604DockerTemplate-20180604-154739-master-50798-push"
    "worker2" = "Ubuntu1604DockerTemplate-20180604-154739-master-50798-push"
    "worker3" = "Ubuntu1604DockerTemplate-20180604-154739-master-50798-push"
}

You’ll notice that the template for ucp1 has changed from Ubuntu1604DockerTemplate-20180604-154739-master-50798-push to Ubuntu1604DockerTemplate-20180820-022846-master-63776-pipeline. A useful tool called json2hcl (Docker image) helps us convert between HashiCorp’s HCL configuration format and JSON, allowing us to manipulate the config with jq.
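The swap itself could look roughly like this; the file name is illustrative, and the jq path depends on the exact JSON shape json2hcl emits (HCL decoding tends to wrap maps in single-element arrays):

```shell
#!/bin/sh
# Sketch of swapping one node's template name via json2hcl + jq.
# File name and JSON path are illustrative assumptions.
set -eu
NODE=ucp1
NEW_TEMPLATE=Ubuntu1604DockerTemplate-20180820-022846-master-63776-pipeline

# json2hcl -reverse converts HCL to JSON; the plain invocation
# converts JSON back to HCL.
json2hcl -reverse < templates.tfvars \
  | jq --arg node "$NODE" --arg tmpl "$NEW_TEMPLATE" \
       '.templates[0][$node] = $tmpl' \
  | json2hcl > templates.tfvars.new

mv templates.tfvars.new templates.tfvars
```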

We then run terraform plan followed by terraform apply, and Terraform should then show the ucp1 VM as tainted since its template VM name has changed, forcing the VM to be destroyed and a new one created in its place. Thankfully, Terraform allows us to hook into the destroy process, running scripts before the VM is destroyed. Furthermore, we can abort the destruction of the VM if our script errors out, allowing us to stop the process and carry out manual intervention when facing specific failure conditions. Terraform remembers its progress and will re-attempt to replace the VM next time we run the pipeline.
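In Terraform 0.11 syntax, such a destroy-time hook could be sketched as follows; the resource wiring and script name are illustrative:

```hcl
# Hypothetical destroy-time hook (Terraform 0.11 syntax).
resource "null_resource" "pre_destroy" {
  triggers {
    node_id = "${vsphere_virtual_machine.manager.id}"
  }

  provisioner "local-exec" {
    when    = "destroy"
    command = "ansible-playbook drain-and-leave.yml"

    # on_failure defaults to "fail": if the playbook errors out,
    # Terraform aborts the destroy, leaving the VM in place for
    # manual intervention.
  }
}
```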

Here’s the general process for upgrades:

  1. Retrieve list of templates for current nodes from Terraform
  2. For each stage (usually in this order: UCP, DTR, other workers):
    1. For each VM in that stage:
      1. Update the template name to the most recent from the VMware vCenter catalog (the selection mechanism is likely to be improved to be more granular)
      2. Run terraform plan followed by terraform apply, resulting in:
        1. Terraform running a pre-destroy script that runs an Ansible playbook:
          1. Draining the node from running tasks [6]
          2. Performing additional tasks for UCP* or DTR** nodes
          3. Having the node leave the swarm and remove it from the node list
        2. Terraform destroying the existing VM in vSphere
        3. Terraform creating a new VM in vSphere cloned from the new template
        4. Terraform running the scripts of the ansible_managers and ansible_all null resources as described in the Apply section:
          1. Joining the new VM to the Swarm cluster and waiting for it to be marked as healthy
          2. Putting the VM in the right UCP collection (using node labels) so tasks can be scheduled on it
          3. Performing additional tasks for UCP* or DTR** nodes
          4. When the last VM has been successfully upgraded to the same Docker Engine version, UCP and DTR are upgraded if the desired version is newer than what’s currently installed:
            1. We take a backup of UCP using the docker run ... docker/ucp backup command, and copy the resulting archive to the GitLab runner for archival.
            2. UCP is upgraded by running the docker run ... docker/ucp upgrade command, which runs synchronously until the upgrade completes.
            3. We take a backup of DTR’s metadata (not its image data, which resides on an NFS share), which takes a considerable time, and copy the resulting archive to the GitLab runner for archival. Note that we have to run the output of the backup command through gzip for the resulting file size to be small enough that Ansible’s expensive fetch action doesn’t consume all available memory on our GitLab runner VM.
            4. The process is similar for DTR, docker run ... docker/dtr upgrade, which upgrades each component one at a time on each DTR replica.

* If the node being upgraded is one of the UCP controllers, it is demoted before being destroyed, so that the remaining controllers know about the (temporarily) reduced number of controllers.

** If the node being upgraded is one of the DTR replicas, it is first removed from the DTR cluster before being destroyed, so that the remaining replicas know about the (temporarily) reduced number of replicas. Once its replacement has joined the cluster as a worker node, it is additionally joined to the DTR cluster to become one of the available replicas.
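The backup steps before the UCP and DTR upgrades condense into commands like these; versions, flags and file names are illustrative (see the docker/ucp and docker/dtr reference docs for the full option lists):

```shell
#!/bin/sh
# Sketch of the pre-upgrade backups. Versions, flags and file names
# are illustrative.
set -eu

# UCP: the backup is written to stdout as a tar stream, which we
# archive on the GitLab runner.
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker/ucp:3.0.4 backup > ucp-backup.tar

# DTR: only metadata is backed up (image blobs live on NFS); gzip the
# stream so Ansible's fetch doesn't exhaust memory on the runner.
docker run --rm \
  docker/dtr:2.5.3 backup \
  --ucp-url "https://$UCP_VIP" \
  --ucp-username admin --ucp-password "$UCP_ADMIN_PASSWORD" \
  --existing-replica-id "$DTR_REPLICA_ID" | gzip > dtr-backup.tar.gz
```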

Conclusion

Using this method, we can use the same approach whether we upgrade Docker Engine, add new Ubuntu packages or apply a security patch: we simply do that once in our VM template and roll the upgrade out across our cluster. There’s no ambiguity about whether a node is up to date: either it is based on the latest VM template, or it isn’t.

Once we’ve gone through a few more supervised rolling upgrades, we want to turn this into a regularly scheduled job (GitLab can help with visibility there, compared with an old-fashioned cron job), e.g. running weekly.

In an upcoming post, I will provide a deep dive into the configuration of the various tools involved, namely Packer, Terraform, Ansible and GitLab, and any quirks that may be useful to other organizations.

  [1] The direct host assignment will be replaced with DRS group membership, the management of which was introduced in the Terraform vSphere Provider v1.5.0.

  [2] Since the disk size must be identical to that of the VM template when using linked clones, this is practically always the same for all VMs.

  [3] These are made as indirect lookups to prevent data duplication.

  [4] These relate to our use of the NetApp Docker Volume Plugin.

  [5] Naturally, we have to define numerous variables in GitLab itself, too, in order to be able to externalize some configuration and to provide secret variables.

  [6] When a Swarm node is put in the drain availability status, it currently doesn’t wait for tasks to exit gracefully – rather, they are stopped shortly after, thus potentially causing downtime if the service does not have any replicas on other nodes in the stage.
