Kubernetes or: How I Learned to Stop Worrying and Love the Container

what3words
10 min read · May 10, 2019

How we made our tech scalable and global using Kubernetes

By Andrew Vaughan, Lead DevOps Engineer

The Challenge

Development

Development was slow. Our release-cycle process relied on a combination of common tooling: HashiCorp’s Packer to build AWS Amazon Machine Images (AMIs), Red Hat’s configuration management tool Ansible, and AWS’s CloudFormation service. This process produced self-contained CloudFormation stacks using pre-baked AMIs within EC2 Auto Scaling Groups, each with its own load balancing rules specified in an AWS Elastic Load Balancer (ELB).

Even disregarding human or technical errors, this build procedure could take 15–20 minutes to release new code to any single environment. New releases are expected to be deployed to a number of pre-production environments and to undergo testing before going live, so the entire process could stretch out over several days!

Source: XKCD: https://xkcd.com/303/

As developers we want faster release cycles in our pre-production and production environments, so we can introduce new features faster and roll back or forward when issues arise.

Moving to Kubernetes would shift our development away from building Operating System (OS) images and provisioning fresh Virtual Machines (VMs), to using Docker and Kubernetes in tandem to streamline local development and release-cycles to our operating environments.

Cost

Our infrastructure was not cost-effective. For every application we had at least two EC2 server instances (for High Availability). Thanks to the efficiency of our code, these often used only 2–5% CPU, even on the smallest available EC2 instances. In the computing world, and especially in the cloud, CPU is very expensive.

Source: rawpixel: https://www.pexels.com/@rawpixel

As a business we want to run our infrastructure in a more cost-effective manner without sacrificing performance.

Using Kubernetes, we can pack multiple applications onto fewer EC2 instances. Moving to Kubernetes would reduce the cost of running our infrastructure, as Docker containers can be granted as little or as much compute resource as they require, calculated from load testing and everyday CPU and memory usage. Imagine our Docker containers as little Tetris blocks that Kubernetes fits efficiently onto cloud server instances, a scheduling behaviour known as “bin packing” that makes full use of each instance’s total compute capacity.
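
For example, the Kubernetes scheduler packs Pods onto nodes based on the compute resources each container requests, so declaring accurate requests is what makes the bin packing effective. A minimal, purely illustrative manifest (hypothetical names and values, not one of our real workloads) looks like this:

# Illustrative only: the requests are what the scheduler uses to bin-pack
# Pods onto nodes; the limits cap what the container may actually consume.
apiVersion: v1
kind: Pod
metadata:
  name: myapp-example
spec:
  containers:
    - name: myapp
      image: myapp:latest
      resources:
        requests:
          cpu: 50m        # roughly 5% of one vCPU
          memory: 64Mi
        limits:
          cpu: 100m
          memory: 128Mi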

Geography

Globally we had poor latency. We only operated our applications and infrastructure from a single Amazon Web Services (AWS) region. However, as we expanded to a global client base it quickly became apparent that many of our customers were seeing poor latency to and from our applications outside of this immediate network catchment area.

Source: slon_dot_pics: https://www.pexels.com/@slon_dot_pics-129524

As our client base grows both in volume and in global reach, we want to be able to distribute our applications geographically, offering lower latency to customers far and wide.

As I mentioned earlier, the reduced cost of operating on the Kubernetes platform enables us to build and operate separate clusters in different parts of the world at a similar cost to what we previously spent.

Logging and Monitoring

We had very little visibility over what our applications or infrastructure were up to. For data-analytics purposes we were already handling logging from our main public API, but for other applications our visibility was close to zero and the whole affair was very hands-on and manual.

Source: pixabay: https://www.pexels.com/@pixabay

As operations we want more useful and actionable insight into the day-to-day running of all our applications and infrastructure. Moving to Kubernetes came with some extra benefits that would help us better monitor our applications and infrastructure.

Kubernetes is a very popular platform and there are many solutions for us to choose from, such as Datadog, Sysdig, or Prometheus with Grafana for monitoring, graphing and alerting on all manner of metrics. For logging we can use solutions such as Elastic’s Beats or Fluentd to feed our logs into Elasticsearch or AWS S3.

The Solution

A High Level Overview

Our solution, from the ground up, is a simple wrapper script that interfaces with ansible-playbook, which:

  • SOPS decrypts configuration files that contain secrets
  • Groups specific configuration files for the selected environment (stage and region)
  • Downloads code artefacts from S3, GitHub or HTTP(S)
  • Builds templated configuration files for Docker
  • Builds Docker images using those templates and artefacts
  • Pushes Docker images into our private Docker repository using AWS Elastic Container Registry (ECR) (per environment, per region)
  • Builds templated Helm Chart configuration files, aided by Mozilla SOPS
  • Triggers Helm install or upgrade against one of our Kubernetes clusters running on AWS Elastic Kubernetes Service (EKS)
  • Notifies our development team channels on Slack of the changes

We use AWS Route53 Health Checks in combination with latency-based A record (ALIAS) DNS to fail traffic over for our core applications between the Virginia, London and Seoul regions in production. If one of these core applications fails in a region, its health check fails and the latency-based record for that region is disabled, so clients’ DNS requests for that record resolve to the next closest AWS region, based on latency between the client and the remaining AWS regions where the same application is running.
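
As a rough sketch of the DNS side (hypothetical CloudFormation, with placeholder health checks, hosted zone IDs and load balancer names, not our real templates), two latency-based ALIAS records for the same hostname might look like this:

# Sketch only: two latency-based ALIAS records for the same hostname, each
# tied to a Route53 health check; all identifiers below are placeholders.
MyAppLondonRecord:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneName: what3words.com.
    Name: myapp.what3words.com.
    Type: A
    SetIdentifier: london
    Region: eu-west-2
    HealthCheckId: !Ref LondonHealthCheck   # AWS::Route53::HealthCheck defined elsewhere
    AliasTarget:
      DNSName: london-ingress-elb.example.amazonaws.com
      HostedZoneId: ZEXAMPLELONDON
MyAppVirginiaRecord:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneName: what3words.com.
    Name: myapp.what3words.com.
    Type: A
    SetIdentifier: virginia
    Region: us-east-1
    HealthCheckId: !Ref VirginiaHealthCheck
    AliasTarget:
      DNSName: virginia-ingress-elb.example.amazonaws.com
      HostedZoneId: ZEXAMPLEVIRGINIA

When the London health check fails, Route53 stops serving the London record and clients resolve to whichever healthy region is closest by latency.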

We currently use Datadog to monitor infrastructure metrics, with graphing and alerting configured to warn us when Kubernetes Pods use more compute resources than they request, and so on. We use Fluentd to tail logs from the Kubernetes nodes and stream them into artefacts stored in AWS S3, which can in turn be queried with AWS Athena.

A Little Lower Level Overview

Disclaimer: None of the code snippets below will give you an end-to-end setup, but they will give you a rough understanding of what we are achieving.

Example of the wrapper script:

Usage: ./wrapper prod london myapp

#!/usr/bin/env bash
set -euf -o pipefail

SCRIPT_HOME=$(dirname ${0})
STAGE=${1}
shift
REGION=${1}
shift
APPLICATION=${1}
shift
ANSIBLE_EXTRAS=${@:- }

# decrypt STAGE and REGION secrets to a git-ignored file
AWS_PROFILE=${STAGE} sops -d ${SCRIPT_HOME}/group_vars/secrets_${STAGE}.${REGION}.yml > ${SCRIPT_HOME}/group_vars/unencrypted_secrets_${STAGE}.${REGION}.yml

# run ansible-playbook
ansible-playbook --extra-vars "STAGE=${STAGE} REGION=${REGION} APPLICATION=${APPLICATION}" ${SCRIPT_HOME}/playbook.yml ${ANSIBLE_EXTRAS}

# remove unencrypted files
rm -f ${SCRIPT_HOME}/group_vars/unencrypted_secrets_*.yml

This wrapper enables us to run ansible-playbook while using AWS KMS (via SOPS) to decrypt STAGE- and REGION-specific secrets safely and without much thought.

Example application configuration for Docker and Helm:

Name: "myapp"AppName: "{{ Name }}"SubDomain: "myapp"Environments:
- "dev.london"
- "preprod.london"
- "prod.london"
Version: "{{ Values[Name].VERSION }}"Tag: "{{ Version }}"Docker:
Download:
- src:
"s3://my-artifacts/myapp/myapp-{{ Version }}.tar.gz"
Dockerfile: |
FROM alpine:3.8
RUN apk add — update \
tar \
gzip \
nginx=1.9 \
&& rm -rf /var/cache/apk/*
COPY myapp-{{ Version }}.tar.gz /myapp.tar.gz
RUN tar xzvf /*.tar.gz -C /var/www/
&& rm -f /*.tar.gz
Helm:
- apiVersion:
extensions/v1beta1
kind: Deployment
metadata:
name:
"{{ AppName }}"
labels:
app:
"{{ AppName }}"
spec:
replicas: "{{ Values[Name].Replicas }}"
strategy:
type: RollingUpdate
selector:
matchLabels:
app: "{{ AppName }}"
template:
metadata:

labels:
app: "{{ AppName }}"
spec:
containers:
- name:
"{{ AppName }}"
image: "{{ AwsEcrPrefix }}/{{ AppName }}:{{ Tag }}-{{ DockerHash }}"
imagePullPolicy: "Always"
command:
- sh
- -c
args:
- /srv/wrapper.sh
env:
- name: VALUE1
value: "{{ Values[Name].VALUE1 }}"
- name: VALUE2
value: "{{ Values[Name].VALUE2 }}"
ports:
- containerPort: 80
name: http
resources:
limits:

cpu: 50m
memory: 64Mi
requests:
cpu:
50m
memory: 64Mi
livenessProbe:
failureThreshold:
3
httpGet:
path: /healthcheck
port: http
initialDelaySeconds: 15
timeoutSeconds: 5
readinessProbe:
failureThreshold:
3
httpGet:
path: /healthcheck
port: http
initialDelaySeconds: 10
timeoutSeconds: 5
- apiVersion: v1
kind: Service
metadata:
name:
"{{ AppName }}"
labels:
app:
"{{ AppName }}"
spec:
type:
ClusterIP
ports:
- name:
http
port: 80
targetPort: 80
selector:
app:
"{{ AppName }}"
- apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: "{{ AppName }}"
annotations:
kubernetes.io/ingress.class: nginx
spec:
rules:
- host: "{{ SubDomain }}.{{ DnsDomain }}"
http:
paths:
- backend:
serviceName:
"{{ AppName }}"
servicePort: 80
- host: "{{ SubDomain }}.{{ REGION }}.{{ DnsDomain }}"
http:
paths:
- backend:
serviceName:
"{{ AppName }}"
servicePort: 80

{{ Version }} can be overridden from the wrapper to alter which artefact is downloaded and which {{ Tag }} is used when tagging the Docker image.

{{ DockerHash }} is an MD5 hash that Ansible generates dynamically by inspecting the contents of the application’s Docker directory (containing the Dockerfile, artefacts, etc.). It keeps the Docker tag unique enough when {{ Version }} and {{ Tag }} stay the same, e.g. in development you may be pushing and pulling an artefact called myapp-develop.tar.gz whose name stays the same while its contents are in flux. The resulting Docker image and tag will look something like myapp:develop-5ae768241ccef13c9166b804bfe440f6 or myapp:1.0.0-0f764241465e3b55fdf305b6cd3bf47f.
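
As a rough sketch of how such a hash can be produced (the DockerDir variable below is hypothetical and not part of the configuration shown above), a couple of Ansible tasks are enough:

# Sketch only: hash every file in the application's Docker build directory so
# the image tag changes whenever the directory's contents change.
- name: Hash the contents of the Docker build directory
  shell: "find {{ DockerDir }} -type f -exec md5sum {} + | sort | md5sum | awk '{print $1}'"
  register: docker_dir_hash
  changed_when: false

- name: Expose the hash as DockerHash
  set_fact:
    DockerHash: "{{ docker_dir_hash.stdout }}"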

{{ AwsEcrPrefix }} is declared in a centralised group_var configuration file specific to the STAGE and REGION in which the Kubernetes cluster is hosted, and is used as the target and source Docker repository when pushing and pulling images.

{{ DnsDomain }} is declared in the same centralised group_var configuration file as {{ AwsEcrPrefix }} and is used in our Kubernetes Ingress resource to determine the DNS domain for the STAGE and REGION. We declare two hosts in this resource: one is used by the global DNS with added failover, and the other is used by the supporting Route53 health check. These would typically look like myapp.what3words.com and myapp.london.what3words.com respectively.

{{ Values[Name].VALUE1 }} is an example of an abstracted value found within a centralised (or previously encrypted) configuration file. {{ Values }} is declared in the same centralised group_var configuration file as {{ DnsDomain }} and {{ AwsEcrPrefix }}, and is a STAGE- and REGION-specific dictionary with keys for all applications.

Example values of a centralised group_var secret configuration file:

AwsEcrPrefix: "my-prod-account-number.ecr.london.amazonaws.com"
DnsDomain: "what3words.com"
Values:
myapp:
Replicas: 2
VERSION: "0.0.1"
VALUES1: "Hello"
VALUES2: "World"
notmyapp:
Replicas: 3
DATABASE_URL: "psql://much@wow:7654"

Why Ansible?

Ansible is versatile and useful for many tasks, but it does have its own problems, especially when we are forced to run shell commands to get what we need out of the tool. We decided to keep using Ansible because, as a team, we already had in-depth experience and skill with it, and adapting it to our current needs involved a much less steep learning curve than the other tools out there.

Why Helm?

We really like the way Helm packages Kubernetes resources together, allows for easy rollbacks of complete running applications, and gives quick overviews of what has been deployed to our Kubernetes clusters.

Why use Ansible (and Jinja2) to template Helm Charts?

This might seem redundant, but while we really like Helm, we do not like the inconsistent way values are abstracted from Chart to Chart in Helm’s stable repository; it set an irregular precedent that we did not want to follow. As we were already templating with Ansible and Jinja2, there was no logical reason to deal with Helm’s own (Go-based) templating abstraction as well.

Doing it this way streamlines our operation, reduces layers of unneeded abstraction and gives us more flexibility in re-using high-level configuration, as we can easily inject environment variables per staging environment and per region, including values such as the location of the Docker image in the repository or the username and password for a particular database.
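
In practice this boils down to a couple of Ansible tasks along the following lines (a simplified sketch with hypothetical paths and context names, not our full playbook): the Helm section of the application configuration is rendered with Jinja2 into a chart’s templates directory, and then helm upgrade --install is run against the target cluster.

# Simplified sketch: render the templated resources, then deploy the chart.
- name: Render Kubernetes resources into the chart's templates directory
  template:
    src: helm-resources.yml.j2
    dest: "charts/{{ AppName }}/templates/resources.yml"

- name: Install or upgrade the release on the target Kubernetes cluster
  command: >
    helm upgrade --install {{ AppName }} charts/{{ AppName }}
    --kube-context {{ STAGE }}-{{ REGION }}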

Why SOPS?

Mozilla Secrets Operations (SOPS) is an amazing tool for keeping your secrets secret and gives us peace of mind when committing such secrets to source control. We use SOPS in conjunction with Ansible and our Jinja2 templates to expose secret values into our Helm Charts when we are deploying.

SOPS can work with a variety of cryptographic back ends to encrypt secrets. We use it with AWS Key Management Service (KMS): our locally configured AWS Identity and Access Management (IAM) user accounts give us remote access to the encryption and decryption keys, so our pipelined deployment process can access secrets seamlessly.
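
A .sops.yaml with per-stage creation rules (the account numbers and KMS key ARNs below are placeholders) is all that is needed for sops to pick the right key whenever a secrets file is created or edited:

# Sketch of a .sops.yaml; the account numbers and key IDs are placeholders.
creation_rules:
  - path_regex: group_vars/secrets_prod\..*\.yml$
    kms: "arn:aws:kms:eu-west-2:111111111111:key/00000000-0000-0000-0000-000000000000"
  - path_regex: group_vars/secrets_dev\..*\.yml$
    kms: "arn:aws:kms:eu-west-2:222222222222:key/00000000-0000-0000-0000-000000000000"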

Why AWS?

We are a pre-existing AWS customer and have been running our infrastructure in AWS for a number of years. As a team we are experienced with the AWS products and services on offer. Migrating to Kubernetes while staying with AWS was a no-brainer, as it meant we did not have to migrate the “satellite” infrastructure that complements our core applications, such as the storage of assets and artefacts in AWS S3.

In many ways, migrating to Kubernetes has made us more cloud-agnostic: the tooling we have built simply requires a functioning Kubernetes cluster, which could realistically be hosted on any cloud provider, or even on bare metal.

Why EKS?

We originally began this project using Kubernetes Operations (kops) to build our Kubernetes clusters, as EKS was not yet available in our main AWS region. This meant maintaining a fault-tolerant, multi-master control plane alongside the worker nodes that actually did the heavy lifting for our applications.

As soon as EKS became available we put plans in motion to migrate from kops, as we consider the total cost of ownership of EKS lower for our team than managing kops ourselves.

The Future

We are currently working to introduce Prometheus and Grafana, with Alertmanager, to replace the capabilities offered by Datadog. We also plan to have Fluentd stream logs into Elasticsearch for closer-to-real-time analytics, as well as into AWS S3 for longer-term analysis. Watch this space!

Would you like to join our team?

We’re hiring! Take a look at https://what3words.com/jobs and filter by “Technology” to check out our open roles.
