26 April 2019
Container guidelines - Part 2


This article contains a list of guidelines gathered from various sources, complemented with professional experience and some common sense. They are primarily for my own reference, but maybe they can be of use to others. For now, they focus on Docker environments, but I might add rkt or other initiatives in the near future.

1. Container environments have specific characteristics
A container environment contains images and sources. By invoking a build based on a certain configuration, sources can be turned into images. Images can be tested (and reproduced) and promoted to a different environment. This entire workflow is automated.

Access to the images and (configuration of) sources is limited to privileged users or processes. Builds can only be started by privileged users or processes. Logs and reports of the build and test processes are available to services, either within or external to the pipeline. This implies that each container environment needs a security strategy.

Because containers share the kernel of the host they run on, every environment in a pipeline has the same kernel version.

2. Builds are automated and build environments are managed by automation tooling
Using container technology as a vehicle for a microservices architecture means several (immutable) images containing parts of the resulting application need to work together seamlessly. This by itself is a challenge. Pushing these images to production manually, while possible, is not advisable.

Likewise, setting up the build environment and managing the pipeline manually is a complicated endeavour. To manage the pipeline, automation tooling is a prerequisite. A dedicated platform is preferable.

3. Docker container images are built using a Dockerfile
A Dockerfile is a text document containing the instructions, in order, for building an image. Files and directories can be excluded from the build context by using a .dockerignore file. For the sake of maintainability, use a recognizable structure in the Dockerfile (for example, sorting multi-line arguments such as package lists alphabetically).

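During the build, Docker checks for each instruction in the Dockerfile whether it can reuse a cached layer. It is important to understand the caching rules: for most instructions only the instruction text is compared, for ADD and COPY the contents of the copied files are checksummed as well, and once the cache is invalidated for one instruction, all subsequent instructions are rebuilt.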
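The cache rules can be approximated as a chain of keys, where each key depends on the previous one. The sketch below is a simplified model, not Docker's implementation; the function names and the `copied_checksums` mapping are hypothetical.

```python
import hashlib

def cache_keys(instructions, copied_checksums):
    """Chain a cache key through the instructions (a simplified model).

    copied_checksums maps a COPY/ADD instruction to a checksum of the
    files it copies; other instructions are keyed on their text only.
    """
    keys, parent = [], ""
    for instruction in instructions:
        h = hashlib.sha256()
        h.update(parent.encode())  # each key depends on the previous key
        h.update(instruction.encode())
        if instruction.split()[0] in ("COPY", "ADD"):
            h.update(copied_checksums.get(instruction, "").encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys

def rebuilt_steps(old_keys, new_keys):
    """Indexes of the instructions that miss the cache on the new build."""
    return [i for i, key in enumerate(new_keys)
            if i >= len(old_keys) or key != old_keys[i]]
```

Because the keys are chained, a changed file in a COPY step invalidates that step and every step after it. This is why frequently changing sources are usually copied after the slow dependency installation. It is also why a `RUN apt-get update` whose text never changes keeps being served from the cache.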

Some build systems, such as Red Hat's, store the Dockerfile within the resulting image in /root/buildinfo. Use a linter such as Dockerlint to check the syntax of a Dockerfile.

4. Use a bare minimum of packages, processes and layers
Within a container, only one specific task is executed. Such a task can consist of one or more processes. A container contains the bare minimum of packages necessary to run its task and process(es). When using a package manager (both for the application environment and the underlying operating system, in that order), its cache must be cleared. Clearing the cache only reduces the image size if it happens in the same layer that filled it, because files deleted in a later layer still take up space in the earlier one. Squashing the image after the build also solves this, but it has its disadvantages, such as losing the history and metadata of the image's layers. Preferably, take enough care during the build process that squashing is unnecessary. Before emptying the package cache, however, remove unnecessary packages. Such packages are usually the toolchain needed to compile the application in the image. Documentation can also take up a lot of space and needs to be removed.
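The following toy model of a layered filesystem shows why the cache has to be cleared in the same layer that created it: a deletion in a later layer only hides the file. The data structures are hypothetical and the sizes are in arbitrary units. Each layer maps a path to a size, and `None` marks a deletion.

```python
def image_size(layers):
    """Total stored size: files deleted in later layers still count."""
    return sum(size for layer in layers
               for size in layer.values() if size is not None)

def visible_files(layers):
    """Files visible in the container's merged filesystem."""
    merged = {}
    for layer in layers:
        for path, size in layer.items():
            if size is None:  # a deletion (whiteout) hides the file
                merged.pop(path, None)
            else:
                merged[path] = size
    return merged

# Cache filled in one layer and removed in the next one.
separate = [{"/app/bin": 5, "/var/cache/pkg": 100}, {"/var/cache/pkg": None}]
# Cache filled and removed within a single layer, so it is never stored.
combined = [{"/app/bin": 5}]
```

Both images expose the same files, but only the second one is actually small.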

Docker images consist of layers, and every RUN, COPY and ADD instruction in a Dockerfile creates a new layer. The number of layers should be kept to a minimum without compromising legibility and understandability. This can be achieved by chaining commands with && in a single RUN instruction, for example grouping everything related to the installation and configuration of an application into a single layer.
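The grouping can be sketched as a small generator that chains shell commands into one RUN instruction. The helper name is hypothetical. A call such as `run_instruction(["apt-get update", "apt-get install -y --no-install-recommends curl", "rm -rf /var/lib/apt/lists/*"])` would yield a single layer that installs a package and cleans up the apt cache in the same step.

```python
def run_instruction(commands):
    """Combine shell commands into a single RUN instruction (one layer)."""
    if not commands:
        raise ValueError("at least one command is required")
    # '&&' stops the chain at the first failing command, and the trailing
    # backslash continues the instruction on the next line for legibility.
    return "RUN " + " && \\\n    ".join(commands)
```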

5. Use labels
Labels are key-value pairs, set with the LABEL instruction in the Dockerfile, that provide metadata to the end user. They can be descriptive (containing information on, for example, version, author or architecture) or actionable (such as run, install and uninstall commands and their syntax). Labels need to be applied consistently and, where possible, automatically. Specific platforms, such as OpenShift, have their own set of labels.
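Applying labels automatically could look like the sketch below. It generates descriptive labels at build time, checks them against a required set, and renders them as one LABEL instruction. The keys are pre-defined annotation keys from the OCI image specification. Which labels count as required is a hypothetical policy, and actionable labels are left out.

```python
from datetime import datetime, timezone

# Hypothetical policy: these OCI annotation keys must always be present.
REQUIRED_LABELS = (
    "org.opencontainers.image.authors",
    "org.opencontainers.image.created",
    "org.opencontainers.image.version",
)

def build_labels(version, authors, created=None):
    """Generate descriptive labels automatically at build time."""
    created = created or datetime.now(timezone.utc)
    return {
        "org.opencontainers.image.authors": authors,
        "org.opencontainers.image.created": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "org.opencontainers.image.version": version,
    }

def missing_labels(labels):
    """Return the required labels that are absent or empty."""
    return [key for key in REQUIRED_LABELS if not labels.get(key)]

def label_instruction(labels):
    """Render the labels, sorted by key, as a single LABEL instruction."""
    pairs = [f'{key}="{value}"' for key, value in sorted(labels.items())]
    return "LABEL " + " \\\n      ".join(pairs)
```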

6. Docker containers use a base image
Several base images exist, and which one to use is a matter of preference. Important aspects to consider are the operating system used (a certain flavor of Linux or Windows, for example) and the size of the image (ranging from a few megabytes to hundreds of megabytes). It is possible to create your own base images. Many base images are publicly available through Docker Hub.

7. Naming conventions
Docker images are stored in registries, such as the default, Docker Hub. A full image reference has the structure registryhost:port/username/repositoryname:tag. Username and repository name may contain lowercase letters, digits and separators (periods, underscores or dashes), and the full name should not be longer than 255 characters. The tag should contain the version of the image and may contain uppercase and lowercase letters, digits, underscores, periods and dashes. It cannot start with a period or a dash and cannot be longer than 128 characters. The default tag is “latest”. This tag is not updated or enforced automatically, which means there is no guarantee that an image with this tag is actually the latest version. Docker images also have a unique identifier, the image ID, which is a SHA256 hash of the image's configuration.
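The naming rules can be checked with a small parser. This sketch adapts the grammar of Docker's reference format and simplifies it: digest references (`@sha256:…`) are not handled. The function and constant names are hypothetical.

```python
import re

# Simplified from Docker's reference grammar (no digest support).
COMPONENT = re.compile(r"[a-z0-9]+(?:(?:[._]|__|-+)[a-z0-9]+)*")
TAG = re.compile(r"\w[\w.-]{0,127}", re.ASCII)  # at most 128 characters
NAME_MAX = 255

def parse_reference(ref):
    """Split 'host:port/user/repo:tag' into (registry, repository, tag)."""
    registry, remainder = "docker.io", ref
    first, slash, rest = ref.partition("/")
    # The first component is a registry host if it contains '.' or ':'
    # or is 'localhost'; otherwise Docker Hub is assumed.
    if slash and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    repository, colon, tag = remainder.partition(":")
    if not colon:
        tag = "latest"  # the default tag; it says nothing about recency
    name_length = len(ref) - (len(tag) + 1 if colon else 0)
    if name_length > NAME_MAX:
        raise ValueError("name longer than 255 characters")
    if not all(COMPONENT.fullmatch(c) for c in repository.split("/")):
        raise ValueError(f"invalid repository name: {repository!r}")
    if not TAG.fullmatch(tag):
        raise ValueError(f"invalid tag: {tag!r}")
    if registry == "docker.io" and "/" not in repository:
        repository = "library/" + repository  # official images
    return registry, repository, tag
```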

Docker layers are referenced by images based on a SHA256 digest. When the content of the layer changes, a new hash is calculated.
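Content addressing can be illustrated in a few lines. In a real registry, the layer digest is computed over the (compressed) layer tarball. In this sketch, plain bytes stand in for that tarball, and the helper names are hypothetical.

```python
import hashlib

def layer_digest(content: bytes) -> str:
    """Content-addressable digest in Docker's 'algorithm:hex' notation."""
    return "sha256:" + hashlib.sha256(content).hexdigest()

def verify(content: bytes, digest: str) -> bool:
    """A layer is intact if recomputing its digest gives the same value."""
    return layer_digest(content) == digest
```

Any change to the content, however small, produces a different digest. Identical content always produces the same digest, which is what allows layers to be shared between images.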

Docker containers have names, which are either assigned by the user or generated automatically. A generated name combines one of 92 hardcoded adjectives with, through an underscore, the name of one of 160 notable scientists and hackers (numbers at the time of writing). The only excluded combination is boring_wozniak, because Steve Wozniak is not boring.
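The generation scheme can be sketched as follows. The word lists are a tiny illustrative subset, not Docker's actual lists, and the function name is hypothetical. Any random source with a `choice` method can be passed in, which makes the output reproducible in tests.

```python
import random

# Tiny illustrative subsets; Docker's own lists are much longer.
ADJECTIVES = ["admiring", "boring", "eloquent", "focused"]
SURNAMES = ["curie", "hopper", "lovelace", "wozniak"]

def random_name(rng=random):
    """Return an 'adjective_surname' container name."""
    while True:
        name = f"{rng.choice(ADJECTIVES)}_{rng.choice(SURNAMES)}"
        if name != "boring_wozniak":  # Steve Wozniak is not boring
            return name
```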

8. The container lifecycle
It is possible to look at the container lifecycle from various perspectives. For one, a container itself has five distinct states in its lifecycle: created (either manually or automatically by a build process), running, paused, stopped and deleted. A more holistic view of the container lifecycle is possible as well, which is more in line with a traditional software lifecycle, with the following states: build, deliver, deploy, run and maintain. Tools exist for each of these states. The states of the first perspective are managed by Docker or Kubernetes, whereas the second perspective involves pipeline management and CI/CD toolsets.
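The first perspective can be modelled as a small state machine. This is a simplified sketch: it only includes the common transitions, keyed by the Docker CLI subcommand that triggers them, and leaves out forced removal (`docker rm -f`) of running containers. Docker itself reports a stopped container as "exited".

```python
from enum import Enum

class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    PAUSED = "paused"
    STOPPED = "stopped"
    DELETED = "deleted"

# Simplified transitions, keyed by (current state, Docker CLI subcommand).
TRANSITIONS = {
    (State.CREATED, "start"): State.RUNNING,
    (State.CREATED, "rm"): State.DELETED,
    (State.RUNNING, "pause"): State.PAUSED,
    (State.PAUSED, "unpause"): State.RUNNING,
    (State.RUNNING, "stop"): State.STOPPED,
    (State.STOPPED, "start"): State.RUNNING,
    (State.STOPPED, "rm"): State.DELETED,
}

def apply(state, command):
    """Return the next state, or raise if the transition is not allowed."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"cannot {command} a {state.value} container") from None
```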

9. Security
Several ways exist to pass credentials and secrets to containers, but none of them are particularly safe. Running “docker inspect” often reveals such secrets (for example, when they are passed as environment variables), and images available through a public registry may have secrets and credentials embedded. Even a squashed image can reveal its secrets through the build cache. It is important not to use default configurations and to remove any default passwords.
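A quick audit of what `docker inspect` exposes could scan the container's environment. The output of `docker inspect --format '{{json .Config.Env}}' <container>` is a JSON list of `KEY=value` strings. The name pattern below is a hypothetical heuristic and will miss secrets stored under other names.

```python
import json
import re

# Hypothetical heuristic: variable names that commonly hold secrets.
SUSPICIOUS = re.compile(r"PASS(WORD)?|SECRET|TOKEN|API_?KEY|CREDENTIAL", re.IGNORECASE)

def exposed_secrets(env):
    """Names of non-empty environment variables that look like secrets."""
    names = []
    for entry in env:
        name, _, value = entry.partition("=")
        if value and SUSPICIOUS.search(name):
            names.append(name)
    return names

def exposed_secrets_from_json(text):
    """Same check, on the JSON list printed by 'docker inspect'."""
    return exposed_secrets(json.loads(text))
```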

To ensure decent security handling within containers, make use of the dedicated functionality offered by container orchestration platforms such as Kubernetes (for example, Kubernetes Secrets), or of other dedicated solutions.
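Orchestrators typically expose a secret to the application as a file instead of an environment variable, which keeps it out of `docker inspect` output. Docker Swarm mounts secrets under `/run/secrets/<name>` by default. In Kubernetes the mount path is defined in the Pod specification, so the default directory below is an assumption. The helper name is hypothetical.

```python
from pathlib import Path

def read_secret(name, secrets_dir="/run/secrets"):
    """Read a secret that the orchestrator mounted as a file."""
    # secrets_dir is an assumption: Swarm's default, configurable elsewhere.
    path = Path(secrets_dir) / name
    # Strip the trailing newline often left when a secret is created from a file.
    return path.read_text(encoding="utf-8").rstrip("\n")
```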

Because the kernel is shared, a container is only as secure as the host operating system. This implies using a stable, secure host operating system that receives regular security updates, and keeping that system up to date. At the container level, only run containers from trusted parties, and run them without privileges wherever possible.

10. Starting applications within a container
There are three ways to start an application within a container: (1) call the binary directly (for simple applications with no need for environment variables), (2) use a script to call the binary (for slightly more complex applications requiring environment variables), and (3) use systemd to start the application (for service-oriented applications).
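Option (2) could look like the wrapper below, which translates environment variables into command-line arguments and then replaces itself with the application. The binary path, the `APP_*` variables and the `--port`/`--log-level` flags are all hypothetical.

```python
import os

def build_command(env):
    """Translate environment variables into the application's arguments."""
    # APP_BINARY, APP_PORT and APP_LOG_LEVEL are hypothetical names.
    binary = env.get("APP_BINARY", "/usr/local/bin/app")
    port = env.get("APP_PORT", "8080")
    if not port.isdigit():
        raise ValueError(f"APP_PORT must be numeric, got {port!r}")
    return [binary, "--port", port, "--log-level", env.get("APP_LOG_LEVEL", "info")]

def main():
    argv = build_command(os.environ)
    # exec replaces the wrapper process. If the wrapper is the container's
    # main process, the application takes over its PID and receives
    # signals such as SIGTERM from 'docker stop' directly.
    os.execv(argv[0], argv)
```

Used as the image's ENTRYPOINT, the script would end with a call to `main()`. Using `exec` rather than starting a child process avoids a wrapper that silently swallows stop signals.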

11. Ephemeral containers and storage
Containers can be created, stopped, destroyed and replaced swiftly. Any state stored inside a container is lost when the container is destroyed or replaced. If persistence is necessary, storage needs to be set up. The preferred method is to create a Docker volume (preferably before creating the container that will use it), which can be shared and reused among containers. Such a volume should be named appropriately. It is possible to specify a specific volume driver when creating the volume.
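The sketch below enforces a volume naming convention and creates the volume before the container that uses it. It builds the command lines as argument lists, which could be passed to `subprocess.run(cmd, check=True)`. The `<application>-<purpose>` convention and the function name are hypothetical. The `docker volume create --driver` and `docker run --mount` options are standard Docker CLI.

```python
import re

# Hypothetical naming convention: <application>-<purpose>, e.g. "shop-db-data".
VOLUME_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

def volume_commands(app, purpose, image, mount_path, driver="local"):
    """Return argv lists: first create a named volume, then run a container using it."""
    name = f"{app}-{purpose}"
    if not VOLUME_NAME.fullmatch(name):
        raise ValueError(f"volume name {name!r} breaks the naming convention")
    create = ["docker", "volume", "create", "--driver", driver, name]
    run = ["docker", "run", "--detach",
           "--mount", f"type=volume,source={name},target={mount_path}", image]
    return create, run
```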

Even so, containers need to be monitored to prevent them from running out of disk space. This holds particularly true when container output is logged to specific log directories, because a container in a failed state can create a lot of logging output.
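A minimal disk check could be run periodically against the filesystem that holds a log directory. The function name and the 90% threshold are assumptions.

```python
import shutil

def disk_pressure(path, max_used_fraction=0.9):
    """True if the filesystem holding 'path' is fuller than the threshold."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total > max_used_fraction
```

In practice a check like this would feed an existing monitoring system rather than run on its own.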

12. Logging
Application logging should not be limited to specific log directories within specific containers, but centralized in a separate logging environment. This ensures that logs are preserved regardless of the state of the container. Such an environment can be a log directory that the container shares with the host (allowing the host's logging system to access it), or shared storage. Another option is to run a dedicated logging agent in the container, such as those provided with logging software.
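Logging to a directory shared with the host could be configured as below. The directory would be bind-mounted, for example with `docker run -v /var/log/app:/logs …`, where both paths are hypothetical. The file is named after the container's hostname, which Docker sets to the short container ID by default. This keeps several containers writing to the same shared directory apart.

```python
import logging
import socket
from pathlib import Path

def configure_logging(log_dir):
    """Log to <log_dir>/<hostname>.log in a directory shared with the host."""
    path = Path(log_dir) / f"{socket.gethostname()}.log"
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
    logger = logging.getLogger("app")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```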

13. Testing
Use a services-based approach when testing containers. The focus should be on the (expected) output of the container, not on the software that delivers the output, provided this software is installed by a package manager. In other words, if a container contains a database installed through a package manager, it is not necessary to test the database software itself; instead, test whether the database works as expected.
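A services-based test treats the container as a black box and only looks at what it exposes. The smallest possible check, sketched below, verifies that the service accepts TCP connections. For a database, a fuller test would also run a query through a client and compare the result. The function name and timeout are assumptions.

```python
import socket

def port_accepts_connections(host, port, timeout=2.0):
    """Black-box check: does the service accept TCP connections?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```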

A container usually does not contain a single, monolithic application; even though this is possible, it defeats the purpose of containers. More often than not, containers are the delivery mechanism for a microservices architecture. This implies that testing containers means testing a distributed system, which can quickly become complicated.

14. Porting
Do not just transfer an application to a container. Not many applications are container-ready out of the box. Thoroughly assess your application before the transfer (for example by using the Twelve-Factor App guidelines at https://12factor.net/) and choose a specific scenario, such as those outlined by Gartner back in 2011. If your application is not container-ready, you can revise it (modify the code) or completely rebuild it. If your application is container-ready, you can rehost it (move from the current infrastructure to an IaaS) or refactor it (move from the current infrastructure to a PaaS). There is always the possibility of replacing an application altogether. There is no silver-bullet scenario for specific types of applications.