Six months ago ForAllSecure started analyzing Docker images. What does this mean? Imagine we have a user who wants us to fuzz their application. How do they give it to us? Do they tar it up? Do they give us access to an environment where it’s running? Do we integrate into their build pipeline? Applications are an entire ecosystem -- they require specific library versions, environment variables, users, etc. While it may seem like a small limitation conceptually, this added barrier can contribute to the friction between development and security teams, especially as organizations look to incorporate security as a part of their build cycles.
This is where Docker comes into play. We wanted Docker as a packaging solution for our users because it’s accessible and easy to use, but we didn’t want the overhead of the Docker daemon and all the other fancy features that come with it. We ended up building our own lightweight version of Docker, which lets ForAllSecure accept Docker images while running them with the bare-bones runc runtime. This allows us to analyze code without requiring changes to developer behavior. In this post, we’ll focus on the first part of the problem: how to ingest Docker images.
Accompanying this post is the open sourcing of Rootfs Builder, the tool we use to extract a rootfs from a Docker image. A Docker image provides a portable, efficient format. Instead of sending a 4GB rootfs across the wire, users can simply give us a string like “ubuntu:latest” and ForAllSecure servers can pull the image and extract the rootfs. This value prop doesn’t just apply to ForAllSecure. Rootfs Builder allows any runtime to ingest a Docker image and extract the rootfs. We chose runc, but the extracted rootfs is vanilla (i.e. there is no Docker-specific information) and will work with rkt, NSJail, etc.
It’s worth noting that there were already a few existing solutions for building a rootfs from an image. Unfortunately, they do not handle whiteouts correctly (explained further below). I also want to give a shout out to Makisu and Kaniko (written by Uber and Google respectively), which do provide functionality for extracting a rootfs from an image. They solve the problem of building Docker images in an environment not suitable for Docker, namely Kubernetes. We chose not to use their software because it was still a bit too feature-rich for us.
Now that you understand the problem we are trying to solve, we can dive into the question: what is a Docker image? How do we go from a Docker “image”, which is just some string like “alpine:latest”, to a running instance of Alpine? In short, an image is a glorified tarball. It consists of various layers which, when merged together, form the rootfs of the container. To understand these layers, we need to make a quick detour to discuss the underlying technology, OverlayFS (OFS).
OverlayFS layers two directories on a single Linux host and presents them as a single directory. The first directory, referred to as the “lower” directory, is read-only and usually provides the base file system. The second directory, referred to as the “upper” directory, records any changes made on top of the lower directory, while leaving the lower directory itself unchanged. If a file is removed, a “whiteout” file is created in the upper directory to simulate the removal. The mount point presents the two directories merged together. Note that OFS requires support for extended attributes in the upper file system, which it uses to store metadata about removals, such as markers for opaque directories.
OFS backs Docker’s overlay2 storage driver and, as you can imagine, is well-suited for containers. The lower directory is the base filesystem, and each layer on top is a snapshot of the changes made to the container filesystem at a given time. OFS is an efficient way to generate and store diffs to a filesystem.
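As a rough mental model of the merge, here is a toy sketch in Python. It is not how the kernel implements OFS: the directories are plain dictionaries, and `WHITEOUT` and `merged_view` are names made up for this illustration.

```python
# Toy model of an overlay mount. Directories are {path: content} dicts and
# WHITEOUT is a hypothetical stand-in for a whiteout entry in the upper dir.
WHITEOUT = object()


def merged_view(lower, upper):
    """Return what the merged mount point would show for the two directories."""
    merged = dict(lower)  # start from the read-only lower directory
    for path, content in upper.items():
        if content is WHITEOUT:
            merged.pop(path, None)  # a whiteout hides the lower entry
        else:
            merged[path] = content  # the upper entry wins over the lower one
    return merged


lower = {"bin/arch": "binary", "etc/hosts": "127.0.0.1 localhost"}
upper = {"hello": "", "bin/arch": WHITEOUT}
print(sorted(merged_view(lower, upper)))
```

The lower dictionary is never modified, which mirrors how the lower directory stays untouched while the upper directory absorbs every change.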
Try it out yourself:
# Create a tmpfs because a tmpfs has support for extended attributes
root@5e2bb73f7afd:/tmp/tmpfs# mount -t tmpfs tmpfs /tmp/tmpfs
# Create the lowerdir, upperdir, workdir and the merged dir
root@5e2bb73f7afd:/tmp/tmpfs# cd /tmp/tmpfs
root@5e2bb73f7afd:/tmp/tmpfs# mkdir lowerdir upperdir workdir merged
# I moved the contents of the alpine base filesystem into the lowerdir to make the example more meaningful
root@5e2bb73f7afd:/tmp/tmpfs# mv /alpine/* lowerdir/
# Create the OFS mount
root@5e2bb73f7afd:/tmp/tmpfs# mount -t overlay -o lowerdir=/tmp/tmpfs/lowerdir,upperdir=/tmp/tmpfs/upperdir,workdir=/tmp/tmpfs/workdir overlay /tmp/tmpfs/merged
# Notice that now the merged directory, previously empty, reflects the lower directory
root@5e2bb73f7afd:/tmp/tmpfs# ls merged/
bin dev etc home lib media mnt opt proc root run sbin srv sys tmp usr var
# If we create a file in the merged directory, it gets reflected in the upper directory
root@5e2bb73f7afd:/tmp/tmpfs# ls upperdir/
root@5e2bb73f7afd:/tmp/tmpfs# touch merged/hello
root@5e2bb73f7afd:/tmp/tmpfs# ls upperdir/
hello
# If we remove a file from merged, it’s also reflected in the upper directory as a whiteout file
root@5e2bb73f7afd:/tmp/tmpfs# rm merged/bin/arch
root@5e2bb73f7afd:/tmp/tmpfs# ls -la upperdir/bin/
drwxr-xr-x 2 root root 60 Sep 16 20:47 .
drwxr-xr-x 3 root root 80 Sep 16 20:44 ..
c--------- 1 root root 0, 0 Sep 16 20:47 arch
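The `c---------` entry with device numbers `0, 0` is the whiteout. Here is a minimal sketch of spotting such entries programmatically. It assumes whiteouts are character devices with device number 0/0, as in the listing above, and `is_overlay_whiteout` and `find_whiteouts` are hypothetical helper names, not part of any tool mentioned here.

```python
import os
import stat


def is_overlay_whiteout(path):
    """True if path is a character device with device number 0/0."""
    st = os.lstat(path)  # lstat so that symlinks are not followed
    return stat.S_ISCHR(st.st_mode) and st.st_rdev == os.makedev(0, 0)


def find_whiteouts(upperdir):
    """Return the sorted relative paths of all whiteouts under upperdir."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(upperdir):
        for name in filenames:  # os.walk lists device nodes as non-directories
            full = os.path.join(dirpath, name)
            if is_overlay_whiteout(full):
                found.append(os.path.relpath(full, upperdir))
    return sorted(found)
```

Run against the upper directory from the session above, `find_whiteouts("/tmp/tmpfs/upperdir")` should report `bin/arch`.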
What’s in an image?
Now that we understand the tech underlying a Docker image, we can look inside and better understand its contents. The Docker image contains 3 components:
- manifest.json: points to all the layers and the config.json.
- config.json: contains metadata necessary for running the container. Think Docker version, environment variables, mounts, etc.
- Layers: These are OFS layers as described above and are named using the hash of their contents. When merged together, they form the rootfs.
Let’s step through an example with Docker to shed some more light on this:
Start by pulling and saving the image with Docker. `docker save` saves the image to a tar archive.
marli 9:32:50 /tmp () docker pull httpd
marli 9:32:57 /tmp () docker save httpd:latest -o httpd.tar
Extract the tar archive and look around.
marli 9:33:10 /tmp () mkdir httpd && tar -C httpd -xvf httpd.tar && cd httpd
marli 9:33:51 /tmp/httpd () ls
We see the manifest.json, which points to the config, as well as each layer.
marli 9:33:51 /tmp/httpd () jq . manifest.json
"Layers": [ "5f6bd574c212bf1b00fa21bb12b588712d32bca72866be4061268498d90140ff/layer.tar", "49adc30abd56426f5889a7edbe19c463d6b5c4d0e515b531ef09b33d7839476b/layer.tar", "55185aeb1145d787ef29c0c917b0a169737cc479df5c799805b65f22f0be848e/layer.tar", "4354893d890be3cc8574d2a43a153f01d3dbf2c3b6680bb54ad91e56e2103b19/layer.tar", "83be7a564d0c2bad81aca09479229afba3cb114a10cc05a28774e166653e2aea/layer.tar"
I also suggest taking a look at the config.json, but it’s a bit large to include here.
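If you would rather not unpack the archive by hand, the same information can be read straight out of the tarball. The sketch below assumes the `docker save` layout shown above, where `manifest.json` is a JSON list whose entries carry `Config` and `Layers` keys. `read_manifest` and `list_layers` are names invented for this example.

```python
import json
import tarfile


def read_manifest(image_tar):
    """Parse manifest.json out of a `docker save` archive without unpacking it."""
    with tarfile.open(image_tar) as tar:
        with tar.extractfile("manifest.json") as f:
            return json.load(f)


def list_layers(image_tar):
    """Return one (config_path, [layer_paths]) tuple per image in the archive."""
    return [(entry["Config"], list(entry["Layers"]))
            for entry in read_manifest(image_tar)]


if __name__ == "__main__":
    # httpd.tar is the archive created with `docker save` earlier.
    for config, layers in list_layers("httpd.tar"):
        print("config:", config)
        for layer in layers:  # ordered from the base layer to the top layer
            print("  layer:", layer)
```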
Let’s check out the base layer. We see a complete file system.
marli 9:58:08 /tmp/httpd/5f6bd574c212bf1b00fa21bb12b588712d32bca72866be4061268498d90140ff () ls
marli 9:58:11 /tmp/httpd/5f6bd574c212bf1b00fa21bb12b588712d32bca72866be4061268498d90140ff () mkdir layer && tar -C layer -xvf layer.tar
marli 9:59:44 /tmp/httpd/5f6bd574c212bf1b00fa21bb12b588712d32bca72866be4061268498d90140ff () ls -la layer
drwxr-xr-x 21 marli wheel 672 Sep 16 09:59 .
drwxr-xr-x 6 marli wheel 192 Sep 16 09:59 ..
drwxr-xr-x 72 marli wheel 2304 Sep 9 17:00 bin
drwxr-xr-x 2 marli wheel 64 Aug 30 05:31 boot
drwxr-xr-x 2 marli wheel 64 Sep 9 17:00 dev
drwxr-xr-x 69 marli wheel 2208 Sep 9 17:00 etc
drwxr-xr-x 2 marli wheel 64 Aug 30 05:31 home
drwxr-xr-x 7 marli wheel 224 Sep 9 17:00 lib
drwxr-xr-x 3 marli wheel 96 Sep 9 17:00 lib64
drwxr-xr-x 2 marli wheel 64 Sep 9 17:00 media
drwxr-xr-x 2 marli wheel 64 Sep 9 17:00 mnt
drwxr-xr-x 2 marli wheel 64 Sep 9 17:00 opt
drwxr-xr-x 2 marli wheel 64 Aug 30 05:31 proc
drwx------ 4 marli wheel 128 Sep 9 17:00 root
drwxr-xr-x 4 marli wheel 128 Sep 9 17:00 run
drwxr-xr-x 66 marli wheel 2112 Sep 9 17:00 sbin
drwxr-xr-x 2 marli wheel 64 Sep 9 17:00 srv
drwxr-xr-x 2 marli wheel 64 Aug 30 05:31 sys
drwxr-xr-x 2 marli wheel 64 Sep 9 17:00 tmp
drwxr-xr-x 10 marli wheel 320 Sep 9 17:00 usr
drwxr-xr-x 13 marli wheel 416 Sep 9 17:00 var
Check out the top layer. Look familiar? This is simply an OFS upper layer like the one described above.
marli 10:02:02 /tmp/httpd/83be7a564d0c2bad81aca09479229afba3cb114a10cc05a28774e166653e2aea () ls -la layer
drwxr-xr-x 3 marli wheel 96 Sep 16 10:02 .
drwxr-xr-x 6 marli wheel 192 Sep 16 10:02 ..
drwxr-xr-x 3 marli wheel 96 Sep 9 17:00 usr
Building a rootfs
Docker merges all the layers to create a single rootfs. The merging itself is pretty straightforward. We do it ourselves in Rootfs Builder, which takes the name of a Docker image, pulls the tarball, and extracts it. For every layer, we iterate through each tar header, making 2 passes. The first pass removes whiteouts (recall that these mark files or directories that were removed in a layer). In the second pass, we read the tar header for metadata about the file or directory, specifically the mode, uid, and gid. If the file doesn’t exist we create it; otherwise we simply replace it. We also have logic to update the uid and gid. This is necessary if you want to unshare user namespaces. For example, you may want to appear to be root in the container while being an unprivileged user outside the container. This requires creating a subuid mapping. The mapping looks something like:
root@21d94d3c4539:/workdir# cat /etc/subuid
fas:100000:65536
This mapping reserves 65536 subordinate uids, starting at 100000, for the user fas. Under this mapping, uid 0 inside the container maps to uid 100000 outside the container.
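To make the two passes and the id shift concrete, here is a minimal Python sketch. It is not Rootfs Builder’s actual code, and it rests on a few assumptions that are not spelled out in this post: inside a layer tarball, a removed file `foo` appears as an empty entry named `.wh.foo`, and an entry named `.wh..wh..opq` hides everything that lower layers put in that directory (the convention of the image layer format). The function names are made up, the subuid range is the 100000/65536 one from the example, and the sketch only computes the shifted owner rather than calling chown. Hard links, device nodes and hardening against malicious symlinks are left out.

```python
import os
import shutil
import tarfile

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def shift_id(container_id, start=100000, count=65536):
    """Map an id inside the container to the host id for a subuid range."""
    if not 0 <= container_id < count:
        raise ValueError(f"id {container_id} is outside the mapped range")
    return start + container_id


def _target(rootfs, name):
    """Resolve a tar member name inside rootfs, refusing paths that escape it."""
    root = os.path.realpath(rootfs)
    path = os.path.normpath(os.path.join(root, name))
    if path != root and not path.startswith(root + os.sep):
        raise ValueError(f"unsafe path in layer: {name}")
    return path


def _remove(path):
    """Delete a file, symlink or directory tree if it exists."""
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)
    elif os.path.lexists(path):
        os.remove(path)


def apply_layer(layer_tar, rootfs):
    """Apply one layer on top of rootfs; return {path: (host_uid, host_gid)}."""
    owners = {}
    with tarfile.open(layer_tar) as tar:
        members = tar.getmembers()
        # Pass 1: process whiteouts so that deletions hit only lower layers.
        for m in members:
            parent, base = os.path.split(m.name)
            if base == OPAQUE_MARKER:
                directory = _target(rootfs, parent)
                if os.path.isdir(directory):
                    for child in os.listdir(directory):
                        _remove(os.path.join(directory, child))
            elif base.startswith(WHITEOUT_PREFIX):
                hidden = base[len(WHITEOUT_PREFIX):]
                _remove(_target(rootfs, os.path.join(parent, hidden)))
        # Pass 2: create or replace everything else using the header metadata.
        for m in members:
            if os.path.basename(m.name).startswith(WHITEOUT_PREFIX):
                continue
            dest = _target(rootfs, m.name)
            if m.isdir():
                if os.path.lexists(dest) and not os.path.isdir(dest):
                    _remove(dest)
                os.makedirs(dest, exist_ok=True)
            elif m.isreg() or m.issym():
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                _remove(dest)
                if m.issym():
                    os.symlink(m.linkname, dest)
                else:
                    with tar.extractfile(m) as src, open(dest, "wb") as dst:
                        shutil.copyfileobj(src, dst)
            else:
                continue  # hard links, devices and fifos are skipped here
            if not m.issym():
                os.chmod(dest, m.mode & 0o7777)
            owners[dest] = (shift_id(m.uid), shift_id(m.gid))
    return owners
```

Calling `apply_layer` once per layer, in the order given by the manifest, builds up the rootfs. Processing whiteouts in a separate first pass matters because a marker can appear after new files for the same directory within one tarball, and it must only remove what lower layers contributed.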
Developers use Docker images every day, and now you know that they are just glorified tarballs. There’s plenty of room for improvement with Rootfs Builder. Outstanding features we hope to add will allow the user to specify:
- The number of layers to untar.
- A layer to omit when untarring.
- A binary the user is interested in. Instead of returning an entire rootfs, this will just return the binary.
But for now, hopefully Rootfs Builder will help users introspect into Docker images. You can get started with Rootfs Builder here: https://github.com/ForAllSecure/rootfs_builder