Inside Docker’s Moby

Docker is one of the tools I feel strongly makes life better, though it has an incredibly steep learning curve. The process of building and using containers is difficult to explain, I find, even to skilled developers. Really understanding what they are (and aren’t) requires hands-on, running-into-walls experience.

Docker for Mac, then, is a really great tool to start quickly and get your hands dirty. It hides most of the intricate detail of running a Linux virtual machine just to use Docker.

However, one of the pain points of Docker for Mac has been its shared filesystem performance. Their custom solution osxfs is excellent in terms of features, but unfortunately very sluggish for, e.g., PHP applications. Long forum threads have been dedicated to this issue.

My workaround until now has been to grudgingly ditch Docker for Mac, go back to Docker Machine and use the docker-machine-nfs tool to get snappy NFS shares going instead.

Setting up a new development machine gave me the opportunity to try and fix the issue properly.

Current work

Other people have already tinkered with what’s actually inside the virtual machine that Docker for Mac so carefully hides. In his article “Meet Moby in Docker for Mac”, Luc Juggery explains how to get shell access to the machine. Kenn Herman has already created a tool, d4m-nfs, that automates logging into the machine and setting up an NFS share.

One of the perks of how Docker for Mac works is that its virtual machine boots from a ramdisk, which means changes to the system do not persist across restarts. A side effect of this is that something like d4m-nfs has to rerun every time Docker for Mac starts.

I wanted something (subjectively) neater, a solution that’d also work with containers that start automatically.

The end result is in my docker-for-mac-nfs repository on GitHub, but I’ll describe the process below.

Diving in

The first idea I explored was to see if there is a way to hook into the startup scripts. The virtual machine runs an Alpine Linux derivative called Moby, so the init system used is OpenRC.

The boot scripts mount a persistent filesystem /var, which is backed by a virtual hard disk. If any part of the boot process executed something from there, we would be able to create a persistent hook that’d even last across Docker for Mac upgrades. Unfortunately, it seems there’s nothing like that.

So on to plan B, which was to modify the ramdisk. The image for the ramdisk is located in the application bundle, typically at /Applications/Docker.app/Contents/Resources/moby/initrd.img. This is a regular ramdisk file for Linux to load, in compressed CPIO format (as revealed by running file initrd.img).

To get NFS running in Moby, we first need to add the nfs-utils package, which requires chrooting into the ramdisk and running apk. We can actually use Docker to do this!

As it turns out, macOS ships with tar and cpio utilities from libarchive, which allows easy conversion between formats on the fly. The command below creates a new (uncompressed) TAR-archive, from the ramdisk contents:

tar -cf initrd.tar @initrd.img

This is exactly the format docker import wants as input. Instead of creating an intermediate TAR-archive file, we can stream it directly into a new Docker image:

tar -c @initrd.img | docker import - dummy.example/moby-initrd:latest

Now we can start containers from this image and modify them in any way we want!

After we’re finished making modifications, exporting a container back to a compressed CPIO archive is as easy as reversing the process:

docker export my-container | tar -czf initrd.img --format newc @-

(Yes, the cpio and tar utilities actually work on all archive types supported by libarchive, not just CPIO and TAR archives.)

Making modifications

My initial approach to actually making modifications was to leverage the existing Docker tools and create a Dockerfile. But exporting a Docker image (the result of building a Dockerfile) as a flat filesystem turns out to be tricky. docker save dumps individual layers, instead of a flat image, and docker export only works on containers, not images.

So instead, I went for simply creating a container, copying in some files, and running a small script.

First, the simplest modification we need to make, installing nfs-utils:

apk update
apk add nfs-utils
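
Put together, the create, copy, run approach could look roughly like the sketch below. The container name `moby-build`, the `files/` directory and the `/setup.sh` script are hypothetical names chosen for illustration, not necessarily what the repository uses; the image tag is the one from the `docker import` example above.

```shell
#!/bin/sh
# Sketch: modify the imported ramdisk image inside a throwaway container.
IMAGE=dummy.example/moby-initrd:latest   # tag from the import step
CONTAINER=moby-build                     # hypothetical container name

modify_ramdisk() {
    # An imported image has no default command, so one is named explicitly.
    docker create --name "$CONTAINER" "$IMAGE" /bin/sh /setup.sh || return 1
    # Copy the contents of the (hypothetical) files/ directory, including
    # setup.sh, into the root of the container filesystem.
    docker cp files/. "$CONTAINER":/ || return 1
    # Run the script (for example the apk commands) and wait for it to exit.
    docker start --attach "$CONTAINER"
}
```

After `modify_ramdisk` returns, the stopped container can be exported with the `docker export` pipeline shown earlier and then removed with `docker rm moby-build`.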

Now for actually mounting the NFS share.

From the shell of a running Moby virtual machine, mounting the share was fairly easy:

mount -t nfs -o noacl,noatime,nolock,async 192.168.65.1:/Users /Users

Some observations from trying various invocations of mount with different options:

- When the virtual machine talks to your Mac host, the host sees traffic from localhost. Something to keep in mind when setting up your NFS exports.

- You probably also want to map ownership of files on the share to your regular user on your Mac. This is what my /etc/exports looks like:
  ```
  /Users -mapall=501:20 localhost
  ```

- You need to start the rpcbind service in the virtual machine before NFS shares work. You can do so with:
  ```
  /etc/init.d/rpcbind start
  ```

- Your Mac needs the following setting in /etc/nfs.conf:
  ```
  nfs.server.mount.require_resv_port = 0
  ```
  My best guess is this is a quirk of the networking setup in Docker for Mac.

- The udp transport doesn’t work. Perhaps also a networking quirk, but UDP traffic sent to the Mac host seems to not arrive at all.

- I’ve always specified noatime and async for shares, so I wouldn’t know the exact performance impact. But nolock apparently makes a big difference.
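
The host-side part of these observations can be scripted. The sketch below is one way to do it, assuming the export line and the nfs.conf setting quoted above; `append_once` and `setup_host_nfs` are hypothetical helper names, and 501:20 should be replaced by the output of `id -u` and `id -g` if your user differs.

```shell
#!/bin/sh
# Append a line to a file unless that exact line is already present,
# so the setup can be rerun without creating duplicates.
append_once() {
    line=$1
    file=$2
    grep -qxF -- "$line" "$file" 2>/dev/null ||
        printf '%s\n' "$line" | sudo tee -a "$file" >/dev/null
}

# Configure the macOS NFS server with the values from this article.
setup_host_nfs() {
    append_once "/Users -mapall=501:20 localhost" /etc/exports
    append_once "nfs.server.mount.require_resv_port = 0" /etc/nfs.conf
    # Validate the exports file, then restart nfsd to pick up both changes.
    sudo nfsd checkexports && sudo nfsd restart
}
```

Running `setup_host_nfs` once on the Mac and then `showmount -e localhost` should list /Users if the export was accepted.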

With this knowledge, and with manual mounting of the share working, I wanted to automate this in the boot process.

All my attempts at adding the share to /etc/fstab in the ramdisk failed, with the connection seemingly timing out. Digging through the boot scripts, it looks like this should work, with them properly waiting for the network before trying to mount NFS shares. But I eventually stopped trying and moved on.

Instead, the mount is now set up using a boot script /etc/init.d/usermount, which is added to the default runlevel by creating a symlink /etc/runlevels/default/usermount.

The boot scripts are in OpenRC format, but plenty of examples are already in the ramdisk, so referencing documentation is not really necessary. I simply specified that the usermount ‘service’ needs the rpcbind service, and should run before the docker service.
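
For illustration, a minimal sketch of what such a usermount script might contain is shown below. It is not copied from the repository: the mount options are the ones from the manual mount earlier, the description text is made up, and the interpreter line is an assumption (older OpenRC releases use `#!/sbin/runscript`, so copy whatever the existing scripts in the ramdisk use).

```shell
#!/sbin/openrc-run
# Sketch of /etc/init.d/usermount: mount the host's /Users over NFS at boot.

depend() {
    need rpcbind    # NFS mounts fail unless rpcbind is running
    before docker   # have the share in place before containers start
}

start() {
    ebegin "Mounting /Users over NFS"
    mkdir -p /Users   # harmless if the mount point already exists
    mount -t nfs -o noacl,noatime,nolock,async 192.168.65.1:/Users /Users
    eend $?
}
```

The runlevel symlink can then be created inside the container with `ln -s /etc/init.d/usermount /etc/runlevels/default/usermount`, which is what `rc-update add usermount default` would do on a running system.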

With that, build the new ramdisk, replace it in the application bundle, and restart Docker for Mac!

The result

The final product can be found in the docker-for-mac-nfs GitHub repository. It contains an import.sh script to create the base Docker image from the stock ramdisk, which you’ll need to do just once. After that, run make.sh to get a new image derived from the base, with modifications.

This way, you can quickly iterate on modifications by repeatedly running make.sh and copying the image to the application bundle. (You should probably keep a backup of the original ramdisk around somewhere!)
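
A small helper can make the copy step safer by taking the backup automatically. This is a sketch under assumptions: `install_ramdisk` and the `.orig` suffix are hypothetical, and the function is not part of the repository.

```shell
#!/bin/sh
# Copy a freshly built ramdisk over the target, keeping a one-time backup.
install_ramdisk() {
    new=$1
    target=$2
    backup=$target.orig
    # Only back up once, so the stock ramdisk is never overwritten by an
    # already-modified one on later iterations.
    [ -e "$backup" ] || cp -p "$target" "$backup" || return 1
    cp "$new" "$target"
}
```

With Docker for Mac stopped, it would be called as `install_ramdisk initrd.img /Applications/Docker.app/Contents/Resources/moby/initrd.img`. Because the backup lives inside the application bundle here, copying it somewhere else as well is a good idea.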

Though if you’re going to use this, note that new versions of Docker for Mac will require rerunning all steps. Upgrades will likely include a new ramdisk, which may also be incompatible with the modifications.