How to Not Be the Engineer Running 3.5GB Docker Images
May 9, 2016 | 7 min read
Let’s cut to the chase: you’re adopting a microservice architecture, and you’re planning to use Docker. There’s a reason it is so en vogue – it solves lots and lots of problems and has zero negative effect on our projects, right?
As with every tool, technology, or paradigm thrust upon us as we scrappily try to maintain our sanity while jumping from shiny to shiny, we need to learn the gotchas.
To do this, I like to start with a simple question: How might this new shiny bite me on the ass, and what can I do to avoid having teeth marks on my rear?
I want to tackle a problem I have seen repeatedly during my consultations with teams/organizations adopting Docker.
Behemoth Docker Images
$ docker images
REPOSITORY              TAG      IMAGE ID       CREATED              SIZE
awesome-micro-service   latest   61562a134d38   About a minute ago   3.5 GB
Woah! Look at the size of that image. Awesome microservice is 3.5GB! So much for micro.
What on earth is a Docker image anyways?
To understand why our images are big, we need to understand what images are in the first place.
A Docker image is the output of a docker build. The build process runs each of the instructions within a Dockerfile. Each instruction executed creates a layer. Layers encapsulate the file system changes that the instruction has caused. A Docker image is a collection of layers.
Let’s look closer so we can describe a Docker image in more detail.
Assume we’re going to bring Docker into our PHP workflow. In order to run our PHP application, we need a Debian-based system with PHP installed.
We’ll need to describe the environment required to run our application within a Docker container.
# Dockerfile
FROM debian:jessie
RUN echo "Building ..."
RUN DEBIAN_FRONTEND=noninteractive apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y php5-cli
Super simple. Super declarative. Though completely useless until we build it. The build process takes a context and produces an image. The context is the directory that is sent to the Docker daemon to satisfy any file requirements, such as files added with ADD.
# docker build -t <tag> -f <dockerfile> <context>
# If the Dockerfile is within the root of our context, we can omit the -f
$ docker build -t my-debian-php:latest -f Dockerfile .
...
$ docker images
REPOSITORY      TAG      IMAGE ID       CREATED              SIZE
my-debian-php   latest   61562a134d38   About a minute ago   163.5 MB
So what’s actually going on? What’s inside my Docker image?
It’s a file system. When you run apt-get install vim, all you’re telling the computer to do is put some files on your hard drive. The Docker image encapsulates that and keeps track of all new / modified / deleted files.
These file system changes are tracked in layers. Each layer is the encapsulation of the file system changes for one instruction in your Dockerfile.
Docker provides a command, docker history, to visualize the layers of our images. As you’ll see in the output below:
- We have no control over the size of our base image, other than changing the base image. This is the “<missing>” layer at the bottom of the list.
- Some keywords cost us nothing. Examples include CMD, USER, WORKDIR, etc.
$ docker history my-debian-php
IMAGE          CREATED          CREATED BY                                      SIZE       COMMENT
b4e7e4004eeb   4 seconds ago    /bin/sh -c #(nop) CMD ["vim"]                   0 B
d2a8ad35f9f4   4 seconds ago    /bin/sh -c echo                                 0 B
6fc559885751   36 minutes ago   /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   38.37 MB
f50f9524513f   8 weeks ago      /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
<missing>      8 weeks ago      /bin/sh -c #(nop) ADD file:b5391cb13172fb513d   125.1 MB
Note: If your command makes no changes to the file system (like our RUN echo "Building ..."), a layer is still created. It just has a zero-byte size.
So, in order to keep our images micro, we need to keep the output of our layers to a minimum.
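Before trimming anything, it helps to know which layers are actually heavy. The following is a minimal sketch, not part of the original workflow: big_layers is a hypothetical helper name, and it assumes a Docker version whose docker history supports --format and --human=false, so that each line starts with the layer size in bytes followed by the command that created it.

```shell
#!/bin/sh
# big_layers MIN_BYTES (hypothetical helper)
# Reads lines of the form "<size_in_bytes> <created_by ...>" on stdin and
# prints only the layers at or above MIN_BYTES, keeping the input order.
big_layers() {
  awk -v min="$1" '
    $1 + 0 >= min + 0 {
      size = $1
      sub(/^[0-9]+ */, "")   # drop the size field, keep the command text
      printf "%.1f MB  %s\n", size / 1000000, $0
    }'
}

# Assumed usage against a real image (requires Docker, flags as described above):
#   docker history --no-trunc --human=false \
#     --format '{{.Size}} {{.CreatedBy}}' my-debian-php | big_layers 10000000
```

Because the helper only reads text on stdin, it can be tried on saved history output without a Docker daemon.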
1. File Ownership & Permissions
Never, and I mean it, never change the ownership or permissions of a file inside a Dockerfile unless you absolutely NEED to. When you need to, try to modify as few files as possible.
Although comparisons can be made, Docker isn’t like Git. It doesn’t record what changed inside a file, only which files were affected. Changing a file’s ownership or permissions therefore makes Docker copy the whole file into a new layer. This can cause your image to double in size if you’re modifying particularly large files, or worse, every file!
# Dockerfile
FROM debian:jessie
ADD large_file /var/www/large_file
RUN chown www-data /var/www/large_file
RUN chmod 756 /var/www/large_file
$ docker build -t gotcha-1 .
...
$ docker images gotcha-1
REPOSITORY   TAG      IMAGE ID       CREATED              SIZE
gotcha-1     latest   49b4a4ea228a   About a minute ago   3.346 GB
$ docker history gotcha-1
IMAGE          CREATED          CREATED BY                                      SIZE       COMMENT
49b4a4ea228a   36 seconds ago   /bin/sh -c chmod 756 /var/www/large_file        1.074 GB
09d77316932b   2 minutes ago    /bin/sh -c chown www-data /var/www/large_file   1.074 GB
7adb7c72c3ef   2 minutes ago    /bin/sh -c #(nop) ADD file:a86f6dedfb4ba54972   1.074 GB
f50f9524513f   8 weeks ago      /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
<missing>      8 weeks ago      /bin/sh -c #(nop) ADD file:b5391cb13172fb513d   125.1 MB
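The three 1.074 GB layers come from whole files being stored per layer. Here is a toy model of that behaviour, using plain directories as stand-in layers. It is only an illustration and not how Docker is implemented; the layer1/layer2 directory names are made up, and a 1 MiB file stands in for the 1 GB one.

```shell
#!/bin/sh
# Toy model: a layer holds full copies of the files its instruction touched.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/layer1" "$work/layer2"

# Layer 1: "ADD large_file" (1 MiB stand-in).
dd if=/dev/zero of="$work/layer1/large_file" bs=1024 count=1024 2>/dev/null

# Layer 2: "RUN chmod ..." changes metadata only, yet the layer
# still has to carry a complete copy of the file.
cp "$work/layer1/large_file" "$work/layer2/large_file"
chmod 755 "$work/layer2/large_file"

layer1_bytes=$(wc -c < "$work/layer1/large_file")
layer2_bytes=$(wc -c < "$work/layer2/large_file")
total_bytes=$((layer1_bytes + layer2_bytes))
echo "layer1=$((layer1_bytes + 0)) layer2=$((layer2_bytes + 0)) total=$total_bytes"
```

The content never changed, but the total doubled, which is the same effect the history output shows at a larger scale.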
Tip: If you’re having problems with permissions inside your container, modify them using your entrypoint script, or modify the user id to reflect what you need. Do not modify the files.
Change the user ID of www-data to match yours, tweaking as necessary:
RUN usermod -u 1000 www-data
Or run your container with an entrypoint script:
$ cat my-script
#!/bin/bash
chown -R www-data /var/www/apache2
$ docker run --entrypoint=/bin/my-script my-debian-php
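An entrypoint that only runs chown exits as soon as the chown finishes, and the container stops with it. A common pattern is to fix ownership and then hand over to the real command. The sketch below assumes that pattern; fix_ownership, main, APP_USER and APP_DIR are hypothetical names, not something the image above defines.

```shell
#!/bin/sh
# Hypothetical entrypoint sketch: fix ownership at container start,
# then replace this script with the container's actual command.
set -eu

# fix_ownership USER DIR: recursively hand DIR over to USER.
fix_ownership() {
  chown -R "$1" "$2"
}

main() {
  # APP_USER and APP_DIR are illustrative variables with assumed defaults.
  fix_ownership "${APP_USER:-www-data}" "${APP_DIR:-/var/www}"
  exec "$@"
}

# Do nothing when no command was passed; otherwise fix ownership and run it.
[ "$#" -eq 0 ] || main "$@"
```

The ownership change happens in the running container's writable layer, so the image itself does not grow.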
2. Clean up after untidy commands
Sometimes other commands leave a trail of garbage behind them and couldn’t care less about the size of your images. We accept this on our desktops and preach “cache” and “performance”. Inside our images, it’s just pure filth.
# Dockerfile
FROM debian:jessie
RUN DEBIAN_FRONTEND=noninteractive apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y vim
$ docker build -t debian .
...
$ docker history debian
IMAGE          CREATED          CREATED BY                                      SIZE       COMMENT
ae5a25410c0d   10 seconds ago   /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   28.68 MB
aaf5660234d3   21 minutes ago   /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   9.694 MB
f50f9524513f   8 weeks ago      /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
<missing>      8 weeks ago      /bin/sh -c #(nop) ADD file:b5391cb13172fb513d   125.1 MB
As you can see from the output above, our apt-get update costs us about 10MB and our apt-get install costs us about 30MB. Obviously these are trivial examples, but in larger builds this space will accumulate!
First, let's examine what each command is doing to our image. To do this, start an interactive Docker container and bash in:
$ docker run -ti --rm --name live debian:jessie bash
You’ll be live inside the innards of a Debian container and at a bash prompt. Next, let’s get a second terminal window open and inspect the container:
$ docker diff live
$
No output. That’s good, because we’ve not done anything yet.
docker diff allows us to see what’s changed inside our container. So let’s run our first command:
Note: “$ ” is my local prompt and “root@4552beab7001:/#” is inside the container.
root@4552beab7001:/# apt-get update
$ docker diff live
C /var
C /var/lib
C /var/lib/apt
C /var/lib/apt/lists
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie_main_binary-amd64_Packages.gz
A /var/lib/apt/lists/security.debian.org_dists_jessie_updates_main_binary-amd64_Packages.gz
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie-updates_main_binary-amd64_Packages.gz
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie_Release
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie_Release.gpg
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie-updates_InRelease
A /var/lib/apt/lists/lock
A /var/lib/apt/lists/security.debian.org_dists_jessie_updates_InRelease
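On noisier commands the diff can run to thousands of lines, and grouping the added paths by directory makes the culprit easier to spot. This is a rough sketch under two assumptions: summarize_diff is a hypothetical helper name, and paths contain no spaces.

```shell
#!/bin/sh
# summarize_diff (hypothetical helper)
# Reads `docker diff` style lines ("A /path", "C /path", "D /path") on stdin
# and prints "<count> <directory>" for added (A) entries, busiest first.
summarize_diff() {
  awk '
    $1 == "A" {
      path = $2
      sub("/[^/]*$", "", path)     # strip the last path component
      if (path == "") path = "/"
      count[path]++
    }
    END { for (p in count) print count[p], p }
  ' | sort -rn
}

# Assumed usage (requires Docker and the running container named "live"):
#   docker diff live | summarize_diff
```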
Oooh, we’ve just discovered where our 10MB is going. Let’s fix it by tweaking our Dockerfile to delete our apt cache after installing vim. Your initial thought may be to tweak as:
# Dockerfile
FROM debian:jessie
RUN DEBIAN_FRONTEND=noninteractive apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y vim
RUN rm -rf /var/lib/apt
Unfortunately, this will only add another layer and not affect the previous layers. So although we’re deleting files, the previous layers still contain them. The common trick is to chain our commands at the shell level. This way, the files don’t exist when the RUN is finished, and they never exist in our history.
# Dockerfile
FROM debian:jessie
RUN DEBIAN_FRONTEND=noninteractive apt-get update \
    && apt-get install -y vim \
    && rm -rf /var/lib/apt
$ docker history debian
IMAGE          CREATED         CREATED BY                                      SIZE       COMMENT
be6afc32bd37   5 seconds ago   /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   28.68 MB
f50f9524513f   8 weeks ago     /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
<missing>      8 weeks ago     /bin/sh -c #(nop) ADD file:b5391cb13172fb513d   125.1 MB
Much better 🙂 You can repeat that process for every RUN inside your Dockerfile and really cut the fat out of your image.
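Repeating the check by hand gets tedious on long Dockerfiles. The sketch below flags RUN instructions that call apt-get without removing /var/lib/apt in the same instruction. It is a deliberately naive check: check_apt_cleanup is a hypothetical name, it only recognises upper-case RUN, and it only looks for the exact rm -rf /var/lib/apt cleanup used in this post.

```shell
#!/bin/sh
# check_apt_cleanup (hypothetical helper)
# Reads a Dockerfile on stdin and prints every RUN instruction that calls
# apt-get but does not delete /var/lib/apt within the same instruction.
check_apt_cleanup() {
  awk '
    {
      line = $0
      # Join backslash-continued lines into one logical instruction.
      if (sub(/\\[ \t]*$/, "", line)) { buf = buf line " "; next }
      instr = buf line
      buf = ""
      if (instr ~ /^[ \t]*RUN[ \t]/ && instr ~ /apt-get/ &&
          instr !~ /rm -rf \/var\/lib\/apt/)
        print instr
    }'
}

# Usage: check_apt_cleanup < Dockerfile
```

An empty result means every apt-get call cleans up inside its own layer; anything printed is a candidate for chaining.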
Create and maintain your own base images, preferably on Alpine! Alpine Linux (http://alpinelinux.org/) is tiny (Under 5MB!) and has a really strong package manager. If you can, use it and keep your base images lean.
Why is creating / maintaining your own base image ideal? Most “official” images are quite bloated and try to be as general as possible. You know what you need. It’s like compiling your own kernel, only not as dangerous 😀
ONBUILD. Use it. When crafting base images, ONBUILD gives you a great way to reuse this image for both development and production. ONBUILD tells Docker that when the image is used as a base, we should perform some extra instructions, such as the following, which puts our code into the container for a production build.
ONBUILD ADD . /var/www
As this only runs when being used as a base, our docker-compose.yml, used for development, can instead mount a volume into the container, for getting our code changes into the container without a rebuild 🙂
services:
  application:
    image: my-base
    volumes:
      - .:/var/www
Be careful using community images. They disappear. Often. Fork and maintain your own if it’s mission critical. You’re also putting your trust in the maintainer to protect your attack surface, but that’s a security issue and another post for next time.