I thought, damn, a lot of users must have uploaded content, only to find out that the Docker images were taking up all the space.

This week I dived into the fascinating world of multi-stage Docker builds.

The Problem

On https://bfportal.gg I use a relatively simple stack:

#wagtail (a cms that runs on #django ) + #docker + #tailwind .

Then one day I woke up to an alert on #datadog that almost all of the disk space on the VPS had been used,

so I did an image prune, thinking the images must be quite big to fill up all that space.

Lo and behold, it was 2 gigs for each image 💀 ( no wonder my poor VPS was all full )
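If you are debugging the same thing, these are the stock Docker CLI commands for finding out what is eating the disk (not my exact shell history, just the standard ones):

```shell
# Show how much space images, containers, volumes and build cache use
docker system df

# List images with their sizes to spot the biggest offenders
docker images

# Delete all images not used by at least one container
docker image prune -a
```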

So I went looking for a way to reduce the size of the image, and the answer was Docker multi-stage builds.

At this point I had this good ol’ Dockerfile:
# Use an official Python runtime based on Debian 10 "buster" as a parent image.
FROM python:3.11-slim-buster as dev

# Add user that will be used in the container.
RUN useradd --create-home wagtail

# Port used by this container to serve HTTP.
EXPOSE 8000

# Set environment variables.
# 1. Force Python stdout and stderr streams to be unbuffered.
# 2. Set PORT variable that is used by Gunicorn. This should match "EXPOSE"
#    command.
ENV PYTHONUNBUFFERED=1 \
    PORT=8000 \
    PYTHONDONTWRITEBYTECODE=1 \
    PATH="${PATH}:/home/wagtail/.local/bin" \
    USER="wagtail"

# Install system packages required by Wagtail and Django.
RUN apt-get update --yes --quiet && apt-get install --yes --quiet --no-install-recommends \
    build-essential \
    curl \
    libpq-dev \
    libmariadbclient-dev \
    libjpeg62-turbo-dev \
    zlib1g-dev \
    libwebp-dev \
    software-properties-common \
    zip \
    unzip \
    git \
    npm \
    && rm -rf /var/lib/apt/lists/* /usr/share/doc /usr/share/man \
    && apt-get clean

RUN npm install [email protected] -g && \
    npm install n -g && \
    n 18.8.0

# Install the application server.
RUN pip install "gunicorn==20.0.4"

# Install the project requirements.
COPY requirements.txt /
RUN pip install -r /requirements.txt

# Use /app folder as a directory where the source code is stored.
WORKDIR /app

# Set this directory to be owned by the "wagtail" user. This Wagtail project
# uses SQLite, the folder needs to be owned by the user that
# will be writing to the database file.
RUN chown wagtail:wagtail /app

# Copy the source code of the project into the container.
COPY --chown=wagtail:wagtail . .

# Use user "wagtail" to run the build commands below and the server itself.
USER wagtail

FROM dev as final

RUN python manage.py tailwind install --no-input;
RUN python manage.py tailwind build --no-input
RUN python manage.py collectstatic --noinput --clear

Ugly, right? 🥲
It’s basically this:

A traditional dockerfile

The Solution

So I started by separating the tools from the dependencies. There were two clear things I could split out:

  1. Python Virtual Environment
  2. Node binary

So I made a separate image for each.

Python

In this project I now use Poetry, so if you want a reliable way to install Poetry in a Docker image, here you go:

FROM python:3.11-buster as builder
RUN pip install poetry==1.5.1
ENV POETRY_NO_INTERACTION=1 \
    POETRY_VIRTUALENVS_IN_PROJECT=1 \
    POETRY_VIRTUALENVS_CREATE=1

WORKDIR /venv
RUN apt-get update --yes --quiet && apt-get install --yes --quiet --no-install-recommends git
RUN touch README.md

COPY ["pyproject.toml", "poetry.lock", "./"]
RUN poetry config installer.max-workers 10
RUN poetry install --without dev --no-root --no-cache

I gave it the name builder so that I could later take advantage of COPY --from=builder. Next up was

Node

FROM node:latest as node_base
RUN echo "NODE Version:" && node --version
RUN echo "NPM Version:" && npm --version

( PS: why is it so hard to install Node in Debian images? Making a separate image is sooo much easier )

Now that we have our base images ready, we can take advantage of multi-stage builds.

The Docker image is then split into two parts:

  • Dev is for local development and contains the npm and tailwind binaries
  • Final is for production, and I remove the node_modules folder from it to save space

( Previously we only had final :x )
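Before diving into each stage, this is the overall shape the Dockerfile takes (a simplified skeleton of the stages described in this post, with the bodies elided):

```dockerfile
# Stage 1: builds the Python virtual environment with Poetry
FROM python:3.11-buster as builder
# ... poetry install into /venv/.venv ...

# Stage 2: exists only to provide the node/npm binaries
FROM node:latest as node_base

# Stage 3: dev image; copies the venv and node binaries from the stages above
FROM python:3.11-slim-buster as dev
# COPY --from=builder ...  /  COPY --from=node_base ...

# Stage 4: production image built on top of dev
FROM dev as final
```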

dev

FROM python:3.11-slim-buster as dev

WORKDIR /app
RUN useradd --create-home wagtail

# Port used by this container to serve HTTP.
EXPOSE 8000
# Install system packages required by Wagtail and Django.
RUN apt-get update --yes --quiet && apt-get install --yes --quiet --no-install-recommends \
    build-essential \
    libpq-dev \
    libmariadbclient-dev \
    libjpeg62-turbo-dev \
    zlib1g-dev \
    libwebp-dev \
    curl \
&& rm -rf /var/lib/apt/lists/*
# Set environment variables.
# 1. Force Python stdout and stderr streams to be unbuffered.
#    command.
ENV PYTHONUNBUFFERED=1 \
    PORT=8000 \
    PYTHONDONTWRITEBYTECODE=1 \
    USER="wagtail" \
    VIRTUAL_ENV=/venv/.venv

ENV PATH="${VIRTUAL_ENV}/bin:${PATH}:/home/wagtail/.local/bin"
COPY --from=builder ${VIRTUAL_ENV} ${VIRTUAL_ENV}


COPY --chown=wagtail:wagtail --from=node_base /usr/local/bin /usr/local/bin
COPY --chown=wagtail:wagtail --from=node_base /usr/local/lib/node_modules/npm /usr/local/lib/node_modules/npm
COPY ./bfportal ./
RUN chown -R wagtail:wagtail /app
USER wagtail
RUN npm install

The important lines are the COPY --from= ones; in those lines we take full advantage of multi-stage builds by copying only the things we need 😁.

Then we finally move on to the final stage.

Final

FROM dev as final

RUN npx tailwindcss -i ./bfportal/static/src/styles.css  -o ./bfportal/static/css/bfportal.css --minify
RUN python manage.py collectstatic --noinput --clear  -i static/src/*
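One caveat worth flagging here: adding RUN rm -rf node_modules to this stage would not actually shrink the image, because the layer created by RUN npm install in the dev stage still sits underneath it, and a delete in a later layer only hides the files. A sketch of a pattern that keeps node_modules out of the final image entirely (the stage and path names follow the Dockerfiles above; this is not what bfportal currently ships):

```dockerfile
# Compile the CSS in a throwaway stage that still has node_modules
FROM dev as assets
RUN npx tailwindcss -i ./bfportal/static/src/styles.css -o ./bfportal/static/css/bfportal.css --minify

# Start the production stage from the slim base instead of dev, and copy in
# only the venv and the compiled CSS (the rest of the runtime setup from
# the dev stage would have to be repeated here)
FROM python:3.11-slim-buster as final
COPY --from=builder /venv/.venv /venv/.venv
COPY --from=assets /app/bfportal/static/css /app/bfportal/static/css
```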

One day and a few hours of head-scratching later, by building with buildx ( docker buildx build --target=final ) I was able to bring the size of the image down to just 600MB 😮 .
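For completeness, the full build invocation I mean (the image tag is just an example):

```shell
# Build only up to the `final` stage and tag the result
docker buildx build --target=final -t bfportal:latest .
```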

To my understanding, this is what we did:

A Multi Stage Docker Build

Now all my CI/CD pipelines for bfportal.gg are super fast
( it was around 7 mins before, now it’s < 3 mins ), all thanks to Docker ❤️

Further Optimizations

  • Make use of the python-alpine image for an even smaller final image
  • Find a better way to write the file so that we have fewer layers and better caching
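For the alpine idea, the builder's base image would change roughly like this (a sketch; the apk package names are my best guesses at the alpine equivalents of the Debian build deps used above, and since wheels are often unavailable for musl, builds can actually get slower):

```dockerfile
FROM python:3.11-alpine as builder
# build-base is alpine's rough equivalent of build-essential; the *-dev
# packages mirror the Debian ones used earlier in this post
RUN apk add --no-cache build-base jpeg-dev zlib-dev libwebp-dev mariadb-dev git
```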

Conclusion

  • Use buildx for faster build times and better caching
  • Copy only the final compiled tools from a base image
  • MUST REMOVE node_modules IN THE END

All in all, I find Docker to be an awesome technology :) Have a good day!

PS: Let me know if you have any ideas on how to make it better :), you can find me on Discord or Twitter (gala_vs)