I thought, "damn, a lot of users must have uploaded content", only to find out that the docker images were taking up all the space.
This week I dived into the fascinating world of multi-stage docker builds.
The Problem
In https://bfportal.gg I use a relatively simple stack:
#wagtail (a CMS that runs on #django ) + #docker + #tailwind .
Then one day I woke up to an alert on #datadog that almost all of the disk space on the VPS had been used,
so I did an image prune and figured the images must be quite big to fill up all that space.
Lo and behold, it was 2 gigs for each image 💀 (no wonder my poor VPS was all full),
so I went looking for a way to reduce the size of the image.
The answer was docker multi-stage builds.
At this point I had this good ol' Dockerfile:
```dockerfile
# Use an official Python runtime based on Debian 10 "buster" as a parent image.
FROM python:3.11-slim-buster as dev

# Add user that will be used in the container.
RUN useradd --create-home wagtail

# Port used by this container to serve HTTP.
EXPOSE 8000

# Set environment variables.
# 1. Force Python stdout and stderr streams to be unbuffered.
# 2. Set PORT variable that is used by Gunicorn. This should match "EXPOSE"
#    command.
ENV PYTHONUNBUFFERED=1 \
    PORT=8000 \
    PYTHONDONTWRITEBYTECODE=1 \
    PATH="${PATH}:/home/wagtail/.local/bin" \
    USER="wagtail"

# Install system packages required by Wagtail and Django.
RUN apt-get update --yes --quiet && apt-get install --yes --quiet --no-install-recommends \
    build-essential \
    curl \
    libpq-dev \
    libmariadbclient-dev \
    libjpeg62-turbo-dev \
    zlib1g-dev \
    libwebp-dev \
    software-properties-common \
    zip \
    unzip \
    git \
    npm \
    && rm -rf /var/lib/apt/lists/* /usr/share/doc /usr/share/man \
    && apt-get clean

RUN npm install [email protected] -g && \
    npm install n -g && \
    n 18.8.0

# Install the application server.
RUN pip install "gunicorn==20.0.4"

# Install the project requirements.
COPY requirements.txt /
RUN pip install -r /requirements.txt

# Use /app folder as a directory where the source code is stored.
WORKDIR /app

# Set this directory to be owned by the "wagtail" user. This Wagtail project
# uses SQLite, so the folder needs to be owned by the user that
# will be writing to the database file.
RUN chown wagtail:wagtail /app

# Copy the source code of the project into the container.
COPY --chown=wagtail:wagtail . .

# Use user "wagtail" to run the build commands below and the server itself.
USER wagtail

FROM dev as final

RUN python manage.py tailwind install --no-input
RUN python manage.py tailwind build --no-input
RUN python manage.py collectstatic --noinput --clear
```
Ugly, right? 🥲
It's basically everything crammed into one giant image.
The Solution
So I started by separating the tools from the dependencies. There were two clear things I could split out:
- Python Virtual Environment
- Node binary
So I made a separate image for each.
Python
In this project I now use poetry, so if you want a reliable way to install poetry in a docker image, here you go.
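Here's a minimal sketch of such a builder stage, assuming Poetry's official installer and an in-project virtualenv at /app/.venv (the exact versions and paths are illustrative, not gospel):

```dockerfile
# Sketch of a "builder" stage -- paths and pinned versions are assumptions.
FROM python:3.11-slim-buster as builder

# Keep Poetry and the project's virtualenv in predictable locations.
ENV POETRY_HOME=/opt/poetry \
    POETRY_VIRTUALENVS_IN_PROJECT=1

# curl is only needed here, in the throwaway stage.
RUN apt-get update --yes --quiet \
    && apt-get install --yes --quiet --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Install Poetry with its official installer.
RUN curl -sSL https://install.python-poetry.org | python3 -

WORKDIR /app
COPY pyproject.toml poetry.lock ./

# Install only the dependencies (not the project itself) into /app/.venv,
# so a later stage can grab the finished venv with COPY --from=builder.
RUN "$POETRY_HOME/bin/poetry" install --no-root --no-interaction
```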
I gave it the name builder so that I could later take advantage of COPY --from=builder. Next up was
Node
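Again a sketch under my assumptions: the trick is to use the official node image purely as a donor for the node and npm binaries (the tag and copy paths below are illustrative):

```dockerfile
# Sketch of a "node" stage -- the tag is an assumption.
FROM node:18-slim as node

# Nothing to build here. The official image already ships node at
# /usr/local/bin/node and npm inside /usr/local/lib/node_modules,
# so a later stage can lift them out with:
#   COPY --from=node /usr/local/bin/node /usr/local/bin/node
#   COPY --from=node /usr/local/lib/node_modules /usr/local/lib/node_modules
```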
(PS: why is it so hard to install Node in Debian images? Making a separate image is sooo much easier.)
Now that we have our base images ready, we can take advantage of multi-stage builds.
The docker image is then split into two parts:
- dev is for local development and contains the npm and tailwind binaries
- final is for production, and I remove the node_modules folder from it to save space
(Previously we only had final :x)
dev
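A sketch of the dev stage, reusing the paths assumed in the earlier sketches:

```dockerfile
# Sketch of the "dev" stage -- paths follow the sketches above.
FROM python:3.11-slim-buster as dev

RUN useradd --create-home wagtail
EXPOSE 8000
ENV PYTHONUNBUFFERED=1 \
    PORT=8000 \
    PATH="/app/.venv/bin:${PATH}"

# The payoff: no build-essential, no curl, no npm install here.
# Just copy the finished artifacts out of the earlier stages.
COPY --from=builder /app/.venv /app/.venv
COPY --from=node /usr/local/bin/node /usr/local/bin/node
COPY --from=node /usr/local/lib/node_modules /usr/local/lib/node_modules
RUN ln -s /usr/local/lib/node_modules/npm/bin/npm-cli.js /usr/local/bin/npm

WORKDIR /app
COPY --chown=wagtail:wagtail . .
USER wagtail
```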
The important lines are the COPY --from= ones; in those lines we take full advantage of multi-stage builds by copying only the things we need 😁.
Then we finally move on to the final stage.
Final
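A sketch of the final stage; the node_modules path here is django-tailwind's default and an assumption on my part:

```dockerfile
# Sketch of the "final" stage -- the node_modules path is an assumption.
FROM dev as final

# Build the CSS, collect static files, then delete node_modules in the
# SAME layer, so the deleted files never get baked into the image.
RUN python manage.py tailwind install --no-input \
    && python manage.py tailwind build --no-input \
    && python manage.py collectstatic --noinput --clear \
    && rm -rf theme/static_src/node_modules
```

The single-RUN shape matters here: image layers are additive, so an rm -rf in a later RUN would hide the files but not reclaim their space.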
Now, by using buildx (docker buildx build --target=final), one day and a few hours of head-scratching later, I was able to bring the size of the image down to just 600 MB 😮.
To my understanding, we essentially moved all the heavy tooling into throwaway stages and copied only their outputs into the image we ship.
Now all my CI/CD pipelines for bfportal.gg are super fast (builds took around 7 mins before, now it's < 3 mins), all thanks to docker ❤️.
Further Optimizations
- Make use of a python-alpine image to get an even smaller final image
  - But Python⇒Speed recommends against it
- Find a better way to write the Dockerfile so that we have fewer layers and more caching
Conclusion
- Must use buildx for faster build times and better caching
- Copy only the final compiled tools from a base image
- MUST REMOVE node_modules IN THE END
All in all, I find docker to be an awesome technology :). Have a good day!
PS: Let me know if you have any ideas on how to make it better :). You can find me on Discord or Twitter (gala_vs).