Some tips to deploy Django in Kubernetes

I am not going to go into detail in this article about how you can deploy Django in Kubernetes. I am just going to highlight the main points you should pay attention to when deploying a Django app. I expect you to have prior knowledge of how to deploy an application in Kubernetes using Helm. I hope you will still find useful pieces of information in this article.

Deploying the application

  • Always disable debug mode with DEBUG=False in your settings. That's the case for all Django deployments, no matter how you do them.
  • Don't use the Django development server to launch your application (that's the python manage.py runserver command); rely on gunicorn or something equivalent instead (like you normally would).
  • Rely on environment variables to inject configuration into your settings files. You can use django-environ to help you read, validate and parse them (see the settings sketch after this list).
    • Store secrets in Kubernetes Secrets. That includes: the SECRET_KEY configuration value, your database connection details, API keys… (a sample manifest also follows the list).
    • Store everything else in a ConfigMap managed by Helm.
  • Configure a livenessProbe to detect issues with your application and allow Kubernetes to correctly restart the pod if needed.
  • You may want to add a nginx sidecar container to buffer some requests, like file uploads. By default, when you deploy Django in Kubernetes, requests hit gunicorn directly. For long file uploads, this means the gunicorn worker handling the request cannot do anything else until the upload is done. This can be a problem and may result in container restarts (because Kubernetes cannot check the liveness probe) or request timeouts. A good way to avoid that is to put a nginx server in front of gunicorn, like you would if you weren't on Kubernetes. The sidecar pattern is a common way to do that. Just make sure your service routes traffic to nginx and not to gunicorn; normally, this is done by setting the port it routes traffic to to nginx's port (80).
    • If you use async Django, you should already be good without nginx. Sadly, at this time, the ORM doesn't support async yet, which limits where you can apply this pattern, so you will probably still need nginx.
    • You could also use gevent workers, but this involves monkey-patching the standard library, so I'm not a fan and don't advise it.
    • You may be able to configure a nginx ingress at the cluster level. However, after some tests, I didn't manage to configure it correctly, so I decided to use a nginx sidecar, which is a much easier pattern to deal with.
  • Don't run gunicorn as root in the container, to limit the attack surface.
  • Use an initContainer to run your migrations.
  • Give your containers resource requests and limits to prevent any of them from using too many resources.
  • Put the static files into a bucket or let nginx serve them. See this article.
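As an illustration of the environment-variable point above, here is a minimal settings sketch using django-environ. The variable names DJANGO_SECRET_KEY, DATABASE_URL and DJANGO_ALLOWED_HOSTS are assumptions; adapt them to your project.

# settings.py (excerpt)
import environ

env = environ.Env(
    # Default to DEBUG=False so production stays safe even if the variable is missing.
    DEBUG=(bool, False),
)

DEBUG = env("DEBUG")
# Comes from a Kubernetes Secret (see below); never hardcode it.
SECRET_KEY = env("DJANGO_SECRET_KEY")
# django-environ parses DATABASE_URL (e.g. postgres://user:pass@host:5432/db) for you.
DATABASES = {"default": env.db("DATABASE_URL")}
ALLOWED_HOSTS = env.list("DJANGO_ALLOWED_HOSTS", default=[])

Since DEBUG defaults to False, forgetting to set the variable leaves you with the safe behavior in production.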
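And here is a sketch of the matching Kubernetes Secret those variables can come from. The name and values are placeholders; with Helm you would typically template it and keep the real values out of the repository. The deployment below injects it with a secretRef.

apiVersion: v1
kind: Secret
metadata:
  # Assumed to match the secretRef used by the deployment (here, the chart name).
  name: backend-api
type: Opaque
stringData:
  DJANGO_SECRET_KEY: "change-me"
  DATABASE_URL: "postgres://user:password@db-host:5432/api"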

Configurations

To help you put this into practice, here are some configuration samples.

Nginx sidecar configuration

It's a very standard reverse proxy configuration.

apiVersion: v1
kind: ConfigMap
metadata:
  name: backend-api-nginx
data:
  api.conf: |
    upstream app_server {
        # All containers in the same pod are reachable with 127.0.0.1
        server 127.0.0.1:{{ .Values.container.port }} fail_timeout=0;
    }

    server {
        listen 80;
        root /var/www/api/;
        client_max_body_size 1G;

        access_log /dev/stdout;
        error_log  /dev/stderr;

        location / {
            location /static {
                add_header Access-Control-Allow-Origin *;
                add_header Access-Control-Max-Age 3600;
                add_header Access-Control-Expose-Headers Content-Length;
                add_header Access-Control-Allow-Headers Range;

                if ($request_method = OPTIONS) {
                    return 204;
                }

                try_files /$uri @django;
            }

            # Dedicated route for nginx health to better understand where problems come from if needed.
            location /nghealth {
                return 200;
            }

            try_files $uri @django;
        }

        location @django {
            proxy_connect_timeout 30;
            proxy_send_timeout 30;
            proxy_read_timeout 30;
            send_timeout 30;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            # We have another proxy in front of this one. It will capture traffic
            # as HTTPS, so we must not set X-Forwarded-Proto here since it's already
            # set with the proper value.
            # proxy_set_header X-Forwarded-Proto $scheme;
            proxy_set_header Host $http_host;
            proxy_redirect off;
            proxy_pass http://app_server;
        }
    }
deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "chart.fullname" . }}
  labels:
{{ include "chart.labels" . | indent 4 }}
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: {{ include "chart.name" . }}
      app.kubernetes.io/instance: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app.kubernetes.io/name: {{ include "chart.name" . }}
        app.kubernetes.io/instance: {{ .Release.Name }}
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.container.image.repository }}:{{ .Values.container.image.tag }}"
          imagePullPolicy: {{ .Values.container.image.pullPolicy }}
          securityContext:
            privileged: false
            runAsUser: 1001
            runAsGroup: 1001
            # Required to prevent escalations to root.
            allowPrivilegeEscalation: false
            runAsNonRoot: true
          envFrom:
            - configMapRef:
                name: {{ .Chart.Name }}
                optional: true
            - secretRef:
                name: {{ .Chart.Name }}
                optional: true
          ports:
            - name: http
              containerPort: {{ .Values.container.port }}
              protocol: TCP
          resources:
            limits:
              memory: {{ .Values.container.resources.limits.memory }}
              cpu: {{ .Values.container.resources.limits.cpu }}
            requests:
              memory: {{ .Values.container.resources.requests.memory }}
              cpu: {{ .Values.container.resources.requests.cpu }}
          {{ if .Values.container.probe.enabled -}}
          # As soon as this container is alive, it can serve traffic, so no need for a readinessProbe.
          # We still need to wait a bit for it to start before considering it alive: gunicorn must
          # start its workers and open connections to the database.
          livenessProbe:
            httpGet:
              path: {{ .Values.container.probe.path }}
              port: {{ .Values.container.port }}
            timeoutSeconds: {{ .Values.container.probe.livenessTimeOut }}
            initialDelaySeconds: {{ .Values.container.probe.initialDelaySeconds }}
          {{- end }}
        - name: nginx-sidecar
          image: nginx:stable
          imagePullPolicy: Always
          securityContext:
            privileged: false
            # Nginx must start as root to bind the proper port in the container.
            allowPrivilegeEscalation: true
            runAsNonRoot: false
          ports:
            # Port names must be unique within the pod, so this one cannot also be "http".
            - name: http-nginx
              containerPort: {{ .Values.service.port }}
              protocol: TCP
          volumeMounts:
            - name: nginx-conf
              mountPath: /etc/nginx/conf.d
              readOnly: true
            - name: staticfiles
              mountPath: /var/www/api/
              readOnly: true
          {{ if .Values.sidecar.nginx.probe.enabled -}}
          livenessProbe:
            httpGet:
              # When we can access this route, nginx is alive, but not necessarily ready
              # (i.e. it may not be able to serve traffic yet).
              path: {{ .Values.sidecar.nginx.probe.path }}
              port: {{ .Values.service.port }}
            timeoutSeconds: {{ .Values.sidecar.nginx.probe.livenessTimeOut }}
          readinessProbe:
            httpGet:
              # The sidecar cannot be ready (that is, accepting traffic) until it can talk to the
              # Django container. So we need to pass through nginx (with the port) to the Django
              # container (with the path) to check this.
              # Since it can take a few seconds, we have an initialDelaySeconds.
              path: {{ .Values.container.probe.path }}
              port: {{ .Values.service.port }}
            initialDelaySeconds: {{ .Values.sidecar.nginx.probe.initialDelaySeconds }}
            timeoutSeconds: {{ .Values.sidecar.nginx.probe.livenessTimeOut }}
          {{- end }}
          resources:
            limits:
              memory: {{ .Values.container.resources.limits.memory }}
              cpu: {{ .Values.container.resources.limits.cpu }}
            requests:
              memory: {{ .Values.container.resources.requests.memory }}
              cpu: {{ .Values.container.resources.requests.cpu }}
      {{ if .Values.initContainer.enabled -}}
      initContainers:
        # Runs before the app containers start. The image's default entrypoint is
        # expected to run the migrations (e.g. python manage.py migrate).
        - name: {{ .Values.initContainer.name }}
          image: "{{ .Values.container.image.repository }}:{{ .Values.container.image.tag }}"
          imagePullPolicy: {{ .Values.container.image.pullPolicy }}
          envFrom:
            - configMapRef:
                name: {{ .Chart.Name }}
                optional: true
            - secretRef:
                name: {{ .Chart.Name }}
                optional: true
          resources:
            limits:
              memory: {{ .Values.initContainer.resources.limits.memory }}
              cpu: {{ .Values.initContainer.resources.limits.cpu }}
            requests:
              memory: {{ .Values.initContainer.resources.requests.memory }}
              cpu: {{ .Values.initContainer.resources.requests.cpu }}
      {{- end }}
      volumes:
        - name: nginx-conf
          configMap:
            name: backend-api-nginx
        # Holds the static files served by nginx. This sketch assumes they are copied
        # in by another container (e.g. one running collectstatic).
        - name: staticfiles
          emptyDir: {}
        - name: backend-credentials
          secret:
            secretName: {{ .Values.gcp.backend.credentials.secret }}
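The deployment above leaves one point from earlier implicit: the service must target nginx, not gunicorn. Here is a minimal service.yaml sketch under that assumption (the ClusterIP type is an assumption; the targetPort reuses the http-nginx port name declared on the sidecar):

apiVersion: v1
kind: Service
metadata:
  name: {{ include "chart.fullname" . }}
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: {{ include "chart.name" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
  ports:
    - port: {{ .Values.service.port }}
      # Route to the nginx sidecar, not to gunicorn.
      targetPort: http-nginx
      protocol: TCP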
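The probes above also assume the Django app answers cheaply on the probe path. A minimal sketch of such a route on the Django side (the /health path and view name are assumptions; match them to your .Values.container.probe.path value):

# urls.py (excerpt)
from django.http import HttpResponse
from django.urls import path

def health(request):
    # Keep this view trivial: the probes call it often and it must not
    # depend on slow external services.
    return HttpResponse("ok")

urlpatterns = [
    path("health", health),
    # ... your API routes ...
]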

Handling commands

You can run commands at regular intervals with a CronJob. To avoid the need to create one file per CronJob, you can loop over values as described here. In a nutshell, you can combine this cronjobs.yaml Helm template:

{{- range $job, $val := .Values.cronjobs }}
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: "{{ .name }}"
spec:
  schedule: "{{ .schedule }}"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: "{{ .name }}"
              image: "{{ $.Values.container.image.repository }}:{{ $.Values.container.image.tag }}"
              imagePullPolicy: "{{ $.Values.container.image.pullPolicy }}"
              args:
                - python
                - manage.py
                - "{{ .djangoCommand }}"
              envFrom:
                - configMapRef:
                    name: {{ $.Chart.Name }}
                    optional: true
                - secretRef:
                    name: {{ $.Chart.Name }}
                    optional: true
          restartPolicy: "{{ .restartPolicy }}"
---
{{- end }}

With this configuration:

# We currently assume we run the API Python/Django image for all jobs.
cronjobs:
    "0":
        name: backend-api-clearsessions
        # This must be in the standard Unix crontab format
        schedule: "0 23 * * *"
        djangoCommand: clearsessions
        restartPolicy: Never
    "1":
        name: backend-api-clean-pending-loan-applications
        schedule: "0 23 1 * *"
        djangoCommand: remove_stale_contenttypes
        restartPolicy: Never

This creates two CronJobs in Kubernetes: one that runs python manage.py clearsessions every day at 23:00, and one that runs python manage.py remove_stale_contenttypes on the first day of each month at 23:00.

