Some tips to deploy Django in Kubernetes

I am not going to go into detail in this article about how you can deploy Django in Kubernetes. I am just going to highlight the main points you should pay attention to when deploying a Django app. I expect you to have prior knowledge of how to deploy an application in Kubernetes using Helm. I hope you will still find useful pieces of information in this article.

Deploying the application

  • Always disable debug mode with DEBUG=False in your settings. That's the case for all Django deployments, no matter how you do them.
  • Don't use the Django development server to launch your application (that's the python manage.py runserver command); rely on gunicorn or something equivalent instead (like you normally would).
  • Rely on environment variables to inject configuration into your settings files. You can use django-environ to help you read, validate and parse them (see the settings sketch after this list).
    • Store secrets in Kubernetes Secrets. That includes: the SECRET_KEY configuration value, your database connection details, API keys… (a sample manifest also follows the list).
    • Store everything else in a ConfigMap managed by Helm.
  • Configure a livenessProbe to detect issues with your application and allow Kubernetes to correctly restart the pod if needed.
  • You may want to add a nginx sidecar container to buffer some requests, like file uploads. By default, when you deploy Django in Kubernetes, requests hit gunicorn directly. For long file uploads, this means the gunicorn worker handling the request cannot do anything else until the upload is done. This can be a problem and may result in container restarts (because Kubernetes cannot check the liveness probe) or request timeouts. A good way to avoid that is to put a nginx server in front of gunicorn, like you would if you weren't on Kubernetes. The sidecar pattern is a common way to do that. Just make sure your service routes traffic to nginx and not to gunicorn; normally, this is done by setting the port it routes traffic to to nginx's port (80).
    • If you use async Django, you should already be good without nginx. Sadly, at this time, the ORM doesn't support async yet, which limits where you can apply this pattern, so you will probably still need nginx.
    • You could also use gevent workers, but this involves monkey-patching the standard library, so I'm not a fan and don't advise it.
    • You may be able to configure a nginx ingress at the cluster level. However, after some tests, I didn't manage to configure it correctly, so I decided to use a nginx sidecar, which is a much easier pattern to deal with.
  • Don't run gunicorn as root in the container, to limit the attack surface.
  • Use an initContainer to run your migrations.
  • Give your containers resource requests and limits to prevent any of them from using too many resources.
  • Put the static files into a bucket or let nginx serve them. See this article.
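As an illustration of the environment-variable point above, here is a minimal settings sketch using django-environ. The variable names DJANGO_SECRET_KEY, DATABASE_URL and DJANGO_ALLOWED_HOSTS are assumptions; adapt them to your project.

# settings.py (excerpt)
import environ

env = environ.Env(
    # Default to DEBUG=False so production stays safe even if the variable is missing.
    DEBUG=(bool, False),
)

DEBUG = env("DEBUG")
# Comes from a Kubernetes Secret (see below); never hardcode it.
SECRET_KEY = env("DJANGO_SECRET_KEY")
# django-environ parses DATABASE_URL (e.g. postgres://user:pass@host:5432/db) for you.
DATABASES = {"default": env.db("DATABASE_URL")}
ALLOWED_HOSTS = env.list("DJANGO_ALLOWED_HOSTS", default=[])

Since DEBUG defaults to False, forgetting to set the variable leaves you with the safe behavior in production.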
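And here is a sketch of the matching Kubernetes Secret those variables can come from. The name and values are placeholders; with Helm you would typically template it and keep the real values out of the repository. The deployment below injects it with a secretRef.

apiVersion: v1
kind: Secret
metadata:
  # Assumed to match the secretRef used by the deployment (here, the chart name).
  name: backend-api
type: Opaque
stringData:
  DJANGO_SECRET_KEY: "change-me"
  DATABASE_URL: "postgres://user:password@db-host:5432/api"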

Configurations

To help you put this into practice, here are some configuration samples.

Nginx sidecar configuration

It's a very standard reverse proxy configuration.

apiVersion: v1
kind: ConfigMap
metadata:
  name: backend-api-nginx
data:
  api.conf: |
    upstream app_server {
        # All containers in the same pod are reachable with 127.0.0.1
        server 127.0.0.1:{{ .Values.container.port }} fail_timeout=0;
    }

    server {
        listen 80;
        root /var/www/api/;
        client_max_body_size 1G;

        access_log /dev/stdout;
        error_log  /dev/stderr;

        location / {
            location /static {
                add_header Access-Control-Allow-Origin *;
                add_header Access-Control-Max-Age 3600;
                add_header Access-Control-Expose-Headers Content-Length;
                add_header Access-Control-Allow-Headers Range;

                if ($request_method = OPTIONS) {
                    return 204;
                }

                try_files /$uri @django;
            }

            # Dedicated route for nginx health to better understand where problems come from if needed.
            location /nghealth {
                return 200;
            }

            try_files $uri @django;
        }

        location @django {
            proxy_connect_timeout 30;
            proxy_send_timeout 30;
            proxy_read_timeout 30;
            send_timeout 30;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            # We have another proxy in front of this one. It will capture traffic
            # as HTTPS, so we must not set X-Forwarded-Proto here since it's already
            # set with the proper value.
            # proxy_set_header X-Forwarded-Proto $scheme;
            proxy_set_header Host $http_host;
            proxy_redirect off;
            proxy_pass http://app_server;
        }
    }
deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "chart.fullname" . }}
  labels:
{{ include "chart.labels" . | indent 4 }}
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: {{ include "chart.name" . }}
      app.kubernetes.io/instance: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app.kubernetes.io/name: {{ include "chart.name" . }}
        app.kubernetes.io/instance: {{ .Release.Name }}
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.container.image.repository }}:{{ .Values.container.image.tag }}"
          imagePullPolicy: {{ .Values.container.image.pullPolicy }}
          securityContext:
            privileged: false
            runAsUser: 1001
            runAsGroup: 1001
            # Required to prevent escalations to root.
            allowPrivilegeEscalation: false
            runAsNonRoot: true
          envFrom:
            - configMapRef:
                name: {{ .Chart.Name }}
                optional: true
            - secretRef:
                name: {{ .Chart.Name }}
                optional: true
          ports:
            - name: http
              containerPort: {{ .Values.container.port }}
              protocol: TCP
          resources:
            limits:
              memory: {{ .Values.container.resources.limits.memory }}
              cpu: {{ .Values.container.resources.limits.cpu }}
            requests:
              memory: {{ .Values.container.resources.requests.memory }}
              cpu: {{ .Values.container.resources.requests.cpu }}
          {{ if .Values.container.probe.enabled -}}
          # As soon as this container is alive, it can serve traffic, so no need for a readinessProbe.
          # We still need to wait a bit for it to start before considering it alive: gunicorn must
          # start its workers and open connections to the database.
          livenessProbe:
            httpGet:
              path: {{ .Values.container.probe.path }}
              port: {{ .Values.container.port }}
            timeoutSeconds: {{ .Values.container.probe.livenessTimeOut }}
            initialDelaySeconds: {{ .Values.container.probe.initialDelaySeconds }}
          {{- end }}
        - name: nginx-sidecar
          image: nginx:stable
          imagePullPolicy: Always
          securityContext:
            privileged: false
            # Nginx must start as root to bind the proper port in the container.
            allowPrivilegeEscalation: true
            runAsNonRoot: false
          ports:
            # Port names must be unique within the pod, so this one cannot also be "http".
            - name: http-nginx
              containerPort: {{ .Values.service.port }}
              protocol: TCP
          volumeMounts:
            - name: nginx-conf
              mountPath: /etc/nginx/conf.d
              readOnly: true
            - name: staticfiles
              mountPath: /var/www/api/
              readOnly: true
          {{ if .Values.sidecar.nginx.probe.enabled -}}
          livenessProbe:
            httpGet:
              # When we can access this route, nginx is alive, but not necessarily ready
              # (i.e. it may not be able to serve traffic yet).
              path: {{ .Values.sidecar.nginx.probe.path }}
              port: {{ .Values.service.port }}
            timeoutSeconds: {{ .Values.sidecar.nginx.probe.livenessTimeOut }}
          readinessProbe:
            httpGet:
              # The sidecar cannot be ready (that is, accepting traffic) until it can talk to the
              # Django container. So we need to pass through nginx (with the port) to the Django
              # container (with the path) to check this.
              # Since it can take a few seconds, we have an initialDelaySeconds.
              path: {{ .Values.container.probe.path }}
              port: {{ .Values.service.port }}
            initialDelaySeconds: {{ .Values.sidecar.nginx.probe.initialDelaySeconds }}
            timeoutSeconds: {{ .Values.sidecar.nginx.probe.livenessTimeOut }}
          {{- end }}
          resources:
            limits:
              memory: {{ .Values.container.resources.limits.memory }}
              cpu: {{ .Values.container.resources.limits.cpu }}
            requests:
              memory: {{ .Values.container.resources.requests.memory }}
              cpu: {{ .Values.container.resources.requests.cpu }}
      {{ if .Values.initContainer.enabled -}}
      initContainers:
        # Runs before the app containers start. The image's default entrypoint is
        # expected to run the migrations (e.g. python manage.py migrate).
        - name: {{ .Values.initContainer.name }}
          image: "{{ .Values.container.image.repository }}:{{ .Values.container.image.tag }}"
          imagePullPolicy: {{ .Values.container.image.pullPolicy }}
          envFrom:
            - configMapRef:
                name: {{ .Chart.Name }}
                optional: true
            - secretRef:
                name: {{ .Chart.Name }}
                optional: true
          resources:
            limits:
              memory: {{ .Values.initContainer.resources.limits.memory }}
              cpu: {{ .Values.initContainer.resources.limits.cpu }}
            requests:
              memory: {{ .Values.initContainer.resources.requests.memory }}
              cpu: {{ .Values.initContainer.resources.requests.cpu }}
      {{- end }}
      volumes:
        - name: nginx-conf
          configMap:
            name: backend-api-nginx
        # Holds the static files served by nginx. This sketch assumes they are copied
        # in by another container (e.g. one running collectstatic).
        - name: staticfiles
          emptyDir: {}
        - name: backend-credentials
          secret:
            secretName: {{ .Values.gcp.backend.credentials.secret }}
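The deployment above leaves one point from earlier implicit: the service must target nginx, not gunicorn. Here is a minimal service.yaml sketch under that assumption (the ClusterIP type is an assumption; the targetPort reuses the http-nginx port name declared on the sidecar):

apiVersion: v1
kind: Service
metadata:
  name: {{ include "chart.fullname" . }}
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: {{ include "chart.name" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
  ports:
    - port: {{ .Values.service.port }}
      # Route to the nginx sidecar, not to gunicorn.
      targetPort: http-nginx
      protocol: TCP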
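The probes above also assume the Django app answers cheaply on the probe path. A minimal sketch of such a route on the Django side (the /health path and view name are assumptions; match them to your .Values.container.probe.path value):

# urls.py (excerpt)
from django.http import HttpResponse
from django.urls import path

def health(request):
    # Keep this view trivial: the probes call it often and it must not
    # depend on slow external services.
    return HttpResponse("ok")

urlpatterns = [
    path("health", health),
    # ... your API routes ...
]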

Handling commands

You can run commands at regular intervals with a CronJob. To avoid the need to create one file per CronJob, you can loop over values as described here. In a nutshell, you can combine this cronjobs.yaml Helm template:

{{- range $job, $val := .Values.cronjobs }}
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: "{{ .name }}"
spec:
  schedule: "{{ .schedule }}"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: "{{ .name }}"
              image: "{{ $.Values.container.image.repository }}:{{ $.Values.container.image.tag }}"
              imagePullPolicy: "{{ $.Values.container.image.pullPolicy }}"
              args:
                - python
                - manage.py
                - "{{ .djangoCommand }}"
              envFrom:
                - configMapRef:
                    name: {{ $.Chart.Name }}
                    optional: true
                - secretRef:
                    name: {{ $.Chart.Name }}
                    optional: true
          restartPolicy: "{{ .restartPolicy }}"
---
{{- end }}

With this configuration:

# We currently assume we run the API Python/Django image for all jobs.
cronjobs:
    "0":
        name: backend-api-clearsessions
        # This must be in the standard Unix crontab format
        schedule: "0 23 * * *"
        djangoCommand: clearsessions
        restartPolicy: Never
    "1":
        name: backend-api-clean-pending-loan-applications
        schedule: "0 23 1 * *"
        djangoCommand: remove_stale_contenttypes
        restartPolicy: Never

This creates two CronJobs in Kubernetes: one that runs python manage.py clearsessions every day at 23:00, and one that runs python manage.py remove_stale_contenttypes on the first day of each month at 23:00.

