Some tips for Django

Posted on 2018-05-21 in Trucs et astuces

View last executed query

Run this (it only works when DEBUG is True; the query is shown as sent to the database, with values substituted and escaped):

from django.db import connection
print(connection.queries[-1])

Logging

Queries

You can configure your logging to see every query sent to the database (this also requires DEBUG to be True) with:

LOGGING = {
    # ...
    'loggers': {
        # ...
        'django.db': {
            'handlers': ['console'],
            'level': 'DEBUG',
        },
    },
}

Source: Surviving Django (if you care about databases) under Another random bit of advice.

Migrations

Checks

You can use this check in CI to verify your migrations are up to date (no dependency issue and no migration left uncreated):

python manage.py makemigrations
if [[ $(git status --porcelain | grep migrations | wc -l) -gt 0 ]]; then
    echo 'New migrations were created in the project. Please fix that.' >&2
    exit 1
fi

If you want to use it in a git hook, you should only consider untracked migrations:

python manage.py makemigrations
migs_exit_code=$?

if [[ "${migs_exit_code}" -ne 0 ]]; then
    exit 1
fi

if [[ $(git status --porcelain | grep '^??' | grep migrations | grep '\.py$' | wc -l) -gt 0 ]]; then
    echo 'New migrations were created in the project. Please fix that.' >&2
    exit 1
fi
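For reference, the selection that grep chain performs (untracked Python files under a migrations directory) can be sketched as a small Python function; the function name and sample paths are mine:

```python
def untracked_migrations(porcelain_lines):
    """Select untracked .py files under a migrations directory.

    porcelain_lines are lines from `git status --porcelain`,
    e.g. '?? app/migrations/0002_auto.py' (the first two columns
    are the status, the path starts at column 4).
    """
    return [
        line[3:]
        for line in porcelain_lines
        if line.startswith("??")
        and "migrations" in line
        and line.endswith(".py")
    ]

status = [
    " M app/models.py",
    "?? app/migrations/0002_auto.py",
    "?? notes.txt",
]
print(untracked_migrations(status))  # ['app/migrations/0002_auto.py']
```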

Or you can simply use python manage.py makemigrations --dry-run --check, which I discovered recently.

Fake all pending migrations

#!/usr/bin/env bash

set -eu
set -o pipefail

# This will contain a list like this:
# wagtailusers
# [ ] 0001_initial
# [ ] 0002_add_verbose_name_on_userprofile
# [ ] 0003_add_verbose_names
# [ ] 0004_capitalizeverbose
# [ ] 0005_make_related_name_wagtail_specific
# [ ] 0006_userprofile_prefered_language
# [ ] 0007_userprofile_current_time_zone
# [ ] 0008_userprofile_avatar
# [ ] 0009_userprofile_verbose_name_plural

pending_migrations="$(python manage.py showmigrations | grep '\[ \]\|^[a-z]' | grep '\[ \]' -B 1)"

declare -A app_to_last_migration

# Make sure we loop over lines, not words.
IFS=$'\n'
# Array must not be quoted to correctly loop over each line.
# shellcheck disable=SC2068
for line in ${pending_migrations[@]}; do
    if [[ "$line" =~ ^[a-z]+ ]]; then
        app_name="${line}"
    elif [[ "$line" =~ \[ ]]; then
        # Capture the last migration of the app to pass it and all migrations before it.
        migration_name=$(echo "${line}" | awk '{print $3}')
        app_to_last_migration["${app_name}"]="${migration_name}"
    fi
done

for app_name in "${!app_to_last_migration[@]}"; do
    python manage.py migrate "${app_name}" "${app_to_last_migration[${app_name}]}" --fake
done
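If you would rather do the parsing in Python, the mapping from app to last pending migration can be sketched like this (the input format is the showmigrations output shown in the comment above; actually running the command and calling migrate --fake is left out):

```python
def last_pending_migrations(showmigrations_output):
    """Map each app to its last pending migration.

    App names are flush left; migrations are indented, and pending
    ones are marked with '[ ]'.
    """
    app_to_last = {}
    current_app = None
    for line in showmigrations_output.splitlines():
        if line and not line[0].isspace():
            current_app = line.strip()
        elif "[ ]" in line:
            # Keep overwriting: the last pending migration wins.
            app_to_last[current_app] = line.split()[-1]
    return app_to_last

output = """\
wagtailusers
 [X] 0001_initial
 [ ] 0002_add_verbose_name_on_userprofile
 [ ] 0003_add_verbose_names
"""
print(last_pending_migrations(output))
# {'wagtailusers': '0003_add_verbose_names'}
```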

Model proxies

They are useful to change the behavior of a model or to work on a subset of a table (based on a type column, for instance). See the documentation for more details.

from django.db import models

class Person(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)

class MyPerson(Person):
    class Meta:
        proxy = True

    def do_something(self):
        # ...
        pass

class OrderedPerson(Person):
    class Meta:
        ordering = ["last_name"]
        proxy = True

Easily increment migrations numbers

Here is a small bash script to increment the numbers of your migrations. Put it into your ~/.profile or ~/.bashrc. Use it like: inc-migrations PROJECT/apps/APP_NAME/migrations 0012

It will rename all migrations starting from 0012 (ie transform 0012_mig.py into 0013_mig.py, and so on) and replace all occurrences of the previous name (eg 0012_mig) with the new one (eg 0013_mig) in the other migration files.
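The renumbering itself boils down to this string transform (a pure-Python sketch of what the bash function below applies with git mv and sed; the function name is mine):

```python
def bump_migration_name(filename):
    """0012_mig.py -> 0013_mig.py, keeping the zero padding."""
    # Split on the first underscore: number prefix, then the rest.
    number, _, rest = filename.partition("_")
    return f"{int(number) + 1:04d}_{rest}"

print(bump_migration_name("0012_mig.py"))  # 0013_mig.py
```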

function inc-migrations() {
    local folder="$1"
    local number="$2"
    local start_file
    local new_number
    local next_file

    if ! [[ "$(pwd)" =~ "${folder}$" ]]; then
        cd "${folder}"
    fi

    if [[ -f "${number}" ]]; then
        start_file="${number}"
        number=$(echo "${number}" | cut -d _ -f 1)
    else
        start_file=$(ls ${number}* 2> /dev/null)
    fi

    if [[ ! -f "${start_file}" ]]; then
        echo "${start_file} doesn't exist" >&2
        return 1
    fi
    let "new_number = ${number} + 1"
    new_number=$(printf "%04d\n" "${new_number}")
    next_file=$(ls ${new_number}* 2> /dev/null)

    new_file_name="${start_file/${number}/${new_number}}"
    git mv "${start_file}" "${new_file_name}"
    sed -i "s/${start_file/.py/}/${new_file_name/.py/}/g" *.py

    if [[ -f "${next_file:-}" ]]; then
        inc-migrations "${folder}" "${next_file}"
    fi
}

Create a widget with custom display

from functools import partial

from django.contrib.admin.widgets import AdminIntegerFieldWidget
from django.utils.safestring import mark_safe
from django.utils.translation import ungettext_lazy


class BonusTimeWidget(AdminIntegerFieldWidget):
    UNITS_TO_TRANSLATIONS = {
        'month': partial(ungettext_lazy, '%(count)d month', '%(count)d free months'),
    }

    def __init__(self, *args, unit='month', **kwargs):
        super().__init__(*args, **kwargs)
        if unit not in self.UNITS_TO_TRANSLATIONS:
            supported_units = ','.join(self.UNITS_TO_TRANSLATIONS.keys())
            raise ValueError(
                f'{unit} is not a supported unit. Supported units are: {supported_units}',
            )

        self.bonus_translation = self.UNITS_TO_TRANSLATIONS[unit]

    def render(self, name, value, attrs=None):
        """Render the value in a custom span."""
        if value == 0:
            return ''

        text = self.bonus_translation(value) % {'count': value}
        return mark_safe(f'<span>+ {text}</span>')
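What ungettext_lazy does at render time is pick the singular or plural form based on the count, after which the %-style interpolation fills the value in. A plain-Python mimic of that render logic (English-only, without the lazy translation machinery; the function name is mine):

```python
def bonus_label(count,
                singular="%(count)d free month",
                plural="%(count)d free months"):
    """Mimic what ungettext does for English: pick a form by count,
    then interpolate the count into it."""
    form = singular if count == 1 else plural
    return "+ " + (form % {"count": count})

print(bonus_label(1))  # + 1 free month
print(bonus_label(3))  # + 3 free months
```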

Use a nginx reverse proxy in dev

The Django development server works well but can be slow when it has to handle many requests (to load many images, for instance). One way to solve this is to use a production-grade web server (nginx in this case) to handle most of the work (ie everything but the dynamically generated pages).

Prerequisites:

  1. Install nginx
  2. Add the domain you want to use in your /etc/hosts file. For instance 127.0.0.1 myproject.localhost.
  3. Make sure nginx has access to the files of your project. Most of the time a chmod 755 /PATH/TO/PROJECT will do (repeat on each subdirectory nginx needs to traverse to reach your files). If you are using a shared computer, you may want a more secure way to give nginx access to the files (ACLs may help you).
  4. If you are using SELinux, don't forget to add the proper context to the files. For instance, do something like:
    1. Add these files to the proper SELinux context by copying the one from the default web folder: semanage fcontext --add --equal /var/www/html /PATH/TO/PROJECT
    2. Restore the context of the files: restorecon -R /PATH/TO/PROJECT
    3. Check that the context is correct: ls -Z. The output should contain something like system_u:object_r:httpd_sys_content_t:s0.

Here is the nginx configuration to put in /etc/nginx/conf.d (or /etc/nginx/sites-enabled):

server {
    listen 80;
    # Make sure this host is in the ALLOWED_HOSTS variable in the settings.
    server_name PROJECT.localhost;
    root /PATH/TO/PROJECT;

    # Prevent access to py and pyc files.
    location ~ \.pyc?$ {
        return 404;
    }

    # Search for files in the media folder. Change this if you configured Django to store your uploaded files elsewhere.
    location ~ ^/files/(.*) {
        try_files /media/$1 =404;
    }

    # Look for static files in the static folder or at the root of the project.
    location ~ ^/static {
        # Look both in the production static folder at the root of your project and in PROJECT
        # (where you have the apps directory and the static directory you use in dev).
        try_files /PROJECT/$uri /$uri $uri;
    }

    # Relay everything else to the django web server.
    location / {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_pass http://127.0.0.1:8000;
    }
}

To add a connect timeout (eg to mimic Heroku's timeout) to the Django dev server, add the lines below in the location / block:

proxy_connect_timeout   30;
proxy_send_timeout      30;
proxy_read_timeout      30;
send_timeout            30;

Checking PO files

This small script checks each PO file passed as an argument with msgfmt from gettext:

#!/usr/bin/env bash

set -eu

for file in "$@"; do
  msgfmt -v --check "${file}"
done

Testing

Inserting pytest fixtures in a test case

import logging
import unittest

import pytest

import spam  # Placeholder for the module under test.


class SpamTest(unittest.TestCase):

    @pytest.fixture(autouse=True)
    def inject_fixtures(self, caplog):
        self._caplog = caplog

    def test_eggs(self):
        with self._caplog.at_level(logging.INFO):
            spam.eggs()
            assert self._caplog.records[0].message == 'bacon'

Source: https://stackoverflow.com/a/50375022

Reset PK sequence

from django.core.management.color import no_style
from django.db import connection


def reset_database_sequences(*models_or_factories):
    models = [
        model_or_factory._meta.model for model_or_factory in models_or_factories
    ]
    sequence_sql = connection.ops.sequence_reset_sql(no_style(), models)
    with connection.cursor() as cursor:
        for sql in sequence_sql:
            cursor.execute(sql)

Source: https://stackoverflow.com/a/50275965/3900519

Reset factoryboy sequence

def reset_sequences(*factories):
    reset_database_sequences(*factories)
    for factory_cls in factories:
        factory_cls.reset_sequence()

View inter app dependencies

Here is a small script to help you picture the dependencies between your Django apps. It does three things:

  • It creates a graph of your models thanks to Django extensions.
  • It creates a TSV file with all your URLs.
  • It creates a text file with the dependencies between each app. This file is very bare-bones: it parses the imports in your Python files with a regexp. I didn't manage to make more evolved tools like pydeps print just a graph between the apps; I always got too much information. You can make CI fail when new dependencies are introduced so you can review them. For that, pass the ci argument to the script.
#!/usr/bin/env bash

set -eu
set -o pipefail

# This script requires extra deps (either pygraphviz or pyparsing and pydot).
# See: https://django-extensions.readthedocs.io/en/latest/graph_models.html

readonly my_project="my_project"

# We ignore the history app as it mirrors other apps. It doesn't have dependencies by itself.
readonly our_apps=$(ls ${my_project}/apps/ | grep -vE '__(\.py)?$' | grep -v history)

function does_app_depends_on_app() {
    local app="$1"
    local inner_app="$2"

    # We allow constants imports across apps so they can be defined where relevant.
    # Since constants don't import stuff from the project, these imports will never be an issue anyway.
    # To do that, we impose that the import matches the app AND that it doesn't import constants.
    # See https://stackoverflow.com/a/6361362 for symbols details.
    # We can add # ignore-deps at the end of a line to ignore a manually validated dep.
    grep -RP --files-with-matches "(?=^(from|import).*${inner_app}( |\.|\n))(?=(?!((from|import).*${inner_app}.constants( |\.|\n)|from.*${inner_app} import constants)))(?=(?!.*# ignore-deps\$))" "${my_project}/apps/${app}" |
        grep -v __pycache__ |
        grep -v __test__ |
        grep -v __tests__ |
        grep -v "^${my_project}/apps/${app}/migrations/" |
        grep -v "^${my_project}/apps/${app}/admin/" |
        grep -v --quiet "^${my_project}/apps/${app}/factories/"
}

function compute_model_graph() {
    echo "Computing model graph"
    python manage.py graph_models \
        --no-inheritance \
        --exclude-models Group,Permission,AbstractUser \
        --group-models \
        --disable-fields \
        --output ${my_project}_models.svg \
        --output ${my_project}_models.png \
        ${our_apps}
    echo -e "\n"
}

function compute_urls_list() {
    echo "Computing URLs list"
    echo -e "URL\tview\treverse_name" > urls.tsv
    python manage.py show_urls --force-color |
        grep -v '^/__debug__' |  # Django Debug Toolbar.
        grep -v 'admin:[a-z_]+$' | # Admin views generated by Django.
        grep -v 'django.views.static.serve$' | # View generated by Django to serve static content.
        grep -v '^/ckeditor'  | # View for CKEditor (external app)
        grep -v '^/bk-team' >> urls.tsv  # Extra admin views generated by Django (mostly redirection).
    echo -e "\n"
}

function compute_app_dependencies() {
    echo "Computing app dependencies"
    for app in ${our_apps}; do
        echo "${app} depends on these apps"
        for inner_app in ${our_apps}; do
            if [[ "${inner_app}" != "${app}" ]]; then
                if does_app_depends_on_app "${app}" "${inner_app}"; then
                    echo -e "\t${inner_app}"
                fi
            fi
        done
        echo -e "\n"
    done
}

function main() {
    case $1 in
        ci)
            compute_app_dependencies > ./scripts/app_deps.txt
            git diff --quiet --exit-code
            ;;
        *)
            compute_model_graph
            compute_urls_list
            compute_app_dependencies
            ;;
    esac
}

main "$@"
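The big grep -P pattern above is hard to read. Stated in Python, the per-line rule it encodes (the line imports from inner_app, except when it only imports its constants or is marked # ignore-deps) could be sketched as follows; the function name and the billing app are mine:

```python
import re

def line_creates_dependency(line, inner_app):
    """True when a line imports from inner_app, unless it only
    imports its constants or is marked '# ignore-deps'."""
    if line.rstrip().endswith("# ignore-deps"):
        return False
    imports_app = re.search(
        rf"^(from|import)\b.*\b{inner_app}(\s|\.|$)", line
    )
    if not imports_app:
        return False
    imports_constants = re.search(
        rf"{inner_app}\.constants(\s|\.|$)", line
    ) or re.search(rf"from\b.*\b{inner_app} import constants", line)
    return not bool(imports_constants)

print(line_creates_dependency("from billing.models import Invoice", "billing"))    # True
print(line_creates_dependency("from billing import constants", "billing"))         # False
print(line_creates_dependency("import billing.models  # ignore-deps", "billing"))  # False
```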

Use the true client IP

Sometimes you will need to know the IP of the client, for instance to track where the user connected from as a security feature, or to block an IP after too many failed connection attempts. If you don't have any reverse proxy, you can simply use request.META["REMOTE_ADDR"] to get it. If you do, request.META["REMOTE_ADDR"] will contain the address of the proxy, not the client.

Luckily, if your proxy is correctly configured, it will append the address of its client to the X-Forwarded-For header. For nginx, this is done with proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;. You can then read request.META["HTTP_X_FORWARDED_FOR"] to get the IP, or set request.META["REMOTE_ADDR"] to the proper value (some libraries like Django user sessions can only read the IP from there). A naive and insecure way would be to do it in a middleware like this:

class IpAddressMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Beware, this IP will generally NOT be trustworthy. It can be
        # tampered with by the user by setting the header manually on a
        # request made directly to the backend. In that case, we will get
        # the user supplied address.
        if request.META.get("HTTP_X_FORWARDED_FOR"):
            request.META["REMOTE_ADDR"] = (
                request.META["HTTP_X_FORWARDED_FOR"].split(",")[0].strip()
            )
        return self.get_response(request)

The problem being, the IP cannot be trusted. If all goes well and the user is trustworthy, you will get the IP. But the user can spoof the header with something like this:

curl -H 'X-Forwarded-For: SPOOFED_IP' https://example.com

After the passage by the proxy, the header will look like this:

X-Forwarded-For: SPOOFED_IP, CLIENT_IP

So you will read the spoofed address. Since the address of the proxy is appended at the end, you may think that instead of reading the first address, all you need to do is read the last one. With exactly one proxy, that works. With more, the header will look like CLIENT_IP, PROXY1_IP (the last proxy appends the address of the proxy before it), so it doesn't.
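Counting from the right is the key idea: each trusted proxy appends exactly one address, so the client is always a fixed number of entries from the end, whatever the user prepends. As a pure function (the setting and middleware names below are the original ones; this helper's name is mine), it could look like:

```python
def client_ip(x_forwarded_for, proxy_count):
    """Extract the real client IP from an X-Forwarded-For value.

    Each trusted proxy appends one address, so with proxy_count
    proxies the client is proxy_count entries from the right;
    anything before it is user-supplied and ignored.
    """
    ips = [ip.strip() for ip in x_forwarded_for.split(",")]
    if len(ips) <= proxy_count:
        return ips[0]
    return ips[len(ips) - proxy_count]

print(client_ip("CLIENT_IP, PROXY1_IP", proxy_count=2))              # CLIENT_IP
print(client_ip("SPOOFED_IP, CLIENT_IP, PROXY1_IP", proxy_count=2))  # CLIENT_IP
```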

Instead, what we need to do is add a configuration setting like REVERSE_PROXY_COUNT (called BK_PROXY_COUNT below), set it to the number of proxies, and use it to find the proper address. If you take into account health probes that may not go through the whole proxy stack, you can end up with something like this:

class IpAddressMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if request.META.get("HTTP_X_FORWARDED_FOR"):
            x_forwarded_for = [
                ip.strip()
                for ip in request.META["HTTP_X_FORWARDED_FOR"].split(",")
            ]
            # Only probes can genuinely have this. Their requests are the
            # only ones that won't go through load balancing, only
            # through nginx, so they are the only ones with a single proxy
            # and thus with x_forwarded_for of length 1. And they have a
            # dedicated user agent we can check for extra safety.
            is_probe = len(x_forwarded_for) == 1 and (
                request.headers.get("User-Agent", "").startswith("kube-probe/")
                or request.headers.get("User-Agent", "").startswith("GoogleHC/")
            )
            if len(x_forwarded_for) != settings.BK_PROXY_COUNT and not is_probe:
                logger.error(
                    f"Expected {settings.BK_PROXY_COUNT} addresses in "
                    f"X-Forwarded-For, got {len(x_forwarded_for)}. It can "
                    f"either be a configuration issue or an attack. Please check."
                )

            if len(x_forwarded_for) <= settings.BK_PROXY_COUNT:
                remote_addr = x_forwarded_for[0]
            else:
                # We have some user supplied data in the header.
                # Let's strip it.
                client_ip_index = len(x_forwarded_for) - settings.BK_PROXY_COUNT
                remote_addr = x_forwarded_for[client_ip_index]

            logger.debug(
                f"Setting REMOTE_ADDR based on value from X-Forwarded-For header. "
                f"Changing from {request.META['REMOTE_ADDR']} to {remote_addr} "
                f"based on {x_forwarded_for}"
            )
            request.META["REMOTE_ADDR"] = remote_addr
        return self.get_response(request)

Note

This only works if you have more than one proxy. If you have only one, you will need to rely on IPs to identify the probes if you need to, or make sure they don't go through the proxy so X-Forwarded-For won't be set.