Managing Scheduled Tasks with Django and Celery

I. Background Introduction

Previously, we had been using k8s CronJobs to manage scheduled tasks: the code for the scheduled tasks was packaged into a separate pod and triggered by a CronJob.

Although this approach is simple to operate and has no dependency on third-party resources (such as Redis), it has an obvious drawback.

The scheduled-task code is separated from the Django code, so many Django features cannot be used; we can only communicate with the Django server through APIs exposed via DRF. Sometimes a single scheduled task requires wrapping quite a few APIs, and issues such as authentication also have to be considered. It is quite troublesome. Therefore, in the new project we plan to adopt a different way of managing scheduled tasks.

Engineers who work with Python and Django have likely heard of Celery, an excellent asynchronous task framework. The last time I used it was in 2020, and its usage has changed somewhat in recent years. After searching online I could not find good Chinese material, so I decided to write this blog in the hope of helping those who need to look up this information in the future.

II. Celery Configuration

Before configuring Celery, it needs to be installed with `pip install celery`. Then we can start the configuration.

Before officially introducing the configuration, we need to make some assumptions to make the following text more understandable.

We create a Django project using `django-admin startproject proj`. The Django version should be >= 3.0. After successful creation, we will get the following directory structure:

```
proj
├── manage.py
└── proj
    ├── asgi.py
    ├── __init__.py
    ├── settings.py
    ├── urls.py
    └── wsgi.py
```

Those familiar with Django should be very familiar with the above directory tree. The following content is written based on this directory tree, so please keep it in mind.

1. Define a Celery Instance

To define a Celery instance, we need to create a new file in the directory tree above: `proj/proj/celery.py`.
The file is named celery.py and sits in the same directory as settings.py.

The content is as follows. I have put the important points in the comments in the code, so please read them carefully.

```python
import os

from celery import Celery


# This setting avoids having to initialize the Django configuration in every tasks.py.
# It is not strictly necessary, but it is highly recommended.
os.environ.setdefault(
    'DJANGO_SETTINGS_MODULE', 'proj.settings'
)

# Read the Redis address from an environment variable. Redis is used as the broker here.
REDIS_HOST = os.getenv('REDIS_HOST', 'localhost:6379')
app = Celery(
    'proj',  # The first argument names the Celery instance; here it is 'proj'.
    backend='redis://' + REDIS_HOST + '/1',
    broker='redis://' + REDIS_HOST + '/0',
)

# Celery can be configured in batches this way.
# These settings are sufficient for most scenarios.
# There are several other ways to configure Celery, but I think this method
# is sufficient for projects that are not very large.
app.conf.update(
    task_serializer='json',
    accept_content=['json'],  # Ignore other content types
    result_serializer='json',
    enable_utc=True,
)

# This line pulls Celery settings from the Django settings file.
# namespace='CELERY' means that settings whose names start with "CELERY_"
# are recognized as Celery configuration.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Automatically discover tasks in all Django apps.
app.autodiscover_tasks()


@app.task(bind=True)
def debug_task(self):
    print(f'Request: {self.request!r}')
```

In addition to the above configuration, there are two other places that need to be configured.
First, the following content needs to be added to **proj/proj/__init__.py**:
```python
from .celery import app as celery_app


__all__ = ('celery_app',)
```
Its purpose is to make Django load the Celery app automatically at startup, so that `shared_task` decorators can find it.

The other is to add celery configurations in django's settings, which is the part mentioned in the code `app.config_from_object('django.conf:settings', namespace='CELERY')`.
```python
CELERY_TASK_TRACK_STARTED = True
CELERY_TASK_TIME_LIMIT = 30 * 60  # The maximum running time of a single task, in seconds.
```

2. Record Task Results

When using celery for task scheduling, it is best to record the results of each task for future reference, especially when the task does not run as expected.

The official website recommends using django-celery-results to record task results.

  1. Installation
    `pip install django-celery-results`
  2. Registration
    django-celery-results is a separate Django app, so it needs to be registered in settings.py:
    ```python
    INSTALLED_APPS = (
        ...,
        'django_celery_results',
    )
    ```
    After registration, the database also needs to be migrated with `python manage.py migrate django_celery_results`.
  3. Configuration
    django-celery-results is only a package that stores task results automatically; the data still has to be kept somewhere. Task results can be stored in many places, such as the database, the local file system, or Redis. Here I use the database, and the database is also what I recommend.
    Add the following configuration in Django's settings.py:
    ```python
    CELERY_RESULT_BACKEND = 'django-db'  # Use the database as the result backend.
    # To be honest, I don't know exactly what this cache setting does, but the
    # official documentation recommends it, so I keep it. (With Django's cache
    # framework you can instead set it to a cache alias such as 'default'.)
    CELERY_CACHE_BACKEND = 'django-cache'
    ```
  4. Start
    Note that this command should be run in the top-level proj directory, otherwise an error such as "configuration file not found" will be reported.
    ```sh
    celery -A proj worker --loglevel=INFO
    ```

III. Scheduled Task Configuration

The above introduced how to configure celery. Now that we have celery, how do we manage scheduled tasks? This is where django-celery-beat comes in handy. Its usage is relatively simple.

1. Configure django-celery-beat

  1. Installation
    `pip install django-celery-beat`
  2. Registration
    Register it in Django's settings.py:
    ```python
    INSTALLED_APPS = (
        ...,
        'django_celery_beat',
    )
    ```
    Similarly, after registration, the database needs to be migrated with `python manage.py migrate django_celery_beat`.
  3. Start
    **Note that this command should be run in the top-level proj directory, otherwise an error such as "configuration file not found" will be reported.**
    ```sh
    celery -A proj beat -l INFO --scheduler django_celery_beat.schedulers:DatabaseScheduler
    ```

It should be noted that many people like to hard-code scheduled tasks as crontab schedules when using django_celery_beat. However, I prefer to configure them in the database through the Django Admin page.

If the schedules are hard-coded, modifying a scheduled task later means changing the code and redeploying it, which is not very user-friendly. And for non-technical staff, the chance of configuring scheduled tasks by themselves is close to zero.

2. Set Specific Scheduled Tasks through Django Admin

This part of the content is relatively simple. Just start Django, log in to the Admin page, and create tasks by clicking on the page. It’s not difficult, but it would require a lot of screenshots to write it out, so I don’t really want to do it.
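For completeness, the same periodic tasks that the Admin page creates can also be created programmatically through django-celery-beat's models. A sketch, to be run inside the Django project (e.g. in `python manage.py shell`); the task path `proj.tasks.cleanup` and its arguments are hypothetical:

```python
import json

from django_celery_beat.models import CrontabSchedule, PeriodicTask

# Every day at 04:00.
schedule, _ = CrontabSchedule.objects.get_or_create(
    minute='0',
    hour='4',
    day_of_week='*',
    day_of_month='*',
    month_of_year='*',
)

PeriodicTask.objects.create(
    crontab=schedule,
    name='Nightly cleanup',         # Human-readable name, shown in the Admin page.
    task='proj.tasks.cleanup',      # Dotted path to the task; hypothetical here.
    kwargs=json.dumps({'days': 30}),  # Task arguments are stored as JSON strings.
)
```

Tasks created this way show up in the Admin page exactly like ones created by clicking, since both are just rows in the same tables.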

IV. References

  1. First Steps with Django
  2. Task result backend settings