2015-01-01T23:54:07Z

Using Celery With Flask

The topic of running background tasks is complex, and because of that there is a lot of confusion around it. I have tackled it in my Mega-Tutorial, later in my book, and then again in much more detail in my REST API training video. To keep things simple, in all the examples I have used so far I have executed background tasks in threads, but I always noted that for a more scalable and production-ready solution a task queue such as Celery should be used instead.

My readers constantly ask me about Celery, and how a Flask application can use it, so today I am going to show you two examples that I hope will cover most application needs.

What is Celery?

Celery is an asynchronous task queue. You can use it to execute tasks outside of the context of your application. The general idea is that any resource-consuming tasks that your application may need to run can be offloaded to the task queue, leaving your application free to respond to client requests.

Running background tasks through Celery is not as trivial as doing so in threads. But the benefits are many, as Celery has a distributed architecture that will enable your application to scale. A Celery installation has three core components:

  1. The Celery client. This is used to issue background jobs. When working with Flask, the client runs with the Flask application.
  2. The Celery workers. These are the processes that run the background jobs. Celery supports local and remote workers, so you can start with a single worker running on the same machine as the Flask server, and later add more workers as the needs of your application grow.
  3. The message broker. The client communicates with the workers through a message queue, and Celery supports several ways to implement these queues. The most commonly used brokers are RabbitMQ and Redis.

For The Impatient

If you are the instant gratification type, and the screenshot at the top of this article intrigued you, then head over to the GitHub repository for the code used in this article. The README file there will give you the quick and dirty approach to running and playing with the example application.

Then come back to learn how everything works!

Working with Flask and Celery

The integration of Celery with Flask is so simple that no extension is required. A Flask application that uses Celery needs to initialize the Celery client as follows:

from flask import Flask
from celery import Celery

app = Flask(__name__)
app.config['CELERY_BROKER_URL'] = 'redis://localhost:6379/0'
app.config['CELERY_RESULT_BACKEND'] = 'redis://localhost:6379/0'

celery = Celery(app.name, broker=app.config['CELERY_BROKER_URL'])
celery.conf.update(app.config)

As you can see, Celery is initialized by creating an object of class Celery, and passing the application name and the connection URL for the message broker, which I put in app.config under key CELERY_BROKER_URL. This URL tells Celery where the broker service is running. If you run something other than Redis, or have the broker on a different machine, then you will need to change the URL accordingly.
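
The URL format is broker-specific. As a hedged sketch, here is what the change might look like if you wanted to point the application at a RabbitMQ broker instead, assuming RabbitMQ's default guest account on the local machine:

# hypothetical alternative: RabbitMQ instead of Redis as the broker
app.config['CELERY_BROKER_URL'] = 'amqp://guest:guest@localhost:5672//'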

Any additional configuration options for Celery can be passed directly from Flask's configuration through the celery.conf.update() call. The CELERY_RESULT_BACKEND option is only necessary if you need to have Celery store status and results from tasks. The first example I will show you does not require this functionality, but the second does, so it's best to have it configured from the start.
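
As a hedged sketch, here are a few other options that could be set the same way. These uppercase keys match the Celery version this article was written for, so check the Celery documentation for the exact names that apply to your version:

# all of these are picked up by the celery.conf.update(app.config) call
app.config['CELERY_TASK_SERIALIZER'] = 'json'
app.config['CELERY_RESULT_SERIALIZER'] = 'json'
app.config['CELERY_ACCEPT_CONTENT'] = ['json']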

Any functions that you want to run as background tasks need to be decorated with the celery.task decorator. For example:

@celery.task
def my_background_task(arg1, arg2):
    # some long running task here, for example:
    result = arg1 + arg2
    return result

Then the Flask application can request the execution of this background task as follows:

task = my_background_task.delay(10, 20)

The delay() method is a shortcut to the more powerful apply_async() call. Here is the equivalent call using apply_async():

task = my_background_task.apply_async(args=[10, 20])

When using apply_async(), you can give Celery more detailed instructions about how the background task is to be executed. A useful option is to request that the task executes at some point in the future. For example, this invocation will schedule the task to run in about a minute:

task = my_background_task.apply_async(args=[10, 20], countdown=60)
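
Besides countdown, apply_async() supports other scheduling options. As a hedged sketch, here is a task scheduled with eta for an absolute time, plus an expires deadline after which it is discarded if it hasn't started. Both are standard apply_async() arguments, but verify the details against the documentation for your Celery version:

from datetime import datetime, timedelta

task = my_background_task.apply_async(
    args=[10, 20],
    eta=datetime.utcnow() + timedelta(minutes=5),       # run in about 5 minutes
    expires=datetime.utcnow() + timedelta(minutes=15))  # discard if not started by then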

The return value of delay() and apply_async() is an object that represents the task, and this object can be used to obtain status. I will show you how this is done later in this article, but for now let's keep it simple and not worry about results from tasks.
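
As a quick preview, here is a hedged sketch of a few things that task object can do once the result backend is configured, as shown above:

task = my_background_task.delay(10, 20)
print(task.id)     # unique identifier assigned to this task
print(task.state)  # 'PENDING', 'STARTED', 'SUCCESS', etc.

# wait for the task and obtain its return value; fine in a console
# session, but you would not want to block a web route like this
result = task.get(timeout=60)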

Consult the Celery documentation to learn about many other available options.

Simple Example: Sending Asynchronous Emails

The first example that I'm going to show is a very common need of applications: the ability to send emails without blocking the main application.

For this example I'm going to use the Flask-Mail extension, which I covered in very good detail in other articles. I'm going to assume that you are familiar with this extension, so if you need a refresher see this tutorial or my Flask book.

The example application that I'm going to use to illustrate the topic presents a simple web form with one text field. The user is asked to enter an email address in this field, and upon submission, the server sends a test email to this address. The form includes two submit buttons, one to send the email immediately, and another to send it after a wait of one minute. The top portion of the screenshot at the top of this article shows how this form looks.

Here is the HTML template that supports this example:

<html>
  <head>
    <title>Flask + Celery Examples</title>
  </head>
  <body>
    <h1>Flask + Celery Examples</h1>
    <h2>Example 1: Send Asynchronous Email</h2>
    {% for message in get_flashed_messages() %}
    <p style="color: red;">{{ message }}</p>
    {% endfor %}
    <form method="POST">
      <p>Send test email to: <input type="text" name="email" value="{{ email }}"></p>
      <input type="submit" name="submit" value="Send">
      <input type="submit" name="submit" value="Send in 1 minute">
    </form>
  </body>
</html>

Hopefully you find nothing earth-shattering here. Just a regular HTML form, plus the ability to show flashed messages from Flask.

The Flask-Mail extension requires some configuration, specifically the details about the email server to use when sending emails. To make things easy I use my Gmail account as the email server:

# Flask-Mail configuration
app.config['MAIL_SERVER'] = 'smtp.googlemail.com'
app.config['MAIL_PORT'] = 587
app.config['MAIL_USE_TLS'] = True
app.config['MAIL_USERNAME'] = os.environ.get('MAIL_USERNAME')
app.config['MAIL_PASSWORD'] = os.environ.get('MAIL_PASSWORD')
app.config['MAIL_DEFAULT_SENDER'] = 'flask@example.com'

Note that to avoid putting my email account's credentials at risk, I set them in environment variables, which the application imports through os.environ.

There is a single route to support this example:

@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'GET':
        return render_template('index.html', email=session.get('email', ''))
    email = request.form['email']
    session['email'] = email

    # send the email
    email_data = {
        'subject': 'Hello from Flask',
        'to': email,
        'body': 'This is a test email sent from a background Celery task.'
    }
    if request.form['submit'] == 'Send':
        # send right away
        send_async_email.delay(email_data)
        flash('Sending email to {0}'.format(email))
    else:
        # send in one minute
        send_async_email.apply_async(args=[email_data], countdown=60)
        flash('An email will be sent to {0} in one minute'.format(email))

    return redirect(url_for('index'))

Once again, this is all pretty standard Flask. Since this is a very simple form, I decided to handle it without the help of an extension, so I use request.method and request.form to do all the management. I save the value that the user enters in the text field in the session, so that I can remember it after the page reloads.

The data associated with the email (the subject, recipient(s) and body) is stored in a dictionary. The interesting bit in this route is the sending of the email, which is handled by a Celery task called send_async_email, invoked either via delay() or apply_async() with this dictionary as an argument.

The last piece of this application is the asynchronous task that gets the job done:

@celery.task
def send_async_email(email_data):
    """Background task to send an email with Flask-Mail."""
    msg = Message(email_data['subject'],
                  sender=app.config['MAIL_DEFAULT_SENDER'],
                  recipients=[email_data['to']])
    msg.body = email_data['body']
    with app.app_context():
        mail.send(msg)

This task is decorated with celery.task to make it a background job. The function constructs a Message object from Flask-Mail using the data from the email_data dictionary. One notable thing in this function is that Flask-Mail requires an application context to run, so one needs to be created before the send() method can be invoked.

It is important to note that in this example the return value from the asynchronous call is not preserved, so the application will never know if the call succeeded or not. When you run this example, you can look at the output of the Celery worker to troubleshoot any problems with the sending of the email.
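
If your application did need to check on a sent email afterwards, one option is to hold on to the task id. Here is a hedged sketch; the email_task_id session key is hypothetical and not part of the example application:

task = send_async_email.delay(email_data)
session['email_task_id'] = task.id  # hypothetical key, for illustration only

# then, in a later request:
task = send_async_email.AsyncResult(session['email_task_id'])
if task.successful():
    flash('The test email was sent')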

Complex Example: Showing Status Updates and Results

The above example is overly simple: the background job is started and then the application forgets about it. Most Celery tutorials for web development end right there, but the fact is that many applications need to monitor their background tasks and obtain results from them.

What I'm going to do now is extend the above application with a second example that shows a fictitious long running task. The user can start one or more of these long running jobs by clicking a button, and the web page running in your browser uses ajax to poll the server for status updates on all these tasks. For each task the page will show a graphical status bar, a completion percentage and a status message, and when the task completes, a result value will be shown as well. You can see how all this looks in the screenshot at the top of this article.

Background Tasks with Status Updates

Let me start by showing you the background task that I'm using for this second example:

@celery.task(bind=True)
def long_task(self):
    """Background task that runs a long function with progress reports."""
    verb = ['Starting up', 'Booting', 'Repairing', 'Loading', 'Checking']
    adjective = ['master', 'radiant', 'silent', 'harmonic', 'fast']
    noun = ['solar array', 'particle reshaper', 'cosmic ray', 'orbiter', 'bit']
    message = ''
    total = random.randint(10, 50)
    for i in range(total):
        if not message or random.random() < 0.25:
            message = '{0} {1} {2}...'.format(random.choice(verb),
                                              random.choice(adjective),
                                              random.choice(noun))
        self.update_state(state='PROGRESS',
                          meta={'current': i, 'total': total,
                                'status': message})
        time.sleep(1)
    return {'current': 100, 'total': 100, 'status': 'Task completed!',
            'result': 42}

For this task I've added a bind=True argument in the Celery decorator. This instructs Celery to send a self argument to my function, which I can then use to record the status updates.

Since this task doesn't really do anything useful, I decided to use humorous status messages that are assembled from random verbs, adjectives and nouns. You can see the lists of nonsensical items I use to generate these messages above. Nothing wrong with having a little bit of fun, right?

The function loops for a random number of iterations between 10 and 50, so each run of the task will have a different duration. The random status message is generated on the first iteration, and then can be replaced in later iterations with a 25% chance.

The self.update_state() call is how Celery receives these task updates. There are a number of built-in states, such as STARTED, SUCCESS and so on, but Celery allows custom states as well. Here I'm using a custom state that I called PROGRESS. Attached to the state there is additional metadata, in the form of a Python dictionary that includes the current and total number of iterations and the randomly generated status message. A client can use these elements to display a nice progress bar. Each iteration sleeps for one second, to simulate some work being done.

When the loop exits, a Python dictionary is returned as the function's result. This dictionary includes the updated iteration counters, a final status message and a humorous result.

The long_task() function above runs in a Celery worker process. Below you can see the Flask application route that starts this background job:

@app.route('/longtask', methods=['POST'])
def longtask():
    task = long_task.apply_async()
    return jsonify({}), 202, {'Location': url_for('taskstatus',
                                                  task_id=task.id)}

As you can see, the client needs to issue a POST request to /longtask to kick off one of these tasks. The server starts the task, and stores the return value. For the response I used status code 202, which is normally used in REST APIs to indicate that a request is in progress. I also added a Location header, with a URL that the client can use to obtain status information. This URL points to another Flask route called taskstatus, and has task.id as a dynamic component.
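
If you want to see this exchange without the browser, here is a hedged sketch that drives the route with Flask's test client. It assumes the Redis broker and a Celery worker are running, since the route calls apply_async():

with app.test_client() as client:
    rv = client.post('/longtask')
    print(rv.status_code)          # should print 202
    print(rv.headers['Location'])  # the status URL assigned to this task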

Accessing Task Status from the Flask Application

The taskstatus route referenced above is in charge of reporting status updates provided by background tasks. Here is the implementation of this route:

@app.route('/status/<task_id>')
def taskstatus(task_id):
    task = long_task.AsyncResult(task_id)
    if task.state == 'PENDING':
        # job did not start yet
        response = {
            'state': task.state,
            'current': 0,
            'total': 1,
            'status': 'Pending...'
        }
    elif task.state != 'FAILURE':
        response = {
            'state': task.state,
            'current': task.info.get('current', 0),
            'total': task.info.get('total', 1),
            'status': task.info.get('status', '')
        }
        if 'result' in task.info:
            response['result'] = task.info['result']
    else:
        # something went wrong in the background job
        response = {
            'state': task.state,
            'current': 1,
            'total': 1,
            'status': str(task.info),  # this is the exception raised
        }
    return jsonify(response)

This route generates a JSON response that includes the task state and all the values that I set in the update_state() call as the meta argument, which the client can use to build a progress bar. Unfortunately this function needs to check for a few edge conditions as well, so it ended up being a bit long. To access task data I recreate the task object, which is an instance of class AsyncResult, using the task id given in the URL.

The first if block is for when the task hasn't started yet (PENDING state). In this case there is no status information, so I make up some data. The elif block that follows is the one that returns the status information from the background task. Here the information that the task provided is accessible as task.info. If the data contains a result key, then that means that this is the final result and the task finished, so I add that result to the response as well. The else block at the end covers the possibility of an error, which Celery will report by setting a task state of "FAILURE", and in that case task.info will contain the exception raised. To handle errors I set the text of the exception as a status message.

Believe it or not, this is all it takes from the server. The rest needs to be implemented by the client, which in this example is a web page with some Javascript.
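
Before looking at the Javascript, here is a hedged sketch of the same polling logic written as a small Python client, using the requests package (which is not among this example's requirements; it is used here only for illustration):

import time
import requests

def poll_task(status_url):
    # poll the status URL until the task leaves the PENDING/PROGRESS states
    while True:
        data = requests.get('http://localhost:5000' + status_url).json()
        print('{0}/{1}: {2}'.format(
            data['current'], data['total'], data['status']))
        if data['state'] not in ('PENDING', 'PROGRESS'):
            return data.get('result', data['state'])
        time.sleep(2)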

Client-Side Javascript

It isn't really the focus of this article to describe the Javascript portion of this example, but in case you are interested, here is some information.

For the graphical progress bar I'm using nanobar.js, which I included from a CDN. I also included jQuery, which simplifies the ajax calls significantly:

<script src="//cdnjs.cloudflare.com/ajax/libs/nanobar/0.2.1/nanobar.min.js"></script>
<script src="//cdnjs.cloudflare.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>

The button that starts a background job is connected to the following Javascript handler:

    function start_long_task() {
        // add task status elements 
        div = $('<div class="progress"><div></div><div>0%</div><div>...</div><div>&nbsp;</div></div><hr>');
        $('#progress').append(div);

        // create a progress bar
        var nanobar = new Nanobar({
            bg: '#44f',
            target: div[0].childNodes[0]
        });

        // send ajax POST request to start background job
        $.ajax({
            type: 'POST',
            url: '/longtask',
            success: function(data, status, request) {
                status_url = request.getResponseHeader('Location');
                update_progress(status_url, nanobar, div[0]);
            },
            error: function() {
                alert('Unexpected error');
            }
        });
    }

This function starts by adding a few HTML elements that will be used to display the new background task's progress bar and status. This is done dynamically because the user can add any number of jobs, and each job needs to get its own set of HTML elements.

To help you understand this better, here is the structure of the added elements for a task, with comments to indicate what each div is used for:

<div class="progress">
    <div></div>         <-- Progress bar
    <div>0%</div>       <-- Percentage
    <div>...</div>      <-- Status message
    <div>&nbsp;</div>   <-- Result
</div>
<hr>

The start_long_task() function then instantiates the progress bar according to nanobar's documentation, and finally sends the ajax POST request to /longtask to initiate the Celery background job in the server.

When the POST ajax call returns, the callback function obtains the value of the Location header, which, as you saw in the previous section, is the URL the client can invoke to get status updates. It then calls another function, update_progress(), with this status URL, the progress bar object and the root div element subtree created for the task. Below you can see this update_progress() function, which sends the status request and then updates the UI elements with the information returned by it:

    function update_progress(status_url, nanobar, status_div) {
        // send GET request to status URL
        $.getJSON(status_url, function(data) {
            // update UI
            percent = parseInt(data['current'] * 100 / data['total']);
            nanobar.go(percent);
            $(status_div.childNodes[1]).text(percent + '%');
            $(status_div.childNodes[2]).text(data['status']);
            if (data['state'] != 'PENDING' && data['state'] != 'PROGRESS') {
                if ('result' in data) {
                    // show result
                    $(status_div.childNodes[3]).text('Result: ' + data['result']);
                }
                else {
                    // something unexpected happened
                    $(status_div.childNodes[3]).text('Result: ' + data['state']);
                }
            }
            else {
                // rerun in 2 seconds
                setTimeout(function() {
                    update_progress(status_url, nanobar, status_div);
                }, 2000);
            }
        });
    }

This function sends the GET request to the status URL, and when a response is received it updates the different HTML elements for the task. If the background task completed and a result is available then it is added to the page. If there is no result then that means that the task ended due to an error, so the task state, which is going to be FAILURE, is shown as result.

When the server is still running the job I need to continue polling the task status and updating the UI. To achieve this I set a timer to call the function again in two seconds. This will continue until the Celery task completes.

By default a Celery worker runs as many concurrent jobs as there are CPUs (the worker's --concurrency option changes this), so when you play with this example make sure you start a large number of tasks to see how Celery keeps jobs in the PENDING state until the worker can take them.

Running the Examples

If you made it all the way here without running the example application, then it is now time for you to try all this Celery goodness. Go ahead and clone the GitHub repository, create a virtual environment, and populate it:

$ git clone https://github.com/miguelgrinberg/flask-celery-example.git
$ cd flask-celery-example
$ virtualenv venv
$ source venv/bin/activate
(venv) $ pip install -r requirements.txt

Note that the requirements.txt file included with this repository contains Flask, Flask-Mail, Celery and the Redis client, along with all their dependencies.

Now you need to run the three processes required by this application, so the easiest way is to open three terminal windows. On the first terminal run Redis. You can just install Redis according to the download instructions for your operating system, but if you are on a Linux or OS X machine, I have included a small script that downloads, compiles and runs Redis as a private server:

$ ./run-redis.sh

Note that for the above script to work you need to have gcc installed. Also note that the above command is blocking; Redis will run in the foreground.

On the second terminal run a Celery worker. This is done with the celery command, which is installed in your virtual environment. Since this is the process that will be sending out emails, the MAIL_USERNAME and MAIL_PASSWORD environment variables must be set to a valid Gmail account before starting the worker:

$ export MAIL_USERNAME=<your-gmail-username>
$ export MAIL_PASSWORD=<your-gmail-password>
$ source venv/bin/activate
(venv) $ celery worker -A app.celery --loglevel=info

The -A option gives Celery the application module and the Celery instance, and --loglevel=info makes the logging more verbose, which can sometimes be useful in diagnosing problems.

Finally, on the third terminal window run the Flask application, also from the virtual environment:

$ source venv/bin/activate
(venv) $ python app.py

Now you can navigate to http://localhost:5000/ in your web browser and try the examples!

Conclusion

Unfortunately when working with Celery you have to take a few more steps than simply sending a job to a background thread, but the benefits in flexibility and scalability are hard to ignore. In this article I tried to go beyond the "let's start a background job" example and give you a more complete and realistic portrait of what using Celery might entail. I sincerely hope I haven't scared you with too much information!

As always, feel free to write down any questions or comments below.

Miguel

262 comments

  • #176 Miguel Grinberg said 2019-02-27T15:12:15Z

    @Alejandro: so these scripts are files that are stored in the container file system? That is a bad idea: when the container is removed those files are going to go away. You need to store your files in a directory on the docker host that is shared, through a volume, with any containers that need access. Or else you need to store them in the database and not as files.

  • #177 nb said 2019-03-20T16:30:18Z

    Hey Miguel,

    Thanks for the great tutorial. I recently started working on celery. The code used to run completely fine in the background, but recently without making any changes to the code I am getting this error,

    worker_1 | [2019-03-20 16:15:38,185: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: exitcode 0.',)
    worker_1 | Traceback (most recent call last):
    worker_1 |   File "/pyenv/versions/service/lib/python3.6/site-packages/billiard/pool.py", line 1223, in mark_as_worker_lost
    worker_1 |     human_status(exitcode)),
    worker_1 | billiard.exceptions.WorkerLostError: Worker exited prematurely: exitcode 0.

  • #178 Josh said 2019-03-20T17:02:12Z

    I've been looking for an example of a simple solution for an issue that I have. Namely, I have a simple page with a few input fields and a 'Run' button. On the server side I have a Flask python process that takes these inputs and processes for a few minutes, during which I have a few print() statements to print status information and partial results. What I want is to update the browser page with these printouts. I was hoping there was a simple solution to it without the need for sockets and processes and queues as I'm running this on a free account on pythonanywhere.com. I understand python, but I don't know much about web development. Any ideas?

  • #179 Miguel Grinberg said 2019-03-20T18:53:48Z

    @Josh: Unfortunately there are no solutions that I would call simple. If you don't want to use websockets, or processes or queues, then the only solution that comes to mind is to use a streaming response, which is something that Flask supports natively. Basically you keep the response open, and each time you have some new data to send, you add to it. The tricky part is that you will need to convert this process to a format that can be used with a generator, which is what Flask uses for streaming responses. And you will need some JS on the browser side to receive and display these updates.

  • #180 Miguel Grinberg said 2019-03-20T18:55:02Z

    @nb: it would appear as if your worker processes are exiting on their own. If you are not exiting explicitly, it could be one of your dependencies I guess.

  • #181 Rian said 2019-03-28T12:22:07Z

    Tried running the celery worker command and got this error

    File "/home/george/projects/.virtualenvs/nswa/lib/python3.7/site-packages/celery/backends/redis.py", line 22 from . import async, base ^ SyntaxError: invalid syntax

    Can you please help with this error?

  • #182 Miguel Grinberg said 2019-03-28T19:54:47Z

    @Rian: try upgrading your redis package.

  • #183 Rian said 2019-03-30T09:28:07Z

    I applied the upgrade....still shows the same error.

  • #184 Miguel Grinberg said 2019-03-30T09:50:30Z

    @Rian: actually I was mistaken, sorry. What you need to upgrade is Celery and Kombu.

  • #185 s said 2019-04-16T21:52:46Z

    Hi Miguel, thanks for the tutorials!

    I have a question I was hoping you could help with. I am creating a Flask website where users log in and complete a survey every two weeks from the date they register. It is the same survey each time. The survey is done in WTForms. I have already created the survey and the login/logout code, but I'm confused as to how to deal with the survey. The users need to be sent an email every two weeks to log in and complete the survey again (I was thinking about using celery for this?), but I also need to be able to see the results of each survey and the aggregate results of the surveys combined for every user (e.g. person x answered c for question 2 in the week 4 survey but answered d for the same question in week 2). I would appreciate any advice on how to achieve this in Flask. Thanks bud!

  • #186 Miguel Grinberg said 2019-04-17T07:13:18Z

    @s: for the email reminders Celery seems overkill to me. You just need a cron job that is running on your host. You can run it once a day for example, and find the list of users that are due to receive a new survey. Flask does not really have anything to do with this task.

  • #187 Osman Ghani said 2019-06-12T07:29:28Z

    Hi Miguel, why can we not print anything to the terminal of the celery worker? app.logger.info(str) prints into the flask server logs. I tried:

    from celery.utils.log import get_task_logger
    logger = get_task_logger(__name__)

    @celery.task
    def add(x, y):
        result = x + y
        logger.info('Add: {x} + {y} = {result}')
        return result

    and it does nothing at all.

  • #188 Miguel Grinberg said 2019-06-12T10:02:34Z

    @Osman: did you set the logger level to INFO? I believe Celery only logs errors by default. Try changing the logger.info to a logger.error.

  • #189 azuu said 2019-07-05T04:48:31Z

    Hi Miguel Grinberg, it is really an excellent tutorial. I am new to this and it really helped me a lot, but I have one question. I have four to five Python scripts which are run by taking some command line arguments. Now I want these scripts to run as Celery tasks and to view their status in Flask. Should these scripts be passed to the Flask server, or should I have a button on the Flask side that, when clicked, runs the script as a Celery task and updates the result, as you have shown in your tutorial for the long task? This is what I have tried so far:

    # flask server
    @app.route('/script_path/<script_name>')
    def taking_script_name(script_name):
        calling_script.delay(script_name)
        return 'i have sent an async script request'

    # it may be wrong, i am stuck on how to call that script
    @celery.task
    def calling_script(script_name):
        result = script_name
        return {'result': result}

    Help me, thank you in advance.

  • #190 Miguel Grinberg said 2019-07-05T17:06:29Z

    @azuu: You will need to modify your scripts to pass regular updates. See how I register progress updates, you will need to add the same in your scripts.

  • #191 Brent said 2019-11-22T19:39:32Z

    Thank you for the article. I got the code to work. When I use bootstrap, I only see the nanobar and the % underneath it. The status and result do not show. Is there a way to make them work with bootstrap?

  • #192 Miguel Grinberg said 2019-11-22T23:55:17Z

    @Brent: probably best to adapt the example to use the Bootstrap progress bar instead of an external component.

  • #193 Kenny said 2019-11-27T22:08:57Z

    Great content! Thanks a lot!

  • #194 Luca said 2019-12-12T07:15:39Z

    Hi Miguel, thank you for your tutorial. I have a question for you. Is it possible to run celery workers that need to access to Flask app code on a separate machine? Thanks again.

    Luca

  • #195 Miguel Grinberg said 2019-12-12T10:32:45Z

    @Luca: Yes, this is fine, the workers can be on different machines because each worker process creates its own copy of the Flask application instance.

  • #196 Mohd Mujtaba said 2019-12-25T07:05:16Z

    There is no element with the id "progress", so this must give an error, doesn't it? $('#progress').append(div);

  • #197 Miguel Grinberg said 2019-12-25T18:47:42Z

  • #198 Dominik said 2020-01-08T14:00:31Z

    Thanks for writing this down! Even though it's five years old it's still super useful to me!! :-)

  • #199 Srinivas Narayanan said 2020-02-05T04:27:54Z

    One quick question on the Celery configuration. As part of Celery initialization you are only providing the broker and no backend (or results_backend). In that case, how does the second example work? Does celery treat the broker as the default result_backend store? Or am I missing something? Please let me know.

  • #200 Miguel Grinberg said 2020-02-05T10:22:22Z

    @srinivas: I don't understand what you are asking. The example provided with this article does set the result backend.
