2012-11-04T08:00:41Z

The Flask Mega-Tutorial, Part X: Full Text Search

(Great news! There is a new version of this tutorial!)

This is the tenth article in the series in which I document my experience writing web applications in Python using the Flask microframework.

The goal of the tutorial series is to develop a decently featured microblogging application that, demonstrating a total lack of originality, I have decided to call microblog.

NOTE: This article was revised in September 2014 to be in sync with current versions of Python and Flask.

Recap

In the previous article in the series we enhanced our database queries so that results are returned one page at a time.

Today, we are going to continue working on our database, but in a different area. All applications that store content must provide a search capability.

For many types of web sites it is possible to just let Google, Bing, etc. index all the content and provide the search results. This works well for sites that have mostly static pages, like a forum. In our little microblog application the basic unit of content is just a short user post, not a whole page, and the search results that we want are dynamic. For example, if we search for the word "dog" we want to see blog posts from any users that include that word. Until someone searches for that word there is no page with these results that the big search engines could have indexed, so we have no choice other than rolling our own search.

Introduction to full text search engines

Unfortunately, support for full text search in relational databases is not well standardized. Each database implements full text search in its own way, and SQLAlchemy at this time does not have a full text search abstraction.

We are currently using SQLite for our database, so we could just create a full text index using the facilities provided by SQLite, bypassing SQLAlchemy. But that isn't a good idea, because if one day we decide to switch to another database we would need to rewrite our full text search capability for another database.

So instead, we are going to let our database deal with the regular data, and we are going to create a specialized database that will be dedicated to text searches.
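To get a feel for what such a specialized text database does, here is a minimal sketch of an inverted index, the core data structure behind full text search. This is only an illustration: the TinyIndex class and its methods are made up for this example, and a real engine such as Whoosh does far more (ranking, stemming, persistent storage).

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each lowercase word to the ids of the posts containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, post_id, body):
        """Record every word of `body` as pointing back to `post_id`."""
        for word in re.findall(r"\w+", body.lower()):
            self._postings[word].add(post_id)

    def search(self, text):
        """Return the sorted ids of the posts that contain every word in `text`."""
        words = re.findall(r"\w+", text.lower())
        if not words:
            return []
        # .get() avoids creating empty entries for words that were never indexed
        matches = set.intersection(*(self._postings.get(w, set()) for w in words))
        return sorted(matches)


index = TinyIndex()
index.add(1, "my first post")
index.add(2, "my second post")
index.add(3, "my third and last post")

print(index.search("post"))         # [1, 2, 3]
print(index.search("second post"))  # [2]
print(index.search("dog"))          # []
```

The relational database keeps the posts themselves, while the index only stores words and the ids needed to find those posts again.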

There are a few open source full text search engines. The only one that to my knowledge has a Flask extension is Whoosh, an engine also written in Python. The advantage of using a pure Python engine is that it will install and run anywhere a Python interpreter is available. The disadvantage is that search performance will not be up to par with other engines that are written in C or C++. In my opinion the ideal solution would be to have a Flask extension that can connect to several engines and abstract us from dealing with a particular one in the same way Flask-SQLAlchemy gives us the freedom to use several database engines, but nothing of that kind seems to be available for full text searching at this time. Django developers do have a very nice extension that supports several full text search engines called django-haystack. Maybe one day someone will create a similar extension for Flask.

But for now, we'll implement our text searching with Whoosh. The extension that we are going to use is Flask-WhooshAlchemy, which integrates a Whoosh database with Flask-SQLAlchemy models.

Python 3 Compatibility

Unfortunately, we have a problem with Python 3 and these packages. The Flask-WhooshAlchemy extension was never made compatible with Python 3. I have forked this extension and made a few changes to make it work, so if you are on Python 3 you will need to uninstall the official version and install my fork:

$ flask/bin/pip uninstall flask-whooshalchemy
$ flask/bin/pip install git+git://github.com/miguelgrinberg/flask-whooshalchemy.git

Sadly this isn't the only problem. Whoosh also seems to have issues with Python 3. In my testing I have encountered this bug, and to my knowledge there isn't a solution available, which means that at this time the full text search capability does not work well on Python 3. I will update this section once the issues are resolved.

Configuration

Configuration for Flask-WhooshAlchemy is pretty simple. We just need to tell the extension the name of the full text search database (file config.py):

WHOOSH_BASE = os.path.join(basedir, 'search.db')

Model changes

Since Flask-WhooshAlchemy integrates with Flask-SQLAlchemy, we indicate what data is to be indexed for searching in the proper model class (file app/models.py):

from app import app

import sys
if sys.version_info >= (3, 0):
    enable_search = False
else:
    enable_search = True
    import flask_whooshalchemy as whooshalchemy

class Post(db.Model):
    __searchable__ = ['body']

    id = db.Column(db.Integer, primary_key=True)
    body = db.Column(db.String(140))
    timestamp = db.Column(db.DateTime)
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))

    def __repr__(self):
        return '<Post %r>' % (self.body)

if enable_search:
    whooshalchemy.whoosh_index(app, Post)

The model has a new __searchable__ field, which is a list with all the database fields that will be in the searchable index. In our case we only want to index the body field of our posts.

We also have to initialize the full text index for this model by calling the whoosh_index function. Note that since we know that the search capability currently does not work on Python 3 we have to skip its initialization. Once the problems in Whoosh are fixed the logic around enable_search can be removed.

Since this isn't a change that affects the format of our relational database we do not need to record a new migration.

Unfortunately any posts that were in the database before the full text engine was added will not be indexed. To make sure the database and the full text engine are synchronized we are going to delete all posts from the database and start over. First we start the Python interpreter. For Windows users:

flask\Scripts\python

And for everyone else:

flask/bin/python

Then in the Python prompt we delete all the posts:

>>> from app.models import Post
>>> from app import db
>>> for post in Post.query.all():
...    db.session.delete(post)
...
>>> db.session.commit()

Searching

And now we are ready to start searching. First let's add a few new posts to the database. We have two options to do this. We can just start the application and enter posts via the web browser, as regular users would do, or we can also do it in the Python prompt.

From the Python prompt we can do it as follows:

>>> from app.models import User, Post
>>> from app import db
>>> import datetime
>>> u = User.query.get(1)
>>> p = Post(body='my first post', timestamp=datetime.datetime.utcnow(), author=u)
>>> db.session.add(p)
>>> p = Post(body='my second post', timestamp=datetime.datetime.utcnow(), author=u)
>>> db.session.add(p)
>>> p = Post(body='my third and last post', timestamp=datetime.datetime.utcnow(), author=u)
>>> db.session.add(p)
>>> db.session.commit()

The Flask-WhooshAlchemy extension is nice, because it hooks into Flask-SQLAlchemy commits automatically. We do not need to maintain the full text index ourselves; it is all done for us transparently.
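The idea of an index that updates itself on every commit can be sketched with a simple listener pattern. Everything in this example (TinySession, on_commit, index_new_posts) is hypothetical and only shows the concept; it is not how Flask-SQLAlchemy or Flask-WhooshAlchemy are actually written.

```python
class TinySession:
    """Stand-in for a database session that notifies listeners after each commit."""

    def __init__(self):
        self._pending = []
        self._listeners = []

    def on_commit(self, listener):
        """Register a function to be called with the objects of each commit."""
        self._listeners.append(listener)

    def add(self, obj):
        self._pending.append(obj)

    def commit(self):
        committed, self._pending = self._pending, []
        for listener in self._listeners:
            listener(committed)


search_index = {}  # word -> set of post bodies; a real engine stores much more


def index_new_posts(posts):
    """Listener: add every word of every committed post to the index."""
    for body in posts:
        for word in body.lower().split():
            search_index.setdefault(word, set()).add(body)


session = TinySession()
session.on_commit(index_new_posts)

session.add("my first post")
session.add("my second post")
session.commit()  # the listener runs here; the caller never touches the index

print(sorted(search_index["post"]))  # ['my first post', 'my second post']
```

The code that adds and commits posts knows nothing about searching, which is why posts entered through the web browser and posts entered in the Python prompt end up indexed the same way.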

Now that we have a few posts in our full text index we can issue searches:

>>> Post.query.whoosh_search('post').all()
[<Post u'my second post'>, <Post u'my first post'>, <Post u'my third and last post'>]
>>> Post.query.whoosh_search('second').all()
[<Post u'my second post'>]
>>> Post.query.whoosh_search('second OR last').all()
[<Post u'my second post'>, <Post u'my third and last post'>]

As you can see in the examples above, the queries do not need to be limited to single words. In fact, Whoosh supports a pretty powerful search query language.
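To make the difference between plain multi-word queries and OR queries concrete, here is a toy matcher in plain Python. It is a sketch under one assumption: words that are not separated by OR must all be present in a post, which as far as I know is also how Whoosh's default query parser treats them. The POSTS dictionary and the function names are invented for this example, and none of this is Whoosh's real parser.

```python
import re

POSTS = {1: "my first post", 2: "my second post", 3: "my third and last post"}


def words(text):
    """Split text into a set of lowercase words."""
    return set(re.findall(r"\w+", text.lower()))


def matches(query, body):
    """True if at least one OR-separated clause has all of its words in `body`."""
    body_words = words(body)
    for clause in re.split(r"\s+OR\s+", query.strip()):
        clause_words = words(clause)
        if clause_words and clause_words <= body_words:
            return True
    return False


def search(query):
    """Return the ids of the matching posts, in id order."""
    return [post_id for post_id, body in sorted(POSTS.items()) if matches(query, body)]


print(search("second OR last"))  # [2, 3]
print(search("second last"))     # []
print(search("my post"))         # [1, 2, 3]
```

Note how "second last" finds nothing, because no single post contains both words, while "second OR last" finds the two posts that contain either one.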

Integrating full text searches into the application

To make the searching capability available to our application's users we have to add just a few small changes.

Configuration

For configuration, we'll just indicate the maximum number of search results that should be returned (file config.py):

MAX_SEARCH_RESULTS = 50

Search form

We are going to add a search form to the navigation bar at the top of the page. Putting the search box at the top is nice, because then the search will be accessible from all pages.

First we add a search form class (file app/forms.py):

class SearchForm(Form):
    search = StringField('search', validators=[DataRequired()])

Then we need to create a search form object and make it available to all templates, since we will be putting the search form in the navigation bar that is common to all pages. The easiest way to achieve this is to create the form in the before_request handler, and then stick it in Flask's global g (file app/views.py):

from .forms import SearchForm

@app.before_request
def before_request():
    g.user = current_user
    if g.user.is_authenticated:
        g.user.last_seen = datetime.utcnow()
        db.session.add(g.user)
        db.session.commit()
        g.search_form = SearchForm()

Then we add the form to our template (file app/templates/base.html):

<div>Microblog:
    <a href="{{ url_for('index') }}">Home</a>
    {% if g.user.is_authenticated %}
    | <a href="{{ url_for('user', nickname=g.user.nickname) }}">Your Profile</a>
    | <form style="display: inline;" action="{{ url_for('search') }}" method="post" name="search">{{ g.search_form.hidden_tag() }}{{ g.search_form.search(size=20) }}<input type="submit" value="Search"></form>
    | <a href="{{ url_for('logout') }}">Logout</a>
    {% endif %}
</div>

Note that we only display the form when we have a logged in user. Likewise, the before_request handler will only create a form when a user is logged in, since our application does not show any content to guests that are not authenticated.

Search view function

The action field of our form was set above to send all search requests to the search view function. This is where we will be issuing our full text queries (file app/views.py):

@app.route('/search', methods=['POST'])
@login_required
def search():
    if not g.search_form.validate_on_submit():
        return redirect(url_for('index'))
    return redirect(url_for('search_results', query=g.search_form.search.data))

This function doesn't really do much; it just collects the search query from the form and then redirects to another page, passing this query as an argument. The reason the search work isn't done directly here is that if a user then hits the refresh button the browser will put up a warning indicating that form data will be resubmitted. This is avoided when the response to a POST request is a redirect, because after the redirect the browser's refresh button will reload the redirected page.
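The redirect-after-POST technique does not depend on Flask. As a rough, framework-free sketch, here is the same flow written as a bare WSGI application using only the standard library. The app and call functions are invented for this illustration, and call is just a test helper that plays the role of the web server.

```python
import io
from urllib.parse import parse_qs, quote


def app(environ, start_response):
    """Tiny WSGI app: POST /search redirects, GET /search_results/<query> renders."""
    method, path = environ["REQUEST_METHOD"], environ["PATH_INFO"]

    if method == "POST" and path == "/search":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        form = parse_qs(environ["wsgi.input"].read(size).decode("utf-8"))
        query = form.get("search", [""])[0]
        # Reply with a redirect instead of a page, so a refresh repeats only the GET.
        location = "/search_results/" + quote(query, safe="")
        start_response("302 Found", [("Location", location)])
        return [b""]

    if method == "GET" and path.startswith("/search_results/"):
        # WSGI servers hand over PATH_INFO already URL-decoded.
        query = path[len("/search_results/"):]
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [("Search results for " + query).encode("utf-8")]

    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def call(method, path, body=b""):
    """Invoke the app the way a WSGI server would and collect the response."""
    seen = {}

    def start_response(status, headers):
        seen["status"], seen["headers"] = status, dict(headers)

    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    chunks = app(environ, start_response)
    return seen["status"], seen["headers"], b"".join(chunks)


status, headers, _ = call("POST", "/search", b"search=second+OR+last")
print(status, headers["Location"])  # 302 Found /search_results/second%20OR%20last
```

The page the user ends up looking at was fetched with a GET request, so refreshing it simply runs the search again without resubmitting any form data.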

Search results page

Once a query string has been received the form POST handler sends it via page redirection to the search_results handler (file app/views.py):

from config import MAX_SEARCH_RESULTS

@app.route('/search_results/<query>')
@login_required
def search_results(query):
    results = Post.query.whoosh_search(query, MAX_SEARCH_RESULTS).all()
    return render_template('search_results.html',
                           query=query,
                           results=results)

The search results view function sends the query into Whoosh, passing a maximum number of search results. We don't want to present a potentially large number of hits, so we are happy showing just the first fifty.

The final piece is the search results template (file app/templates/search_results.html):

<!-- extend base layout -->
{% extends "base.html" %}

{% block content %}
  <h1>Search results for "{{ query }}":</h1>
  {% for post in results %}
      {% include 'post.html' %}
  {% endfor %}
{% endblock %}

And here, once again, we can reuse our post.html sub-template, so we don't need to worry about rendering avatars or other formatting elements, since all of that is done in a generic way in the sub-template.

Final words

We have now completed yet another important, though often overlooked, piece that any decent web application must have.

The source code for the updated microblog application is available below:

Download microblog-0.10.zip.

As always, the above download does not include a database or a flask virtual environment. See previous articles in the series to learn how to create these.

I hope you enjoyed this tutorial. If you have any questions feel free to write in the comments below. Thank you for reading, and I will be seeing you again in the next installment!

Miguel

151 comments

  • #76 Chris said 2015-01-21T11:13:48Z

    Hi, I tried out your tutorial and I like it really. Thank you very much for this.

    I'm using python 3.4 and tried out the search feature and had no problems. Maybe it works with this version. I used your fork of flask-whooshalchemy.

  • #77 Miguel Grinberg said 2015-01-21T19:03:24Z

    @Chris: good to know. I'll ensure that everything is working now and enable the feature in the example code. Thanks!

  • #78 Nadav said 2015-03-07T22:12:02Z

    every time I try to put in: Post.query.whoosh_search('post').all()

    I get:

    File "", line 1, in
    AttributeError: 'BaseQuery' object has no attribute 'whoosh_search'

    I am stuck.

  • #79 Miguel Grinberg said 2015-03-07T23:49:54Z

    @Nadav: have you installed flask-whooshalchemy? Make sure you have that, then compare your code against mine on Github if this still fails.

  • #80 Bgorkhe said 2015-03-08T04:10:58Z

    Hi Miguel Thanks for the tutorial its great. I am getting this error...somewhere I am doing some mistake looks like: werkzeug.routing.BuildError BuildError: ('search_results', {}, None)

    Even though my query comes out to be some string. Maybe my Post is not updating in the database. Thanks in advance.

  • #81 Miguel Grinberg said 2015-03-09T23:10:23Z

    @Bgorkhe: the search_results page needs a query argument, that needs to be given in the url_for call. See above for examples.

  • #82 Andy Strain said 2015-05-15T18:41:18Z

    I am using Python 3.4.0 and, being afraid that the WhooshAlchemy extension for Flask would not work, uninstalled it using pip and changed the search code to use just Whoosh. Here is what I did:

    I followed this tutorial as is except the code I added to app/views.py was:

    from whoosh.qparser import QueryParser
    from app import search_ix
    from config import MAX_SEARCH_RESULTS

    ...

    @app.route('/', methods=['GET', 'POST'])
    @app.route('/index', methods=['GET', 'POST'])
    @app.route('/index/', methods=['GET', 'POST'])
    @login_required
    def index(page=1):
        form = PostForm()
        if form.validate_on_submit():
            post = Post(body=form.post.data, timestamp=datetime.utcnow(), author=g.user)
            db.session.add(post)
            db.session.commit()
            flash('Your post is now live!')
            writer = search_ix.writer()
            writer.add_document(id=str(post.id), body=post.body)
            writer.commit()
            flash('Your post is now indexed!')
            return redirect(url_for('index'))
        posts = g.user.followed_posts().paginate(page, POSTS_PER_PAGE, False)
        return render_template('index.html', title='Home', form=form, posts=posts)

    ...

    @app.route('/search_results/<query>')
    @login_required
    def search_results(query):
        qp = QueryParser('body', schema=search_ix.schema)
        q = qp.parse(query)
        with search_ix.searcher() as s:
            rs = s.search(q, limit=MAX_SEARCH_RESULTS)
            results = []
            for r in rs:
                results.append(Post.query.get(int(r['id'])))
        return render_template('search_results.html', query=query, results=results)

    Then in app/__init__.py, I added the following code to set up the search_ix variable:

    from whoosh.filedb.filestore import FileStorage
    from whoosh.fields import Schema, TEXT, ID
    from config import WHOOSH_BASE

    ...

    search_is_new = False
    if not os.path.exists(WHOOSH_BASE):
        os.mkdir(WHOOSH_BASE)
        search_is_new = True
    search_storage = FileStorage(WHOOSH_BASE)
    search_ix = None
    if search_is_new:
        schema = Schema(id=ID(stored=True), body=TEXT())
        search_ix = search_storage.create_index(schema)
    else:
        search_ix = search_storage.open_index()

    As you can see, lots more coding, but it does work... lol!

  • #83 Miguel Grinberg said 2015-05-16T00:48:57Z

    @Andy: so now we need someone that can package your solution as an improved extension that works for Python 2 & 3. I can look into it when I have some free time, but maybe you are interested in doing this yourself, since you are familiar with the solution?

  • #84 Artelse said 2015-06-23T11:21:24Z

    Great tutorial and feedback!

    I stalled on this page with the same error as #78:

    AttributeError: 'BaseQuery' object has no attribute 'whoosh_search'

    I checked code against github and can't see why it fails. (Am a bit new on python.) Also I don't see a search.db being created by whoosh, so I suspect it is not imported correctly or something?

  • #85 artelse said 2015-06-23T11:47:46Z

    A follow up to your suggestion on the previous comment page I get:

    >>> import flask.ext.whooshalchemy as whooshalchemy
    >>> from app import app
    >>> from app.models import Post
    >>> whooshalchemy.whoosh_index(app, Post)
    FileIndex(FileStorage('/volume1/web/aslib/search.db/Post'), 'MAIN')
    >>> Post.query.whoosh_search('post').all()
    Traceback (most recent call last):
      File "", line 1, in
      File "/volume1/web/aslib/flask/lib/python3.4/site-packages/flask_whooshalchemy.py", line 103, in whoosh_search
        if not isinstance(query, unicode):
    NameError: name 'unicode' is not defined

    Any ideas?

  • #86 Miguel Grinberg said 2015-07-01T18:54:08Z

    @artelse: the last time I looked at this Whoosh and the Whoosh extension for Flask were not Python 3 ready.

  • #87 Norbert stüken said 2015-07-13T13:02:23Z

    @Andy Strain: Thanks for your Python3 solution. It works perfect!

  • #88 carlos said 2015-09-28T02:35:06Z

    Hi Miguel, thanks for all. I have learned a lot from you. what do you think about sqlalchemy-searchable?

  • #89 Miguel Grinberg said 2015-09-29T23:26:25Z

    @carlos: I think it is a pity that sqlalchemy-searchable only supports postgresql.

  • #90 amath said 2015-10-02T18:36:18Z

    Thank you for the excellent tutorials. It may help others to mention that whoosh_search ignores a set of "common" words (called stop words), and all single letter words. I spent quite a bit of time trying to figure out why the blog did not return any results for words like "I", "you", and "not".

  • #91 Neil Lakin said 2015-11-13T22:55:28Z

    Hey Miguel,

    Again, great tutorial. I'm having a strange issue getting whoosh search to work; I was hoping you could shed some light on it.

    When I run through the tests in the python interpreter, I keep getting an empty list back as the result of Post.query.whoosh_search('post').all() (it doesn't matter what I search for).

    I get a warning on the first search of an interpreter session: SAWarning: Textual SQL expression 'null' should be explicitly declared as text('null') (this warning may be suppressed after 10 occurrences) {"expr": util.ellipses_string(element)}) []

    Afterwards, it's just the empty list. No errors, and my dependencies seem to be in order; just no results. I've run a dif against your models.py, and I seem to have everything. search.db is in the correct location. The only difference between my Post model and yours is that my 'body' field is longer--db.String(250). Any chance this could cause the issue?

    I have tried deleting all posts in my db and starting from scratch a few times; the issue is consistent whether I make new posts from the site as a logged in user or from the python interpreter.

  • #92 Miguel Grinberg said 2015-11-14T06:48:19Z

    @Neil: Did you try using the versions of packages from my requirements.txt file? Maybe there is an incompatibility with a newer version of a package.

  • #93 Neil said 2015-11-16T16:24:07Z

    Hey Miguel,

    Where can I find your requirements.txt file? It doesn't seem to be in the zip archive, and I didn't find it by searching for 'requirements' on the posts thus far.

    Best, Neil

  • #94 Neil said 2015-11-17T21:09:57Z

    For what it's worth, I've narrowed down the issue a bit; it appears to be the bug (feature?) described here:

    https://github.com/gyllstromk/Flask-WhooshAlchemy/issues/20

    If I run the following query:

    q = Post.query.whoosh_search("hello")

    And print it:

    print q

    I get: SELECT post.id AS post_id, post.body AS post_body, post.timestamp AS post_timestamp, post.user_id AS post_user_id FROM post WHERE null

    Obviously, the "WHERE null" is the issue. I found your requirements.txt on your github; we have the same versions of flask/whoosh_alchemy/SQLAlchemy, and I'd rather not muck around downgrading dependencies. I'll update here if I figure out a solution in case anyone else has the same issue.

  • #95 Miguel Grinberg said 2015-11-19T05:07:52Z

    @neil: you can find it in the master branch of the github repository: https://github.com/miguelgrinberg/microblog.

  • #96 AP said 2015-12-31T15:52:01Z

    Miguel,

    First of all thank you so much for this tutorial. Having done many tutorials across many different languages this is by far one of the best!

    I was able to get all the code up and run the app with no errors, but my whoosh searches don't return any results. In particular, when I run Post.query.whoosh_search('post').all() or any search for that matter I get [] every time. Any ideas?

  • #97 Miguel Grinberg said 2016-01-03T22:53:49Z

    @AP: did you try adding new posts, after the whoosh support was added? Any posts that you had from before whoosh are not going to be indexed, so they will not be returned.

  • #98 Ke said 2016-01-10T20:14:20Z

    @miguel, I implemented the nav bar search form in my project. I also have an 'edit_profile' page that shows a form with the associated 'submit' button. The navbar search form 'action' is sent to its search view function and the 'edit_profile' form has its own view function. However, whenever I use the 'edit_profile' form, the submit is hijacked by the navbar search form view function, and anything that's supposed to be handled by the 'edit_profile' view function is bypassed. Can you please help? Thanks.

  • #99 Aby said 2016-01-13T11:23:22Z

    OSError: [Errno 13] Permission denied: 'whoosh_index'

    when i try in my local system it works(mac)but not in server (ubuntu)

  • #100 Miguel Grinberg said 2016-01-14T15:35:19Z

    @Ke: this sounds like a good question for stack overflow. I can't really answer because there isn't enough information in your question. You should show how your forms are created and rendered to the page.
