django full text search

How to Use Django & PostgreSQL for Full Text Search

To make use of the full text search in Django, include ‘django.contrib.postgres’ in your INSTALLED_APPS settings module. The search lookup is a drop-in replacement and opens up a world of new functionality.

Here’s what Django ORM will create for this set of queries:

SELECT school, city

FROM schools

WHERE to_tsvector (COALESCE (school, ”)) @@ (plainto_tsquery (‘alamo’)) = true;

To perform a search, the ORM wraps the target column when the to_tsvector function is called and wraps the search query when calling plainto_tsquery. These functions translate the search query and the target into the corresponding full-text search data types and then apply the @@ matching operator to them. If the search query includes several space-separated words, the function plainto_tsquery will insert the & operator between them, so that the search matches only if all the words are present in the target column.

Now let’s look at the query plan:

Seq Scan on schools (actual = 29.272..152.950 rows = 6 loops = 1)

 Filter: (to_tsvector ((COALESCE (school, ” :: character varying)) :: text) @@ plainto_tsquery (‘alamo’ :: text))

 Rows Removed by Filter: 17718

 Planning time: 4.234 ms

 Execution time: 152.977 ms

We went back to a sequential scan, and our running time increased significantly. We can optimize this in two ways. First, we need to find out how to use the full-text search engine of the corresponding index. Secondly, since the to_tsvector function is applied on every line with each execution of the search query, we can eliminate this by first calculating the vectorized form of the target column.

Both of these tasks can be performed in Postgres quite easily. The section “Creating indexes” in Postgres documents describes what we need to do for indexed full text search operations. We will create an index of type “GIN”.

CREATE INDEX schools_gin_idx ON schools USING GIN (to_tsvector (‘english’, coalesce (school, ”))));

The documentation explains that when creating the index, we must specify the configuration name. To use the index, our requests must also specify this configuration name. Currently, the Django search API doesn’t allow you to include the configuration name. Therefore, even after creating this index, our search queries, as shown above, will continue to use sequential scanning. We can work around this limitation by using the optional () QuerySet modifier.

results = Schools.objects.extra (where = [“to_tsvector (‘english’, coalesce (school, ”)) @@ (plainto_tsquery (% s)) = true”], params = [query]). values ​​(‘ school ‘,’ city ‘)

The following SQL statement will now be created:

SELECT school, city

FROM schools

WHERE to_tsvector (‘english’, COALESCE (school, ”)) @@ (plainto_tsquery (‘alamo’)) = true;

Django provides a SearchVector class that allows us to avoid the need to use additional (). SearchVector allows us to pass the search configuration name, so we can still use the index.

results = Schools.objects.annotate (search = SearchVector (‘school’, config = ‘english’)). filter (search = query) .values ​​(‘school’, ‘city’)

This statement will produce the same query plan as above, using the index, but now we use the Django API much cleaner.


Igor Grigorenko

Add comment