Integrating Elasticsearch in the VoIPGRID platform

Written by Hans Adema on 14th July 2017

In my previous article, I showed you how I used the results from Elasticsearch to optimize the search results for the VoIPGRID telephony platform. A lot has happened since then: my final report has been handed in and my internship is done!

Last month I gave a talk outlining the basics of my new and improved search functionality. However, I didn’t have the time to go into detail about how my prototype actually works. In this blog post, I would like to explain how I integrated Elasticsearch into VoIPGRID.

The Libraries

The Elasticsearch implementation for VoIPGRID leans heavily on two existing libraries: Elasticsearch DSL and Django Elasticsearch DSL.

Elasticsearch DSL is an official high-level library built on top of the low-level Elasticsearch client for Python. It makes it easy to use Elasticsearch in any Python project. You can define the schema for your Elasticsearch documents as Python classes (not unlike how database models in Django are defined) and build your search queries with a fluent, object-oriented API (again, not unlike Django’s QuerySet). Elasticsearch DSL allows you to use Elasticsearch without having to write clunky JSON documents by hand.

While the way you define documents in Elasticsearch DSL resembles Django’s models, the library is in no way tied to the framework. Django Elasticsearch DSL is a library that bridges the two. It lets you do a few things:

  • Assign a model class to Elasticsearch DocType classes, so when the model is saved, the corresponding DocType is updated as well.
  • List the fields from the model you want to index, and Django Elasticsearch DSL will automatically convert them to Elasticsearch data types.
  • Use the “search_index” management command to easily rebuild your Elasticsearch indexes and fill them with data from your Django database.

Keeping search simple with SimpleSearch

Django Elasticsearch DSL handles the issue of keeping the Elasticsearch indexes up to date, but doesn’t do anything meaningful when it comes to executing search queries. Elasticsearch DSL makes composing search queries Pythonic, but it’s a bit too explicit to be conveniently used directly throughout an application.

That’s why I wrapped the construction of search queries in a class called SimpleSearch. You give SimpleSearch a model class, and it creates a Search object from Elasticsearch DSL and points it to the indexes and document types which have been defined for that model.

You can also pass additional filters from the view (such as active or deleted filters), a search text together with the fields that should be searched, and the currently logged-in user so that authorization filters are applied automatically. SimpleSearch translates all of these into the appropriate method calls on the Search object.
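To make that translation concrete, here is a minimal sketch of how such inputs could end up in an Elasticsearch bool query. It is not the actual SimpleSearch code: the function name `build_search_body`, its parameters, and the field names are assumptions for illustration, and it builds the raw request body by hand instead of going through Elasticsearch DSL.

```python
def build_search_body(text, fields, filters=None, owner_filter=None):
    """Build an Elasticsearch request body as a plain dict (illustrative only).

    text         -- the search text typed by the user (may be empty)
    fields       -- document fields the text should be matched against
    filters      -- exact-match filters coming from the view, e.g. {"deleted": False}
    owner_filter -- hypothetical (field, value) pair derived from the logged-in user
    """
    # Every view filter becomes a non-scoring "term" clause.
    filter_clauses = [
        {"term": {name: value}} for name, value in sorted((filters or {}).items())
    ]
    # The authorization filter is just one more term clause.
    if owner_filter is not None:
        field, value = owner_filter
        filter_clauses.append({"term": {field: value}})

    # The search text is matched against all requested fields at once.
    if text:
        must = [{"multi_match": {"query": text, "fields": list(fields)}}]
    else:
        must = [{"match_all": {}}]

    return {"query": {"bool": {"must": must, "filter": filter_clauses}}}


body = build_search_body(
    "acme",
    ["name", "description"],
    filters={"deleted": False},
    owner_filter=("partner_id", 42),  # assumed field name and ID
)
```

Keeping the text in `must` and everything else in `filter` means only the text match influences relevance scoring, while the filters simply include or exclude documents.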

You can then pass the SimpleSearch object as an item list to your view and use it like it’s a Django QuerySet. You can slice it and paginate it, and SimpleSearch will pass the options to the DSL Search object. When you iterate over it, SimpleSearch will execute the Elasticsearch query and build you a response.
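The QuerySet-like behaviour described above can be sketched as a small lazy wrapper. This is an assumption about the general shape, not the real SimpleSearch: the class name `LazySearch` and the `backend` callable are hypothetical, and the backend here is an in-memory stand-in for an Elasticsearch query.

```python
class LazySearch:
    """Hypothetical stand-in: remembers slicing, runs the query only on iteration."""

    def __init__(self, backend, offset=0, limit=None):
        self._backend = backend  # callable(offset, limit) -> list of hits
        self._offset = offset
        self._limit = limit

    def __getitem__(self, key):
        if not isinstance(key, slice) or key.step not in (None, 1):
            raise TypeError("only simple slices are supported in this sketch")
        start = key.start or 0
        if start < 0 or (key.stop is not None and key.stop < 0):
            raise ValueError("negative indexing is not supported")
        new_offset = self._offset + start
        # Whatever is left of an earlier slice caps the new one.
        remaining = None if self._limit is None else max(self._limit - start, 0)
        if key.stop is None:
            new_limit = remaining
        else:
            new_limit = max(key.stop - start, 0)
            if remaining is not None:
                new_limit = min(new_limit, remaining)
        # Slicing returns a new lazy object; nothing is executed yet.
        return LazySearch(self._backend, new_offset, new_limit)

    def __iter__(self):
        # The query is executed here, with the accumulated offset and limit.
        return iter(self._backend(self._offset, self._limit))


calls = []  # records every "query" the fake backend receives


def fake_backend(offset, limit):
    calls.append((offset, limit))
    data = list(range(100))
    end = None if limit is None else offset + limit
    return data[offset:end]


page = LazySearch(fake_backend)[20:30]
calls_before_iteration = list(calls)  # still empty: slicing did not query
results = list(page)                  # iterating triggers exactly one query
```

A paginator only ever needs slicing and iteration, which is why a wrapper of this shape can be handed to a list view in place of a QuerySet.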

How to interact with SimpleSearch (and how it interacts with Elasticsearch)

This brings us to one of the less conventional features of this implementation: you do not use the Elasticsearch results directly. SimpleSearch wraps the Elasticsearch response in an instance of ModelResponse. ModelResponse subsequently uses the IDs of the documents to load the corresponding models from the primary Django database and drops the rest.

This approach gives you access to all ORM goodness when using the Elasticsearch results, like easily fetching related objects (which are generally hard to model in Elasticsearch). It also makes it easier to denormalize data when indexing the records, because you’re only going to query them and not show them to users. Finally, it ensures your data is always correct, and not out of date because of a slow index refresh or a synchronization delay between the primary and replica shards.
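The ID-to-model step can be sketched as follows. The function name `load_models` and the hit layout are assumptions for illustration, and a plain dict stands in for the Django database. In a Django project, `Model.objects.in_bulk(ids)` returns the same kind of ID-to-object mapping that `fetch_by_ids` is assumed to return here.

```python
def load_models(hits, fetch_by_ids):
    """Swap Elasticsearch hits for database objects, keeping the relevance order.

    hits         -- list of hit dicts as returned by Elasticsearch ("_id" is a string)
    fetch_by_ids -- callable taking a list of IDs and returning {id: object}
    """
    ids = [int(hit["_id"]) for hit in hits]
    found = fetch_by_ids(ids)
    # Only the IDs are used; the indexed "_source" data is dropped.
    # Hits whose row no longer exists in the database are skipped.
    return [found[pk] for pk in ids if pk in found]


# In-memory stand-in for the primary database (hypothetical data).
database = {1: "Client A", 2: "Client B", 3: "Client C"}


def fetch_by_ids(ids):
    return {pk: database[pk] for pk in ids if pk in database}


hits = [
    {"_id": "3", "_source": {"name": "stale indexed name"}},
    {"_id": "9", "_source": {"name": "deleted since last index"}},
    {"_id": "1", "_source": {"name": "Client A"}},
]
models = load_models(hits, fetch_by_ids)
```

Note that the result follows the order of the hits rather than the order in which the database returns rows, so the relevance ranking from Elasticsearch is preserved.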

Search for multiple models simultaneously

SimpleSearch works great if you’re searching for a single model class, which is good enough for basic lists. However, for the relation search I’ve built, I needed to search both the client and partner models. So I had to build an alternative class which could support that.

The answer is MultiModelSearch. Its API is identical to that of SimpleSearch, with one difference: MultiModelSearch accepts a list of model classes rather than a single model class.

However, before you feel inclined to pass all models in the application to MultiModelSearch trying to create an all-in-one super searcher, it’s important to note the behavior of search queries and filters in Elasticsearch when querying different types of documents.

How MultiModelSearch works

When searching for multiple models, MultiModelSearch builds one big search query which covers multiple indexes and document types, and the same set of filters and search queries is applied to all of them. Elasticsearch does not support specifying different filters for different indexes without running two separate queries and merging the results manually (definitely not recommended).

Search queries are tolerant of being given more fields to search than are actually present in a document. If a document lacks one of the fields you’re searching, it will still be considered a valid result as long as a hit is found in one of the other fields.

Filters, on the other hand, are not quite as flexible. For a document to pass through the filters, it needs to contain the value in the field. If the field is not present in a document, it cannot have the required value, meaning no documents of that type will be matched.

Because of that, it’s not possible to search for, say, clients and dial plans at the same time. The former is owned by a partner, whereas the latter is owned by a client. Because you have to filter dial plans on client IDs and clients on partner IDs, you can’t use the same set of filters for both models.
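The difference between the two behaviours can be imitated in plain Python. This is a simulation of the semantics described above, not Elasticsearch itself; the documents, field names, and helper functions are made up for the example.

```python
def matches_text(doc, text, fields):
    """Search-query behaviour: fields missing from the document are simply ignored."""
    return any(
        text.lower() in str(doc[field]).lower() for field in fields if field in doc
    )


def passes_filters(doc, filters):
    """Filter behaviour: the field must exist and hold exactly the required value."""
    return all(field in doc and doc[field] == value for field, value in filters.items())


# Two differently shaped documents (hypothetical data).
client = {"type": "client", "name": "Acme Telecom", "partner_id": 7}
dial_plan = {"type": "dialplan", "description": "Acme office hours", "client_id": 3}
documents = [client, dial_plan]

# Searching "name" and "description" finds both, although each has only one of them.
text_hits = [d["type"] for d in documents if matches_text(d, "acme", ["name", "description"])]

# Filtering on partner_id silently drops every dial plan: the field is not there.
filtered = [d["type"] for d in documents if passes_filters(d, {"partner_id": 7})]
```

Here `text_hits` contains both document types, while `filtered` contains only the client, which is the reason a single authorization filter cannot serve both models.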

If you try to do this with MultiModelSearch, your authorization call will be rejected and an exception will be raised. And if you try to filter on conflicting fields manually, the lack of results should be telling.
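A guard of that kind could look roughly like the sketch below. Everything in it is hypothetical: the exception name, the `OWNER_FIELDS` mapping, and in particular the assumption that clients and partners can both be filtered on a `partner_id` field. The real MultiModelSearch may decide this differently.

```python
class IncompatibleModelsError(Exception):
    """Raised when the models cannot share a single authorization filter."""


# Assumed mapping from model name to the field that identifies its owner.
OWNER_FIELDS = {
    "Client": "partner_id",
    "Partner": "partner_id",
    "DialPlan": "client_id",
}


def authorization_filter(model_names, owner_id):
    """Return one term filter valid for all models, or refuse to build one."""
    owner_fields = {OWNER_FIELDS[name] for name in model_names}
    if len(owner_fields) != 1:
        raise IncompatibleModelsError(
            "models are owned through different fields: %s" % sorted(owner_fields)
        )
    return {"term": {owner_fields.pop(): owner_id}}


# Clients and partners share an owner field, so one filter covers both.
shared = authorization_filter(["Client", "Partner"], 7)

# Clients and dial plans do not, so the call is rejected.
try:
    authorization_filter(["Client", "DialPlan"], 7)
    rejected = False
except IncompatibleModelsError:
    rejected = True
```

Failing loudly at this point is safer than building a filter that silently matches nothing for one of the models.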

However, when the various documents are similar enough to search them both simultaneously, MultiModelSearch makes doing so very easy.

So, what should I do with it?

First of all, use Elasticsearch DSL and, if you’re using Django, use Django Elasticsearch DSL as well. They take a lot of work off your hands regarding integrating Elasticsearch with your implementation in a clean and efficient way.

Also, after you create a wrapper like the one I made, using Elasticsearch should be just as easy as something like Haystack. It allows developers without Elasticsearch experience to easily integrate search functionality on their pages without having to deal with the minutiae of Elasticsearch. However, it still gives you access to the full power of Elasticsearch where required. It’s absolutely the best of both worlds.

And to conclude: what’s the result of this for our VoIPGRID partners? In my next blog post, I’ll show you how the search functionality I built can help our partners be more efficient in their work.

Devhouse Spindle