Simple Autocomplete

Difficulty: Beginner
Estimated Time: 10 minutes

Manticoresearch - Manticore Simple Autocomplete

In this tutorial we learn how to query Manticore Search to implement autocomplete functionality.

The dataset used in this course is https://data.world/data-society/imdb-5000-movie-dataset (the same as in the CSV import course - https://play.manticoresearch.com/csv/ ).

Introduction

Autocomplete, or word completion, is a feature in which an application predicts the rest of a word a user is typing. On websites it's used in search boxes: the user starts typing a word and a dropdown with suggestions pops up, so the user can select the ending from the list. The suggestions can come from many different sources. In general, the word or phrase displayed should exist in the data collection, so the user doesn't select something that returns empty results. The complexity and quality of autocomplete vary depending on how much time you invest in improving it.

In some cases autocomplete is based on previous (successful) searches. The system then has to store previous user searches somewhere so they can be used for autocomplete. While this is simple, it requires you to arrange saving and storing previous search queries, and it has the disadvantage of not being able to show suggestions for things that haven't been searched yet.

A very simple autocomplete can be made by finding suggestions in the headlines of items in the dataset. That can be the title of an article or news item, the name of a product or, in the case of this course, the name of a movie. To make this work, the field needs to be available as a string attribute, so that we don't have to look it up in the original data, and the query has to do two things:

  • Since the user is supposed to provide an incomplete word, we need to perform a wildcard search. Wildcard searches are made possible by activating prefixing or infixing in the index. As this may affect latency (https://docs.manticoresearch.com/latest/html/conf_options_reference/index_configuration_options.html#min-infix-len), you need to decide whether to enable it in the index used for regular searches or only in a special index dedicated to autocomplete. Another reason for a dedicated index is that it can be kept as compact as possible to minimize latency, which is especially important for autocomplete UX. Usually we would add a wildcard asterisk to the right, as we assume the user is starting a word; for broader results, asterisks can be added to both sides to also match words in which the typed text is preceded by a prefix. In this course, for the movies dataset, let's choose infixing, as it also enables the SUGGEST feature for word correction. Our index declaration will be:

    index movies {
      type            = plain
      path            = /var/lib/manticore/data/movies
      source          = movies
      min_infix_len   = 3
    }
    
  • As we are going to provide autocomplete from the movie title, our queries will be limited to the 'movie_title' field.
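
The two points above can be sketched on the application side as a pair of small helpers. This is only an illustration in Python: the function names `wildcard_term` and `field_match` are hypothetical and not part of Manticore, and stripping every non-word character is a simplifying assumption to keep query operators typed by the user out of the expression.

```python
import re


def wildcard_term(partial_word: str, both_sides: bool = False) -> str:
    """Turn a partially typed word into a wildcard term (hypothetical helper)."""
    # Assumption: keep only letters, digits and underscore, so full-text
    # operators typed by the user never reach the MATCH() expression.
    word = re.sub(r"\W+", "", partial_word)
    if not word:
        return ""
    # Trailing asterisk = prefix search; asterisks on both sides = infix search.
    return f"*{word}*" if both_sides else f"{word}*"


def field_match(field: str, term: str) -> str:
    """Limit a term to a single full-text field using the @field operator."""
    return f"@{field} {term}"


print(field_match("movie_title", wildcard_term("sha")))        # @movie_title sha*
print(field_match("movie_title", wildcard_term("sha", True)))  # @movie_title *sha*
```

The resulting string is what goes inside `MATCH('...')` in the queries of this course.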

Autocomplete on movie title

On your application's frontend you can start asking for suggestions from the first character the user types in the search box. However, that puts more pressure on the system, since it sends more requests to the server, and wildcard searches on 1-2 characters can be slower. Let's assume the user types 'sha'.

mysql -P9306 -h0

Your query will look like this:

SELECT id, movie_title FROM movies WHERE MATCH('@movie_title sha*');

We mostly care only about the movie title, so we're not returning all the columns. As we can see, a lot of results are returned. We can try to tweak the query, for example by adding secondary sorting by Facebook likes, but it is still too early to make a good guess about what the user is looking for.

SELECT id, movie_title FROM movies WHERE MATCH('@movie_title sha*') ORDER BY WEIGHT() DESC, cast_total_facebook_likes DESC;

Let's assume the user types another letter:

SELECT id, movie_title FROM movies WHERE MATCH('@movie_title shaf*') ORDER BY WEIGHT() DESC, cast_total_facebook_likes DESC;

Now we have a single result.

Let's take another example, where the user types 'shad' instead and keeps going:

SELECT id, movie_title FROM movies WHERE MATCH('@movie_title shad*') ORDER BY WEIGHT() DESC, cast_total_facebook_likes DESC;

SELECT id, movie_title FROM movies WHERE MATCH('@movie_title shado*') ORDER BY WEIGHT() DESC, cast_total_facebook_likes DESC;

SELECT id, movie_title FROM movies WHERE MATCH('@movie_title shadow') ORDER BY WEIGHT() DESC, cast_total_facebook_likes DESC;

Assuming the user was looking for 'shadow', they'll start typing a new word, e.g. 'shadow c'. The asterisk is now added to the new incomplete word, 'c*':

SELECT id, movie_title FROM movies WHERE MATCH('@movie_title shadow c*') ORDER BY WEIGHT() DESC, cast_total_facebook_likes DESC;

In this case we get a single result, but in other cases the user will keep typing letters, just as for the first word, and our query will return suggestions based on complete and incomplete words.
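
A minimal sketch of how an application might build such an expression from whatever is currently in the search box is given below. The helper name `autocomplete_match` is hypothetical, and so is the rule it uses to decide that a word is complete: it assumes the last word is finished only once the user has typed whitespace after it, so a bare 'shadow' still gets an asterisk.

```python
import re


def autocomplete_match(typed: str, field: str = "movie_title") -> str:
    """Build a MATCH() expression from the text typed so far (hypothetical helper)."""
    # Assumption: only word characters are kept, so operators such as
    # quotes or '@' typed by the user are dropped.
    words = re.findall(r"\w+", typed)
    if not words:
        return ""
    # Assumption: trailing whitespace means the last word is complete;
    # otherwise the last word is still being typed and gets the wildcard.
    if not typed[-1].isspace():
        words[-1] += "*"
    return f"@{field} " + " ".join(words)


print(autocomplete_match("shado"))     # @movie_title shado*
print(autocomplete_match("shadow c"))  # @movie_title shadow c*
print(autocomplete_match("shadow "))   # @movie_title shadow
```

Only the last word ever receives the asterisk; the words before it are treated as complete keywords.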

In the previous example the only restriction for matched terms was to be a part of the specified field. We can have a more restrictive autocomplete, if we want.

For example here:

SELECT id, movie_title FROM movies WHERE MATCH('@movie_title americ* ') ORDER BY WEIGHT() DESC, cast_total_facebook_likes DESC;

we get titles starting with 'americ', like 'American Hustle', but also titles where the match appears later, like 'Captain America: Civil War'. We can add the field-start operator (^) to show only records starting with the input term:

SELECT id, movie_title FROM movies WHERE MATCH('@movie_title ^americ* ') ORDER BY WEIGHT() DESC, cast_total_facebook_likes DESC;
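
On the application side, the stricter variant only differs by the `^` placed before the first keyword. The sketch below assumes a hypothetical helper named `starts_with_match` and, as a simplification, treats the last typed word as incomplete and drops all non-word characters.

```python
import re


def starts_with_match(typed: str, field: str = "movie_title") -> str:
    """Build a MATCH() expression anchored to the start of the field (hypothetical helper)."""
    # Assumption: only word characters are kept from the user's input.
    words = re.findall(r"\w+", typed)
    if not words:
        return ""
    # The last word is treated as incomplete and gets the wildcard.
    words[-1] += "*"
    # '^' before the first keyword requires it to be at the start of the field.
    return f"@{field} ^" + " ".join(words)


print(starts_with_match("americ"))  # @movie_title ^americ*
```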

Another thing we should take into consideration is duplicates. This applies more when we want to autocomplete on a field that doesn't have unique values.

As an example, let's try to autocomplete on an actor name:

SELECT actor_1_name FROM movies WHERE MATCH('@actor_1_name john* ');

The same name can appear in the results more than once. This can be solved by simply grouping on that field, assuming we have it as a string attribute:

SELECT actor_1_name FROM movies WHERE MATCH('@actor_1_name john* ') GROUP BY actor_1_name;
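
As a rough sketch, an application could assemble the grouped statement like this. The helper names are hypothetical; the `LIMIT` clause is an addition not used in this course, and the backslash escaping of quotes is an assumption about how the string literal should be protected, so a real application should prefer whatever escaping its client library provides. The field and index names are expected to be trusted identifiers, never user input.

```python
def escape_sql_string(value: str) -> str:
    """Backslash-escape backslashes and single quotes (assumed escaping rule)."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def distinct_suggestions_sql(field: str, match_expr: str,
                             index: str = "movies", limit: int = 10) -> str:
    """Build a SELECT that returns each value of `field` once (hypothetical helper)."""
    safe_match = escape_sql_string(match_expr)
    return (
        f"SELECT {field} FROM {index} "
        f"WHERE MATCH('{safe_match}') "
        # GROUP BY collapses rows that share the same attribute value.
        f"GROUP BY {field} LIMIT {int(limit)}"
    )


print(distinct_suggestions_sql("actor_1_name", "@actor_1_name john*"))
```

With the defaults above, the printed statement is the grouped query from this step with `LIMIT 10` appended.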

Highlighting

The autocomplete query can return results with highlighting included. While highlighting can be done on the application's side, highlighting done by the Manticore Search engine is more powerful because it follows the search rules (the same tokenization settings and so on). Taking the previous example, all we need to do is use the 'SNIPPET' function:

SELECT SNIPPET(actor_1_name,' john*') FROM movies WHERE MATCH('@actor_1_name john* ') GROUP BY actor_1_name ORDER BY WEIGHT() DESC, cast_total_facebook_likes DESC;