Did you mean?

Difficulty: Beginner
Estimated Time: 10 minutes

Manticoresearch - Did You Mean?

In this course you will learn how Manticore Search can suggest corrections for mistyped words.


Introduction

Besides autocomplete, which we covered with a simple example in a separate course (https://play.manticoresearch.com/simpleautocomplete/), another common feature people add to search applications is the ability to show corrections for mistyped words.

Manticore Search can suggest words from the index dictionary. This requires enabling infixing. Infixing not only allows wildcard searches, it also creates n-gram hashes from the indexed words. N-grams (parts of words N characters long) are used to find words that are close to each other as plain text, not linguistically. Combined with the Levenshtein distance between a candidate word and the original word (the minimum number of single-character insertions, deletions or substitutions needed to turn one into the other), this lets us provide suggestions that are suitable as corrections for the misspelled word. This functionality is provided by the CALL SUGGEST and CALL QSUGGEST functions (read more in the documentation: https://docs.manticoresearch.com/latest/html/sphinxql_reference/call_qsuggest_syntax.html).
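To make the idea concrete, here is a toy Python sketch of how n-grams and Levenshtein distance can work together to pick correction candidates. It is not Manticore's internal implementation: the n-gram size (N=3), the max_edits cap, the toy_suggest helper and the word counts are all assumptions chosen for illustration.

```python
# Toy illustration: n-gram candidate filtering + Levenshtein ranking.
# Assumptions: trigrams, a dict of word -> document count, a max_edits cap.
# This is NOT Manticore's internal implementation.


def ngrams(word, n=3):
    """Return the set of character n-grams (substrings of length n) of word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or keep)
        prev = cur
    return prev[-1]


def toy_suggest(word, dictionary, max_edits=4):
    """Rank dictionary words that share at least one n-gram with word.

    Returns (candidate, distance, docs) tuples sorted by distance,
    then by document count (descending).
    """
    grams = ngrams(word)
    results = []
    for candidate, docs in dictionary.items():
        if not grams & ngrams(candidate):
            continue  # no shared n-gram: not considered a candidate
        dist = levenshtein(word, candidate)
        if dist <= max_edits:
            results.append((candidate, dist, docs))
    results.sort(key=lambda r: (r[1], -r[2]))
    return results


# Hypothetical document counts, not taken from the movies index.
toy_dictionary = {"american": 128, "america": 40, "beauty": 30}
print(toy_suggest("amercan", toy_dictionary))
# [('american', 1, 128), ('america', 2, 40)]
```

Shared n-grams act as a cheap filter, so the more expensive distance computation only runs on plausible candidates ('beauty' shares no trigram with 'amercan' and is skipped).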

First we should enable infixing in our index.

index movies
{
    type            = plain
    path            = /var/lib/manticore/data/movies
    source          = movies
    min_infix_len   = 3
}

Did you mean?

When a query returns no results, the user may have mistyped something.

Let's connect to Manticore and take an example (note the typo in 'american'):

mysql -P9306 -h0

SELECT * FROM movies WHERE match('@movie_title amercan beauty');

No results. Let's look at what SHOW META reports:

SHOW META;

We can see the word 'amercan' has no hits. This is an indication that the user may have mistyped the word.
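The SHOW META check can be automated in the application. A minimal Python sketch, assuming the rows of SHOW META are available as (Variable_name, Value) string pairs and that per-keyword statistics use the keyword[N] / docs[N] naming; zero_doc_keywords is a hypothetical helper and the sample numbers are made up:

```python
def zero_doc_keywords(meta_rows):
    """Return the query keywords that matched no documents.

    meta_rows: (Variable_name, Value) string pairs, shaped like SHOW META output.
    """
    meta = dict(meta_rows)
    missing = []
    i = 0
    # Walk keyword[0], keyword[1], ... until there are no more keywords.
    while f"keyword[{i}]" in meta:
        if int(meta.get(f"docs[{i}]", "0")) == 0:
            missing.append(meta[f"keyword[{i}]"])
        i += 1
    return missing


# Illustrative rows shaped like SHOW META output; the numbers are made up.
rows = [
    ("total", "0"),
    ("total_found", "0"),
    ("keyword[0]", "amercan"),
    ("docs[0]", "0"),
    ("hits[0]", "0"),
    ("keyword[1]", "beauty"),
    ("docs[1]", "30"),
    ("hits[1]", "31"),
]
print(zero_doc_keywords(rows))  # ['amercan']
```

Each keyword returned this way is a candidate for a CALL SUGGEST check.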

We can check it with CALL SUGGEST:

CALL SUGGEST('amercan','movies');

This gives us several suggestions. The first one is 'american', which has a distance of 1 and occurs in 128 documents. We can assume the user meant to type 'american beauty' instead of 'amercan beauty'. Next, we can show the user this suggestion; most search engines call this the 'Did you mean?' feature.
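Once we have suggestions, building the 'Did you mean?' text comes down to replacing each mistyped word with its top suggestion. A minimal Python sketch; did_you_mean is a hypothetical helper, and the suggestion rows are assumed to be (suggest, distance, docs) tuples taken from CALL SUGGEST in the order it returned them:

```python
def did_you_mean(query, suggestions):
    """Replace each mistyped word with the first row CALL SUGGEST returned for it.

    suggestions maps a lowercased word to a list of (suggest, distance, docs)
    rows; words without suggestions (including '@field' operators) are kept.
    """
    corrected = []
    for token in query.split():
        rows = suggestions.get(token.lower())
        corrected.append(rows[0][0] if rows else token)
    return " ".join(corrected)


# Only the first row from the example above; a real application would fill
# this dict by calling CALL SUGGEST for each zero-hit keyword.
suggestions = {"amercan": [("american", 1, 128)]}
print(did_you_mean("@movie_title amercan beauty", suggestions))
# @movie_title american beauty
```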

However, there is no guarantee that the corrected search returns any results. We know each word exists in our collection and has matching documents, but there may be no single document in which both occur. We can either show the suggestion and let the user try it, or run the corrected query first to check that it returns results:

SELECT id FROM movies WHERE MATCH('american beauty');

It does: one document is found.

In the autocomplete course, we showed an example where the user enters 'americ*' and our query suggests movie titles based on it. What happens if the user ignores the autocomplete suggestions and enters a misspelled word like 'americn'? The query that is supposed to display the search results returns no matches:

SELECT id,movie_title FROM movies WHERE MATCH('@movie_title americn');

If the query doesn't return any results, we should check whether the user made a typo. For a single word it's easy: we issue a CALL SUGGEST:

CALL SUGGEST('americn','movies');

The function returns a list of candidate words sorted by their distance from the input and then by the number of documents in which they appear. In most cases, the closest word with the most occurrences can be considered the correct choice.
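If the application wants to apply this rule itself rather than rely on the order of the returned rows, a Python sketch could look like this. best_suggestion is hypothetical, and the rows in the example are invented for illustration, not output from the movies index:

```python
def best_suggestion(rows):
    """Pick the closest word, breaking ties by the highest document count.

    rows: (suggest, distance, docs) tuples, e.g. the result set of CALL SUGGEST.
    Returns None when there are no rows.
    """
    if not rows:
        return None
    # Smallest distance first; among equal distances, the largest docs wins.
    return min(rows, key=lambda r: (r[1], -r[2]))[0]


# Hypothetical rows for 'americn'; the counts are invented.
rows = [("americana", 2, 5), ("america", 1, 40), ("american", 1, 128)]
print(best_suggestion(rows))  # american
```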


Did you mean?

In the previous step we saw how to make a correction. However, replacing the misspelled word with the top suggestion can still produce a query that returns no results. In that case, we can try the next suggestions to see whether any of them produces a search with results.

For example:

SELECT movie_title FROM movies WHERE MATCH('Raign of Fire');

CALL SUGGEST('raign','movies');

The function returns 3 words with distance=1, sorted by occurrences. From the engine's point of view, 'rain' is the best choice, as it's the most common one. But if we use 'rain' in the original query, we get no results:

SELECT movie_title FROM movies WHERE MATCH('rain of fire');

The actual word we are looking for is the third one on the list - 'reign'.
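The retry strategy can be written as a small loop. In this Python sketch, first_working_correction is a hypothetical helper and count_matches is an assumed callback: in a real application it would run the rewritten query against Manticore and return total_found from SHOW META. Here a stub stands in for the index.

```python
def first_working_correction(words, bad_word, candidates, count_matches):
    """Try each candidate for bad_word in order and return the first rewritten
    query that count_matches() reports as having at least one match."""
    for candidate in candidates:
        # Swap the misspelled word (compared case-insensitively) for the candidate.
        query = " ".join(candidate if w.lower() == bad_word else w for w in words)
        if count_matches(query) > 0:
            return query
    return None  # no candidate produced a query with results


# Stub standing in for the index: pretend only this title exists.
indexed_titles = {"reign of fire"}


def fake_count_matches(query):
    return 1 if query.lower() in indexed_titles else 0


# Only the two suggestions named in the text; the middle one is left out.
print(first_working_correction("Raign of Fire".split(), "raign",
                               ["rain", "reign"], fake_count_matches))
# reign of Fire
```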

Obviously, these checks can be time-consuming. We can make them cheaper by stripping sorting and grouping clauses from the check queries, setting the limit to 0, using SHOW META to see whether anything was found, and using a less complex ranker.
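A sketch of such a lightweight check query, built in Python. The helper name is hypothetical; LIMIT 0 follows the advice above, and OPTION ranker=none is assumed here as the "less complex ranker". Escaping backslashes and single quotes keeps user input from breaking out of the string literal, but full-text operators in the input are still interpreted by MATCH().

```python
def existence_check_sql(text, index="movies"):
    """Build a cheap 'does anything match?' query; the application then runs
    SHOW META and reads total_found."""
    # Escape backslashes first, then single quotes, for the SQL string literal.
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    # No ORDER BY / GROUP BY, no rows returned, cheapest ranking.
    return (f"SELECT id FROM {index} WHERE MATCH('{escaped}') "
            "LIMIT 0 OPTION ranker=none")


print(existence_check_sql("reign of fire"))
# SELECT id FROM movies WHERE MATCH('reign of fire') LIMIT 0 OPTION ranker=none
```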

Correcting autocomplete

CALL SUGGEST can also be used for autocomplete. The user may see the autocomplete suggestions, but unless we replace their input with the first autocomplete suggestion, they can mistype a word and move on to the next one. Obviously, the autocomplete query will not return suggestions in this case:

SELECT movie_title FROM movies WHERE MATCH('@movie_title americn beaut*');

We can apply the same strategy as for 'Did you mean?' by checking whether the completed words have matches in our data. The last word, which is wildcarded, is not suitable for a suggestion check: it's incomplete, so SUGGEST may return words similar to the typed fragment rather than the word the user is trying to type.
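Deciding which words are complete can be done on the client before calling SUGGEST. A minimal Python sketch, assuming the last word is still being typed unless the input ends with whitespace; split_typed_text is a hypothetical helper:

```python
def split_typed_text(text):
    """Split raw user input into complete words and the word still being typed.

    Only the complete words are sent to CALL SUGGEST; the trailing fragment
    is later used as a prefix wildcard (e.g. 'beaut*').
    """
    words = text.split()
    if not words or text[-1].isspace():
        return words, ""          # every word is complete
    return words[:-1], words[-1]  # last word is still being typed


print(split_typed_text("americn beaut"))  # (['americn'], 'beaut')
```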

CALL SUGGEST('americn', 'movies');

CALL SUGGEST provides a correction: 'american'. After checking all complete words, we rebuild the text query and run the search again:

SELECT movie_title FROM movies WHERE MATCH('@movie_title american beaut*');
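Rebuilding the autocomplete query can be sketched in Python as follows. rebuild_autocomplete_query is hypothetical; corrections is assumed to map each typed word to the first CALL SUGGEST result for it, and the field name defaults to @movie_title as in the queries above:

```python
def rebuild_autocomplete_query(complete_words, prefix, corrections,
                               field="@movie_title"):
    """Rebuild the MATCH() text: complete words are replaced by their
    correction (if any); the incomplete last word gets a trailing '*'."""
    parts = [field] if field else []
    parts += [corrections.get(w.lower(), w) for w in complete_words]
    if prefix:
        parts.append(prefix + "*")
    return " ".join(parts)


print(rebuild_autocomplete_query(["americn"], "beaut", {"americn": "american"}))
# @movie_title american beaut*
```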

Since this query returns a result, it can be shown to the user.