Did you mean?

Difficulty: Beginner
Estimated Time: 10 minutes

Manticoresearch - Did You Mean?

In this course you will learn how Manticore Search can suggest corrections for mistyped words.

Introduction

Besides the autocomplete feature, for which we covered a simple example in this course (https://play.manticoresearch.com/simpleautocomplete/), another common feature people add to search applications is the ability to show corrections for mistyped words.

Manticore Search comes with a feature that returns suggestions for a word from the index dictionary. It is enabled by turning on the infixing option. Infixing not only allows wildcard searches, it also creates n-gram hashes from the indexed words. N-grams (parts of words that are N characters long) are used to find words that are close to each other as plain text, not linguistically. Combined with the Levenshtein distance between each candidate word and the original word, this lets us provide suggestions that are suitable as corrections for the mistyped word. This functionality is provided by the CALL SUGGEST and CALL QSUGGEST statements (read more in the docs: https://docs.manticoresearch.com/latest/html/sphinxql_reference/call_qsuggest_syntax.html).
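
To make the idea more concrete, here is a minimal Python sketch of the two ingredients described above: n-grams to pick candidate words and Levenshtein distance to rank them. It is not Manticore's actual implementation. The function names (ngrams, levenshtein, suggest) and the tiny dictionary are hypothetical and exist only for illustration.

```python
def ngrams(word, n=3):
    """Return the set of all substrings of length n found in word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest(word, dictionary, n=3):
    """Keep words sharing at least one n-gram with the input, ranked by distance."""
    grams = ngrams(word, n)
    candidates = [w for w in dictionary if grams & ngrams(w, n)]
    return sorted((levenshtein(word, w), w) for w in candidates)


# Hypothetical dictionary, for illustration only.
words = ["revenge", "avenger", "range", "return"]
print(suggest("rvenge", words))
# prints [(1, 'revenge'), (2, 'avenger'), (2, 'range')]
```

In this sketch 'return' is never scored because it shares no trigram with 'rvenge'. The n-gram step is a cheap filter that keeps the more expensive distance calculation limited to plausible candidates.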

First we should enable infixing in our index.

index movies
{
    type            = plain
    path            = /var/lib/manticore/data/movies
    source          = movies
    min_infix_len   = 3
}

CALL SUGGEST usage

When a query returns no results, it's possible that the user mistyped something.

Let's connect to Manticore:

mysql -P9306 -h0

And take a quick example of a word suggestion (mind the typo in 'revenge'):

CALL SUGGEST('rvenge','movies');

The output contains 3 columns: the suggestion, the calculated Levenshtein distance, and the doc hits of the suggestion in the index.

The first suggestion has a distance of 1 from our input, and it is the word we expected. This is the best-case scenario: a single suggestion at the minimal distance is most likely the one we are looking for. However, even at distance 1 there can be more than one suggestion:

CALL SUGGEST('aprentice','movies');

When suggestions share the same distance, they are further sorted by their doc hits. In this example 'apprentice' is most likely what the user wanted, as it has more hits than 'prentice'.
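
The ordering described above (distance first, then doc hits) can be sketched in Python as follows. The rows only mimic the shape of the CALL SUGGEST output; the column names used as dictionary keys and all the numbers are made up for illustration and are not real results from the movies index.

```python
# Hypothetical rows shaped like CALL SUGGEST output; the counts are invented.
rows = [
    {"suggest": "prentice", "distance": 1, "docs": 2},
    {"suggest": "apprentice", "distance": 1, "docs": 5},
    {"suggest": "practice", "distance": 3, "docs": 9},
]

# Smaller distance wins; among equal distances, more doc hits wins.
ranked = sorted(rows, key=lambda r: (r["distance"], -r["docs"]))
print([r["suggest"] for r in ranked])
# prints ['apprentice', 'prentice', 'practice']
```

Note that 'practice' stays last despite having the most hits, because distance is compared before doc hits.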

Of course, when the input word is actually found in our index, it will appear as the first suggestion with distance=0:

CALL SUGGEST('revenge','movies');

If we want to get more suggestions, we can add the limit parameter:

CALL SUGGEST('aprentice','movies', 10 as limit);

If we want to restrict the suggestions, we can lower the maximum Levenshtein distance (max_edits, default is 4) and the maximum difference in word length (delta_len, default is 3):

CALL SUGGEST('aprentice','movies', 10 as limit,3 as max_edits,2 as delta_len);
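
As a rough illustration of what these two limits do, the Python sketch below filters a candidate list by length difference and by edit distance. It is an assumption-laden model, not Manticore's code: the helper names and the candidate words are hypothetical, and whether the real comparisons are inclusive is also assumed here.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def filter_candidates(word, candidates, max_edits=4, delta_len=3):
    """Drop candidates that differ too much in length or in edit distance.

    Inclusive comparisons are an assumption made for this sketch.
    """
    kept = []
    for cand in candidates:
        if abs(len(cand) - len(word)) > delta_len:
            continue  # length differs by more than delta_len
        if levenshtein(word, cand) > max_edits:
            continue  # needs more than max_edits edits
        kept.append(cand)
    return kept


# Hypothetical candidate words, for illustration only.
candidates = ["apprentice", "prentice", "price", "entice"]
print(filter_candidates("aprentice", candidates, max_edits=3, delta_len=2))
# prints ['apprentice', 'prentice']
```

With the default limits used in this sketch (4 and 3), 'entice' would also pass, which shows how tightening the limits narrows the list.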

For the next step we need to exit the mysql client:

exit;

A working example

A simple working example of 'Did you mean' can be tested in the Web Panel.

The PHP script provides a simple search results page.

If the input string returns no results, the script tests each word with 'CALL SUGGEST' and tries to build a new query string.

If the new query string has matches, its result set is shown.

The script can be viewed with cat /html/index.php
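
The flow described above can be sketched in a few lines. The following Python version is not the course's PHP script, only a hedged outline of the same idea. The search and suggest callables are hypothetical stand-ins for a full-text query and a CALL SUGGEST lookup, and the fake data is invented so the sketch runs without a server.

```python
def did_you_mean(query, search, suggest):
    """Return (query_used, results), retrying with corrected words on no hits."""
    results = search(query)
    if results:
        return query, results

    corrected_words = []
    for word in query.split():
        suggestions = suggest(word)  # assumed to return the best suggestion first
        corrected_words.append(suggestions[0] if suggestions else word)

    corrected = " ".join(corrected_words)
    if corrected != query:
        results = search(corrected)
        if results:
            return corrected, results
    return query, []


# Stand-in data, made up for this sketch.
FAKE_INDEX = {"revenge of the apprentice": ["doc 1"]}
FAKE_SUGGESTIONS = {"rvenge": ["revenge"], "aprentice": ["apprentice", "prentice"]}


def fake_search(query):
    return FAKE_INDEX.get(query, [])


def fake_suggest(word):
    return FAKE_SUGGESTIONS.get(word, [])


print(did_you_mean("rvenge of the aprentice", fake_search, fake_suggest))
# prints ('revenge of the apprentice', ['doc 1'])
```

Words with no suggestion are kept as typed, so only the mistyped parts of the query change.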