Did you mean?

Difficulty: Beginner
Estimated Time: 10 minutes

Manticore Search - Did You Mean?

In this course you will learn how Manticore Search can suggest corrections for mistyped words.

Introduction

Besides autocomplete, for which we covered a simple example in this course (https://play.manticoresearch.com/simpleautocomplete/), another common feature people add to search applications is the ability to show corrections for mistyped words.

Manticore Search comes with a feature that returns suggestions for a word from the index dictionary. It is available once the infixing option is enabled. Infixing not only allows wildcard searches, it also creates n-gram hashes from the indexed words. N-grams (parts of words that are N characters long) are used to find words that are close to each other as plain text, not linguistically. Combined with the Levenshtein distance between each candidate word and the original word, this lets us provide suggestions that are suitable corrections for the misspelled word. This functionality is provided by the CALL SUGGEST and CALL QSUGGEST statements (read more in the docs: https://docs.manticoresearch.com/latest/html/sphinxql_reference/call_qsuggest_syntax.html).
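
To make the two ingredients above concrete, here is a minimal Python sketch of character trigrams and Levenshtein distance. It only illustrates the idea: the function names are hypothetical, and this is not how Manticore implements suggestions internally.

```python
def ngrams(word, n=3):
    """Return the set of character n-grams of a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


typo, candidate = "rvenge", "revenge"
shared = ngrams(typo) & ngrams(candidate)
print(sorted(shared))                # ['eng', 'nge', 'ven']
print(levenshtein(typo, candidate))  # 1
```

The shared trigrams are what make 'revenge' a candidate in the first place; the edit distance then tells us how close the candidate is to the typed word.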

First, we need to enable infixing in our index:

index movies
{
    type            = plain
    path            = /var/lib/manticore/data/movies
    source          = movies
    # index infixes of 3 or more characters; this enables suggestions
    min_infix_len   = 3
}

CALL SUGGEST usage

When a user's query returns no results, it's possible that the user mistyped something.

Let's connect to Manticore:

mysql -P9306 -h0

And take a quick example of a word suggestion (mind the typo in 'revenge'):

CALL SUGGEST('rvenge','movies');

The output contains 3 columns: the suggestion, the calculated Levenshtein distance, and the doc hits of the suggestion in the index.

The first suggestion has a distance of 1 from our input, and it's the word we expected. This is the best scenario: a single suggestion at the minimal distance is most likely the one we are looking for. However, even at distance 1 it is possible to get more than one suggestion:

CALL SUGGEST('aprentice','movies');

When suggestions share the same distance, they are further sorted by their doc hits. In this example, 'apprentice' is most likely what the user wanted, as it has more hits than 'prentice'.
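
The ordering described above can be sketched in Python as a sort by ascending distance, then by descending doc hits. The rows below are made-up values for illustration only, not actual output from the movies index, and the rank function is a hypothetical helper.

```python
# Hypothetical (suggestion, distance, docs) rows; the numbers are invented.
rows = [
    ("prentice", 1, 2),
    ("apprentices", 2, 1),
    ("apprentice", 1, 5),
]


def rank(rows):
    """Order by smallest distance first, then by most doc hits."""
    return sorted(rows, key=lambda row: (row[1], -row[2]))


best = rank(rows)[0][0]
print(best)  # apprentice
```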

Of course, when the input word is actually found in our index, it appears as the first suggestion with distance=0:

CALL SUGGEST('revenge','movies');

If we want more suggestions, we can add the limit parameter:

CALL SUGGEST('aprentice','movies', 10 as limit);

If we want to restrict the suggestions, we can lower the maximum Levenshtein distance (max_edits, default is 4) and the maximum difference in length between the input word and a suggestion (delta_len, default is 3):

CALL SUGGEST('aprentice','movies', 10 as limit,3 as max_edits,2 as delta_len);
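
If the statement is assembled in application code, a small helper can keep the options tidy. The following Python sketch is one possible way to do it; build_suggest_call is a hypothetical helper, it only knows the three options used in this course, and its quoting is deliberately simplistic (in real code, rely on your client library's escaping).

```python
ALLOWED_OPTIONS = ("limit", "max_edits", "delta_len")  # options shown in this course


def build_suggest_call(word, index, **options):
    """Build a CALL SUGGEST statement with optional 'value AS name' options."""
    def quote(text):
        # Simplistic escaping for illustration only.
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

    parts = [quote(word), quote(index)]
    for name, value in options.items():
        if name not in ALLOWED_OPTIONS:
            raise ValueError(f"unsupported option: {name}")
        parts.append(f"{int(value)} AS {name}")
    return "CALL SUGGEST(" + ", ".join(parts) + ");"


print(build_suggest_call("aprentice", "movies", limit=10, max_edits=3, delta_len=2))
# CALL SUGGEST('aprentice', 'movies', 10 AS limit, 3 AS max_edits, 2 AS delta_len);
```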

For the next step, we need to exit the mysql client:

exit;

A working example

Let's see what a real example with code would look like. The following PHP script accepts a query string as input and displays movie titles if there are any search results. If the query doesn't match anything, we look at the SHOW META information and check whether any keywords have zero hits, which could mean they were misspelled. If some keywords get corrected, we run the query again with the corrections to see whether it returns results. If it does, we display them.
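
The flow described above can also be sketched independently of PHP. The Python outline below is not the course's script: the three callables are hypothetical stand-ins for the real database calls (the search query, the zero-hit keywords read from SHOW META, and CALL SUGGEST), and the in-memory data at the bottom is invented so that the sketch runs without a server.

```python
def search_with_suggestions(query, run_search, zero_hit_keywords, suggest):
    """Search, and on an empty result retry with corrected keywords.

    run_search(query)        -> list of matching titles
    zero_hit_keywords(query) -> keywords reported with no hits
    suggest(word)            -> best suggestion for a word, or None
    """
    results = run_search(query)
    if results:
        return query, results

    words = query.split()
    missing = set(zero_hit_keywords(query))
    changed = False
    for i, word in enumerate(words):
        if word in missing:
            replacement = suggest(word)
            if replacement and replacement != word:
                words[i] = replacement
                changed = True

    if not changed:
        return query, []  # nothing to correct, so no point in retrying
    corrected = " ".join(words)
    return corrected, run_search(corrected)


# Invented in-memory stand-ins for the database calls.
titles = {"star trek": ["Star Trek"]}
fixes = {"trec": "trek"}
print(search_with_suggestions(
    "star trec",
    run_search=lambda q: titles.get(q, []),
    zero_hit_keywords=lambda q: [w for w in q.split() if w in fixes],
    suggest=fixes.get,
))
# ('star trek', ['Star Trek'])
```

Passing the database calls in as functions keeps the retry logic easy to test; in a real application they would wrap queries sent over the MySQL protocol.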

cat search_suggest.php

First, run a query that returns results:

php search_suggest.php -q 'star trek'

And now let's break one of the words:

php search_suggest.php -q 'star trec'

Going further, let's break two words:

php search_suggest.php -q 'the finl frontir'