Terminal Terminal | Web Web
Home  //  Play

infix, prefix, wildcard search

Difficulty: Beginner
Estimated Time: 10 minutes

Infix and prefix search

In this course, we learn about infix and prefix functionality for full-text search.

infix, prefix, wildcard search

Step 1 of 4

Search with prefix functionality

Let's log into Manticore using mysql client:

mysql -h0 -P9306

And create two tables. The first one is a regular table:

create table demo (info text, value integer);

In the second table, we will add the parameter min_prefix_len with the minimum prefix length to enable prefix search. By default, this setting is disabled.

create table demo_prefix (info text, value integer) min_prefix_len = '3';

Now let's fill the tables:

insert into demo(info, value) values ('Dummy string without any prefix or infix settings', 1);

insert into demo_prefix (info, value) values ('Dummy string with prefix settings applied', 2);

Now let's perform a search using the full-text search function match(), looking not for the whole word, but only for a part of it. In this case, we will take the word "string", more specifically, the first three characters. Here and further on, we will use the function highlight() to highlight the results. It is worth noting that the function highlight() only outputs the fields where the word was found and highlights it using tags <b></b>. Other fields will need to be requested separately, for example combining: select *, highlight() from.

select highlight() from demo where match ('str*');

select highlight() from demo_prefix where match ('str*');

select highlight() as found, * from demo_prefix where match ('str*');

As you can see, the search in the table where the prefix setting was not enabled returned nothing, while the search in the second table returned a result. This type of search is called "prefix search", allowing you to search for words based on the first few characters.

Search with infix functionality

If you try to search for the middle or end of a word, rather than the beginning, the prefix search will not return any results. Below is an example of searching for the word "string" by its ending in a table with prefix search enabled:

select highlight() from demo_prefix where match('*ing');

However, you can enable infix search. Let's see how it works. To do this, create another table and enable infix search:

create table demo_infix (info text, value integer) min_infix_len = '3';

The value '3' indicates the minimum number of characters allowed for infix search. By default, infix mode is disabled.

Let's populate it similarly to the other tables:

insert into demo_infix (info, value) values ('Dummy string with infix settings applied', 3);

Now, let's perform a search in "prefix" mode in this table:

select highlight() from demo_infix where match ('str*');

The result for a prefix query in an infix table is similar to the table with prefix mode enabled. Next, let's search for the end of the word "string" - *ing:

select highlight() from demo_infix where match('*ing');

Unlike in prefix mode, the result is returned in infix mode.

Wildcard search

For searching, you can use wildcard search, for example, ? to replace one character, or % to replace one character or skip a character. For example, we know that people often make mistakes in the word necessary. Using the % operator, we can take this into account. To demonstrate, let's add all variations to the table:

insert into demo_infix (info) values ('necesary'),('neccesary'),('necessary'),('neccessary');

Let's execute the query:

select highlight() from demo_infix where match('nec%es%ary');

As a result, as you can see, all variations were found.

Here, we are using quite simple templates, but Manticore supports much more advanced regular expressions, which we recommend reading about in the blog article or studying the documentation.

Automatic expansion of search words

Additionally, you can enable the automatic expansion of search words by applying infix search, meaning that if you search for run, the system will search not only for run but also for *run*.

create table demo_expand (info_1 text, value integer) min_infix_len = '3' expand_keywords = '1';

Let's add some documents:

insert into demo_expand (info_1, value) values ('Test string, first record', 1), ('Test string, settings', 2), ('Test string, Setup', 3), ('running', 4), ('runs', 5), ('run', 6);

Let's search for run:

select highlight() from demo_expand where match ('run');

It can be seen that documents were found not only where run is a standalone word but also where it is a part of other words. By using SHOW META; you can see that the search occurred not only for run but also for *run*.

Using hybrid mode

In some cases, it can be useful to limit the prefix mode within infix search. Let's create a table using both parameters infix and prefix:

create table demo_hybrid (value text) min_infix_len = '5' min_prefix_len = '3';

Let's fill it with data:

insert into demo_hybrid (value) values ('First dummy string'), ('Second dummy string'), ('third dummy string');

Now let's check how the query works for infix in the case where we search with a word segment shorter than the set min_infix_len of 5 characters:

select highlight() from demo_hybrid where match ('*ing');

As expected, the result is empty. Let's slightly modify our query, increasing the length of the queried word to 5 characters, which is the minimum for the current setting:

select highlight() from demo_hybrid where match ('*tring');

Now the result is returned. Additionally, if we use a prefix for the query, for example dum*, this length is sufficient for execution since the prefix length is set to 3:

select highlight() from demo_hybrid where match ('dum*');

Setting min_infix_len and min_prefix_len allows controlling the performance and accuracy of the search. Lower values for these parameters may degrade performance due to an increase in the number of potential matches requiring processing.