Terminal Terminal | Web Web
Home  //  Play

Introduction into Manticore Search full-text operators

Difficulty: Beginner
Estimated Time: 10 minutes

Manticoresearch - Introduction into full-text operators

In this tutorial, we'll explore the full-text operators available in Manticore Search.

Introduction into Manticore Search full-text operators

Step 1 of 5

Basics

Let's connect to Manticore using the MySQL client:

mysql -P9306 -h0

The most simple full-text query in Manticore is an enumeration of words. The SQL statement used for definining a full-text match is 'MATCH()', which has a single parameter - the query string.

By comparison, in MySQL, for example, the syntax is 'MATCH(list_of_columns) AGAINST(query_string)':

SELECT * FROM ftsindex WHERE MATCH('title,content') AGAINST('find me fast');

The implicit operator is 'AND' and, by default, the words will be searched in all full-text fields available:

SELECT * FROM testrt WHERE MATCH('find me fast');

For boolean searches the OR ( '|') can be used:

select * from testrt where match('find | me fast');

The OR operator has higher precedence than AND, so the query 'find me fast|slow' is translated as 'find me (fast|slow)':

select * from testrt where match('find me fast|slow');

For negations, operator NOT can be specified with '-' or '!' :

select * from testrt where match('find me !fast');

It must be noted that full negation queries are not supported and it's not possible to run just '!fast'.

Another base operator is MAYBE. The term defined by MAYBE can be present or not in the documents. If it's present, it will influence the ranking, and documents having it will be ranked higher

select * from testrt where match('find me MAYBE slow');

Basic operators - fields

If we want to limit to search to a field, the operator '@' can be used:

select * from testrt where match('@title find me fast');

We can also specify multiple fields to limit the search:

select * from testrt where match('@(title,content) find me fast');

The field operator can also be used to perform a restriction for the search to be made only in the first x words. For example:

select * from testrt where match('@title lazy dog');

However:

select * from testrt where match('@title[5] lazy dog');

In some situations search could be performed over multiple tables that may not have the same full-text fields. Obviously, specifying a field that doesn't exist will result in a query error. To overcome this, the special operator '@@relaxed' can be used:

select * from testrt where match('@(title,keywords) lazy dog');

select * from testrt where match('@@relaxed @(title,keywords) lazy dog');

Fuzzy search

Fuzzy matching allows us to match only some of the words from the query string

select * from testrt where match('"fox bird lazy dog"/3');

In this case we specify that it's fine to match only 3 of the words. A fuzzy of '/1' is equivalent of an OR boolean search, which fuzzy of '/N', where N is the number of input words, is equivalent of an AND search.

Instead of specifying number of words, we can use a fraction between 0.0 and 1.0. This can be helpful when dealing with queries of varying lengths. For example:

select * from testrt where match('"fox bird lazy dog"/0.75');

This query would match any document that has at least three out of the four words 'fox', 'bird', 'lazy', and 'dog'.

Advanced operators

One of the most used operators is the phrase operator. The phrase operator will match only if the given words are found in the verbatim specified order. This will also restrict the words to be found in the same field:

SELECT * FROM testrt WHERE MATCH('"find fast me"');

SELECT * FROM testrt WHERE MATCH('"find me fast"');

A more relaxed version of phrase operator is the strict order operator. The order operator requires only the words to be found in the same order as specified, but other words are accepted between:

SELECT * FROM testrt WHERE MATCH('find << me << fast');

Another pair of operators which work with positions are the start/end field operators. These will restrict a word to be present at start or end of a field.

SELECT * FROM testrt WHERE MATCH('^find me fast$');

Proximity operator is an AND operator that adds a maximum distance in which the words should be found. Let's take this example:

select * from testrt where match('brown fox jumps');

Our query returns 2 results, one in which all words are close to each other and a second one where one of the word is more distant. If we want to match only if the words are within a certain distance, we can restrict that with the proximity operator:

select * from testrt where match('"brown fox jumps"~5');

A more generalized version of proximity is the NEAR operator. In the case of proximity, a single distance is specified over a bag of words. NEAR works with 2 operands, which can be either single words or expressions.

In the following example, 'brown' and 'fox' must be within a distance of 2 and 'fox' and 'jumps' within a distance of 6:

select * from testrt where match('brown NEAR/2 fox NEAR/6 jumps');

The query lefts out one document that doesn't match the first NEAR condition.

select * from testrt where match('brown NEAR/3 fox NEAR/6 jumps');

A variation of NEAR is NOTNEAR, which matches only of the operands have a minimum distance between them.

select * from testrt where match('"brown fox" NOTNEAR/5 jumps');

Advanced operators

Manticore can detect sentences over plain texts and paragraphs in HTML content. For sentences option index_sp needs to be enabled, while paragraphs also require html_strip=1.

Let's take the following example:

select * from testrt where match('"the brown fox" jumps')\G

The document includes 2 sentences, while the phrase is found in the first one, the 2nd argument 'jumps' is only found in the second sentence. With SENTENCE we can restrict the search to have matching only if the operands are in the same sentence:

select * from testrt where match('"the brown fox" SENTENCE jumps')\G

We can see that the document is not a match any more.

If we correct the search query so all the words are from the same sentence, we'll see a match: select * from testrt where match('"the brown fox" SENTENCE back')\G

To demonstrate the PARAGRAPH operator, let's use the following search:

select * from testrt where match('Samsung Galaxy');

These 2 documents have the words inside different HTML tags.

select * from testrt where match('Samsung PARAGRAPH Galaxy');

If we add PARAGRAPH, only the document with the search terms found in a single tag will be matched.

A more general operator is ZONE and it's variant ZONESPAN. A zone is text contained between an HTML or XML tag. The tags to be considered for zones need to be declared in index_zones setting, like 'index_zones = h*, th, title'. For example:

select * from testrt where match('hello world');

We have 3 documents, where 'hello' and 'world' are found in plain text, in different zones of the same type or in a single zone.

select * from testrt where match('ZONE:h1 hello world');

In this case, the words are present in H1 zones, but it's not required for them to be in the same zone. If we want to limit the match to a single zone, we can use ZONESPAN:

select * from testrt where match('ZONESPAN:h1 hello world');