Home  //  Play

Manticore Introduction in full-text search operators

Difficulty: Beginner
Estimated Time: 10 minutes

Manticoresearch - Introduction in full-text operators

In this tutorial we explode full-text search operators available in Manticore Search.

Manticore Introduction in full-text search operators

Step 1 of 5

Basics

Let's connect to Manticore using the mysql client:

mysql -P9306 -h0

The most simple full-text query in Manticore is an enumeration of words. The SQL statement used for the definining a full-text match is 'MATCH()', which has a single parameter - the query string.

By comparison, in MYSQL for example, the syntax is 'MATCH(list_of_columns) AGAINST(query_string)':

SELECT * FROM ftsindex WHERE MATCH('title,content') AGAINST('find me fast');

The implicit operator is 'AND' and by default the words will be searched in all full-text fields available:

SELECT * FROM testrt WHERE MATCH('find me fast');

For boolean searches the OR ( '|') can be used:

select * from testrt where match('find | me fast');

The OR operator has higher precende than AND, the query 'find me fast|slow' is translated as 'find me (fast|slow)':

select * from testrt where match('find me fast|slow');

For negations, operator NOT can be specified with '-' or '!' :

select * from testrt where match('find me !fast');

It must be noted that full negation queries are not supported and it's not possible to run just '!fast'.

Another base operator is MAYBE. The term defined by MAYBE can be present or not in the documents. If it's present, it will influence the ranking and documents having it will be ranked higher

select * from testrt where match('find me MAYBE slow');

Basic operators - fields

If we want to limit to search to a field, the operator '@' can be used:

select * from testrt where match('@title find me fast');

We can also specify multiple fields to limit the search:

select * from testrt where match('@(title,content) find me fast');

The field operator can also be used to perform a restriction for the search to be made only in the first x words. For example:

select * from testrt where match('@title lazy dog');

However:

select * from testrt where match('@title[5] lazy dog');

In some situations search could be performed over multiple indexes that may not have same full-text fields. Obvious specifying a field that doesn't exist will result in a query error. To overcome this, special operator '@@relaxed' can be used:

select * from testrt where match('@(title,keywords) lazy dog');

select * from testrt where match('@@relaxed @(title,keywords) lazy dog');

Fuzzy search

Fuzzy matching allows to match only some of the words from the query string

select * from testrt where match('"fox bird lazy dog"/3');

In this case we specify that is fine to match only 3 of the words. A fuzzy of '/1' is equivalent of an OR boolean search, which fuzzy of '/N' where N is the number of input words is equivalent of an AND search.

Instead of specifying number of words we can use instead a fraction between 0.0 and 1.0. This can be

Advanced operators

One of the most used operators is the phrase operator. The phrase operator will match only if the given words are found in the verbatim specified order. This will also restrict the words to be found in the same field:

SELECT * FROM testrt WHERE MATCH('"find fast me"');

A more relaxed version of phrase operator is the strict order operator. The order operator requires only the word to be found in the same order speficied, but order words are accepted between:

SELECT * FROM testrt WHERE MATCH('find << me << fast');

Another pair of operators which work with positions are the start/end field operators. These will restrict a word to be present at start or end of a field.

SELECT * FROM testrt WHERE MATCH('^find me fast$');

Proximity operator is an AND operator that adds a maximum distance in which the words should be found.Let's take this example:

select * from testrt where match('brown fox jumps');

Our query returns 2 results, one in which all words are close to each other and a second one where one of the word is more distant. If we want to match only if the words are within a certain distance, we can restrict that with the proximity operator:

mysql> select * from testrt where match('"brown fox jumps"~5');

A more generalized version of proximity is the NEAR operator. In case of proximity a single distance is specified over a bag of words. NEAR works with 2 operands, which can be either single words or expressions.

In the following example, 'brown' and 'fox' must be withing a distance of 2 and 'fox' and 'jumps' within a distance of 6:

select * from testrt where match('brown NEAR/2 fox NEAR/6 jumps');

The query lefts out one document which doesn't match the first NEAR condition.

select * from testrt where match('brown NEAR/3 fox NEAR/6 jumps');

A variation of NEAR is NOTNEAR, which matches only of the operands have a minimum distance between them.

select * from testrt where match('"brown fox" NOTNEAR/5 jumps');

Advanced operators

Manticore can detect sentences over plain texts and paragraphs in HTML content. For sentences the index_sp option needs be enabled, while paragraphs also require html_strip=1

Let's take the following example:

select * from testrt where match('"the quick brown fox" jumps');

The first document include 2 sentences, while the phrase is found in the first one while 'jumps' is in the second sentence. With SENTENCE we can restrict the search to have matching only if the operands are in the same sentence:

select * from testrt where match('"the quick brown fox" SENTENCE jumps');

In case of PARAGRAPH let's use the following search:

select * from testrt where match('Samsung Galaxy');

The 2 documents have the words inside different HTML tags.

select * from testrt where match('Samsung PARAGRAPH Galaxy');

If we add PARAGRAPH only the document with the search terms found in a single tag.

A more general operator is ZONE and it's variant ZONESPAN. A zone is text contained between a HTML or XML tag. The tags to be considered for zones needs to be declared in 'index_zones' setting, like 'index_zones = h*, th, title'. For example:

select * from testrt where match('hello world');

We have 3 documents, where 'hello' and 'world' are found in plain text, in different zones of the same type or in a single zone.

select * from testrt where match('ZONE:h1 hello world');

In this case, the words are present in H1 zones, but it's not required to be in the same zone. If we want to limit the match to a single zone, we can use ZONESPAN:

select * from testrt where match('ZONESPAN:h1 hello world');