Ukrainian lemmatization

Difficulty: Beginner
Estimated Time: 5 minutes

Manticore Search - Ukrainian lemmatizer

In this tutorial you will learn how to enable the lemmatizer for the Ukrainian language in Manticore Search and see examples of its usage.

Step 1

The full installation instructions are available here:

https://manual.manticoresearch.com/Installation/Debian_and_Ubuntu#Ukrainian-lemmatizer

Note that some of the time-consuming steps from those instructions have been prepared in advance, so we skip them below.

Install Manticore Search and the Ukrainian lemmatizer

First, we install Manticore Search together with the Ukrainian lemmatizer package:

cd ~
wget https://repo.manticoresearch.com/manticore-repo.noarch.deb
dpkg -i manticore-repo.noarch.deb
apt-key adv --fetch-keys http://repo.manticoresearch.com/GPG-KEY-manticore
apt -y update
apt -y install manticore manticore-lemmatizer-uk

Ukrainian lemmatization is often used together with Russian lemmatization, which requires a morphology pack. To download it, run:

wget https://repo.manticoresearch.com/repository/morphology/ru.pak.tgz

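Then extract it into the 'morphology' folder inside Manticore's 'share' directory (the -p flag keeps mkdir from failing if the folder already exists):

mkdir -p /usr/share/manticore/morphology/ && tar xzf ru.pak.tgz -C /usr/share/manticore/morphology/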

Also, make sure the option 'lemmatizer_base = /usr/share/manticore/morphology/' is set in the 'common' section of your configuration file:

cat /etc/manticoresearch/manticore.conf
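
Instead of reading the whole file by eye, the same check can be scripted. The following is only a sketch: get_lemmatizer_base and has_ru_pack are hypothetical helper names, not part of Manticore, and the second helper assumes the archive unpacks to a file named ru.pak.

```shell
# Hypothetical helper: print the lemmatizer_base value found in a Manticore
# configuration file, or nothing if the option is absent or commented out.
get_lemmatizer_base() {
    conf="${1:-/etc/manticoresearch/manticore.conf}"
    # Strip everything up to and including "lemmatizer_base =" and print the rest
    sed -n 's/^[[:space:]]*lemmatizer_base[[:space:]]*=[[:space:]]*//p' "$conf" | head -n 1
}

# Hypothetical helper: succeed only if the Russian pack is present in that folder.
# Assumption: ru.pak.tgz unpacks to a file named ru.pak.
has_ru_pack() {
    dir="$(get_lemmatizer_base "$1")"
    [ -n "$dir" ] && [ -f "${dir%/}/ru.pak" ]
}
```

For example, 'get_lemmatizer_base /etc/manticoresearch/manticore.conf' should print the configured path, and 'has_ru_pack /etc/manticoresearch/manticore.conf && echo found' reports whether the pack is where the configuration expects it.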

If the option is there, everything is set up and we can move on to the next step.

Step 2

Installing the Python environment

Next, we install Python and the modules the Ukrainian lemmatizer needs.

Since we build Python from source, we first install the libraries the build requires:

apt -y update
apt -y install wget build-essential libreadline-gplv2-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev libffi-dev zlib1g-dev

Now we can download, build, and install Python itself:

wget https://www.python.org/ftp/python/3.9.4/Python-3.9.4.tgz
tar xzf Python-3.9.4.tgz

cd Python-3.9.4

./configure --enable-optimizations --enable-shared
make -j8
make altinstall

Let's update the linker cache:

ldconfig

Finally, we install the pymorphy2 module and the Ukrainian dictionary for it (the quotes keep the shell from treating the square brackets as a glob pattern):

pip3.9 install 'pymorphy2[fast]'

pip3.9 install pymorphy2-dicts-uk
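
Before moving on, it can be worth confirming that the freshly built interpreter can actually find both packages. The sketch below uses a hypothetical helper, py_has_module, and assumes that the pymorphy2-dicts-uk package installs a module named pymorphy2_dicts_uk.

```shell
# Hypothetical helper: succeed if the given Python interpreter can locate a module.
# Usage: py_has_module <interpreter> <module>
py_has_module() {
    # importlib.util.find_spec returns None when a top-level module is not installed
    "$1" -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)" "$2"
}
```

With the setup from this step, 'py_has_module python3.9 pymorphy2 && py_has_module python3.9 pymorphy2_dicts_uk && echo ok' should print 'ok' if both installs succeeded.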

Usage

Now we can connect to Manticore Search and run a couple of simple queries that show how the lemmatizer works.

mysql -P9306 -h0

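Let's create a simple table that supports Ukrainian morphology. The extra charset_table rules map the uppercase Ukrainian letters І, Ї and Ґ to their lowercase forms and keep the lowercase forms as valid characters: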

create table tbl (f text) charset_table='non_cjk,U+0406->U+0456,U+0456,U+0407->U+0457,U+0457,U+0490->U+0491,U+0491' morphology='lemmatize_uk_all, lemmatize_ru_all';

Now let's look at the lemmatized forms of some Ukrainian and Russian words:

call keywords('у галицькій літературі компромісів грандіозні їжа мрії ґніт ґніти ґудзики ґатунки ґанки ґрати ґулі дзиґи червона красный красная красное красные', 'tbl');

Now let's run a simple INSERT query:

insert into tbl values(0,'У нашому місті-порту знаходиться красивий майяк');

And perform a SELECT query on this table using a lemmatized wordform:

select highlight() from tbl where match('місто-порт');

As we can see, Manticore successfully highlighted the original word form from our sentence, even though the query used a different form of the word.
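
To repeat the session without typing each statement, the statements above can be collected in a script file. This is only a sketch: uk_demo.sql is a hypothetical file name, and the script assumes the table 'tbl' does not exist yet.

```shell
# Write the tutorial's statements to a script file (uk_demo.sql is a hypothetical name).
# The quoted heredoc delimiter keeps the shell from expanding anything inside.
cat > uk_demo.sql <<'SQL'
create table tbl (f text) charset_table='non_cjk,U+0406->U+0456,U+0456,U+0407->U+0457,U+0457,U+0490->U+0491,U+0491' morphology='lemmatize_uk_all, lemmatize_ru_all';
insert into tbl values(0,'У нашому місті-порту знаходиться красивий майяк');
select highlight() from tbl where match('місто-порт');
SQL
```

The file can then be fed to the same client used above: 'mysql -P9306 -h0 < uk_demo.sql'.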