Ukrainian lemmatization
Step 1
Here is a link to the full installation instructions:
https://manual.manticoresearch.com/Installation/Debian_and_Ubuntu#Ukrainian-lemmatizer
Note that some of the time-consuming steps from those instructions have already been done in advance, so we skip them below.
Installing Manticore Search and the Ukrainian lemmatizer
First, we install Manticore together with the Ukrainian lemmatizer.
cd ~
wget https://repo.manticoresearch.com/manticore-repo.noarch.deb
dpkg -i manticore-repo.noarch.deb
apt-key adv --fetch-keys http://repo.manticoresearch.com/GPG-KEY-manticore
apt -y update
apt -y install manticore manticore-lemmatizer-uk
Ukrainian lemmatization is often used together with Russian lemmatization, which requires a separate morphology dictionary (ru.pak). To download it, run:
wget https://repo.manticoresearch.com/repository/morphology/ru.pak.tgz
Then create the morphology folder in Manticore's 'share' directory and extract the archive into it (-p keeps mkdir from failing if the folder already exists):
mkdir -p /usr/share/manticore/morphology/ && tar xzf ru.pak.tgz -C /usr/share/manticore/morphology/
Also make sure the option 'lemmatizer_base = /usr/share/manticore/morphology/' is set in the 'common' section of your configuration file. You can view the file with:
cat /etc/manticoresearch/manticore.conf
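If you would rather check this from a script, here is a minimal Python sketch that reads the config, looks up lemmatizer_base in the common section and checks that the Russian dictionary is in that folder. It assumes a plain "key = value" layout with no nested braces, and that ru.pak.tgz unpacks to a file named ru.pak; find_option and check_lemmatizer_base are hypothetical helper names, not part of Manticore. It uses only the standard library, so any Python 3 interpreter will do.

```python
import os
import re


def find_option(conf_text, section, option):
    """Return the value of `option` in the top-level `section { ... }` block, or None.

    Simplified parser (an assumption, not Manticore's own): one `key = value`
    per line, no nested braces, full-line '#' comments ignored.
    """
    lines = [ln for ln in conf_text.splitlines() if not ln.lstrip().startswith("#")]
    text = "\n".join(lines)
    block = re.search(r"^\s*" + re.escape(section) + r"\s*\{(.*?)\}", text, re.S | re.M)
    if block is None:
        return None
    for line in block.group(1).splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == option:
            return value.strip()
    return None


def check_lemmatizer_base(conf_path="/etc/manticoresearch/manticore.conf"):
    """Return a short human-readable status message."""
    try:
        with open(conf_path, encoding="utf-8") as f:
            base = find_option(f.read(), "common", "lemmatizer_base")
    except OSError as exc:
        return f"cannot read {conf_path}: {exc}"
    if base is None:
        return "lemmatizer_base is not set in the common section"
    # Assumption: the archive unpacks to a single file named ru.pak
    if not os.path.isfile(os.path.join(base, "ru.pak")):
        return f"lemmatizer_base = {base}, but ru.pak is not there"
    return f"OK: lemmatizer_base = {base}"


if __name__ == "__main__":
    print(check_lemmatizer_base())
```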
If the option is there, we can move on to the next step.
Step 2
Installing the Python environment
Next, we install Python and the modules the Ukrainian lemmatizer requires.
To build Python from source, we first need to install the libraries it depends on:
apt -y update
apt -y install wget build-essential libreadline-gplv2-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev libffi-dev zlib1g-dev
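Now we can download, build and install Python itself. 'make altinstall' installs it as python3.9 next to the system Python instead of replacing the default python3 binary: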
wget https://www.python.org/ftp/python/3.9.4/Python-3.9.4.tgz
tar xzf Python-3.9.4.tgz
cd Python-3.9.4
./configure --enable-optimizations --enable-shared
make -j8
make altinstall
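Because Python was built with --enable-shared, update the dynamic linker cache so that the new libpython3.9 shared library can be found: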
ldconfig
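To double-check that the freshly built interpreter really is a shared build, you can run a short sketch like this with python3.9. It relies only on the standard sysconfig module, where Py_ENABLE_SHARED is 1 for a shared build; python_build_info is a hypothetical helper name of our own.

```python
import sys
import sysconfig


def python_build_info():
    """Report the interpreter version and whether it was built with --enable-shared."""
    shared = bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))
    return {
        "version": "%d.%d.%d" % sys.version_info[:3],
        "shared": shared,
        # Name of the libpython this build produced (a .so for shared builds on Linux)
        "ldlibrary": sysconfig.get_config_var("LDLIBRARY"),
    }


if __name__ == "__main__":
    info = python_build_info()
    print(info)
    if not info["shared"]:
        print("This interpreter was not built with --enable-shared")
```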
Finally, we install the pymorphy2 module and the Ukrainian dictionary for it:
pip3.9 install pymorphy2[fast]
pip3.9 install pymorphy2-dicts-uk
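A quick way to confirm that both packages are visible to python3.9 (and not only to some other interpreter) is a minimal sketch like the following, run with python3.9. It only checks that the modules can be found, not that the dictionary loads; the import name pymorphy2_dicts_uk is our assumption about how the pymorphy2-dicts-uk package is exposed, and missing_modules is a hypothetical helper.

```python
import importlib.util

# Import names we assume the two pip packages provide
REQUIRED = ("pymorphy2", "pymorphy2_dicts_uk")


def missing_modules(names=REQUIRED):
    """Return the names from `names` that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("pymorphy2 and the Ukrainian dictionary are installed")
```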
Usage
Now we can connect to Manticore Search over the MySQL protocol (port 9306) and run a couple of simple queries that show the lemmatizer at work.
mysql -P9306 -h0
Let's create a simple table that supports Ukrainian (and Russian) morphology:
create table tbl (f text) charset_table='non_cjk,U+0406->U+0456,U+0456,U+0407->U+0457,U+0457,U+0490->U+0491,U+0491' morphology='lemmatize_uk_all, lemmatize_ru_all';
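Besides the non_cjk alias, the charset_table value adds three Ukrainian-specific rules: І (U+0406) folds to і (U+0456), Ї (U+0407) to ї (U+0457) and Ґ (U+0490) to ґ (U+0491), and the lowercase letters are accepted as they are. The Python sketch below only illustrates that case folding; it is not how Manticore implements it, it ignores the non_cjk alias, and it leaves unlisted characters untouched instead of treating them as separators. parse_charset_table and fold are hypothetical names.

```python
def parse_charset_table(spec):
    """Map the explicit U+XXXX entries of a charset_table value to code points.

    Aliases such as non_cjk are skipped: this sketch does not emulate them.
    """
    mapping = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry.upper().startswith("U+"):
            continue  # an alias such as non_cjk
        src, arrow, dst = entry.partition("->")
        src_cp = int(src.strip()[2:], 16)
        # A bare 'U+0456' means the character is accepted and maps to itself
        mapping[src_cp] = int(dst.strip()[2:], 16) if arrow else src_cp
    return mapping


def fold(text, mapping):
    """Apply the mapping; characters it does not cover are left unchanged here."""
    return "".join(chr(mapping.get(ord(ch), ord(ch))) for ch in text)


SPEC = "non_cjk,U+0406->U+0456,U+0456,U+0407->U+0457,U+0457,U+0490->U+0491,U+0491"
TABLE = parse_charset_table(SPEC)
print(fold("Ґудзики Їжа", TABLE))  # ґудзики їжа
```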
Now let's look at the lemmatized forms of some Ukrainian words (plus a few Russian ones):
call keywords('у галицькій літературі компромісів грандіозні їжа мрії ґніт ґніти ґудзики ґатунки ґанки ґрати ґулі дзиґи червона красный красная красное красные', 'tbl');
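CALL KEYWORDS splits the text into tokens and shows, for each query position (qpos), the token as written (tokenized) and what it becomes after morphology processing (normalized). The toy Python sketch below only imitates that shape of output: TOY_LEMMAS is a hypothetical table written by hand for this example, not Manticore's dictionary, and the real tokenizer follows charset_table rather than Python's \w. Values are lists because the _all modes index all possible lemmas of a word.

```python
import re

# Hypothetical hand-written lemma table covering only this example.
# Each word maps to a list because a word may have several candidate lemmas.
TOY_LEMMAS = {
    "галицькій": ["галицький"],
    "літературі": ["література"],
    "мрії": ["мрія"],
    "красная": ["красный"],
}


def toy_keywords(text):
    """Return (qpos, tokenized, normalized) tuples, loosely mimicking CALL KEYWORDS."""
    rows = []
    tokens = re.findall(r"\w+", text.lower())
    for qpos, token in enumerate(tokens, start=1):
        # Words missing from the toy table are kept as they are
        for lemma in TOY_LEMMAS.get(token, [token]):
            rows.append((qpos, token, lemma))
    return rows


for row in toy_keywords("у галицькій літературі мрії красная"):
    print(row)
```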
Now let's run a simple INSERT query:
insert into tbl values(0,'У нашому місті-порту знаходиться красивий маяк');
Then run a SELECT query against this table, searching for the words in their base (lemmatized) form:
select highlight() from tbl where match('місто-порт');
As we can see, the query matched our sentence and highlighted the original word forms (місті and порту), even though we searched for their base forms.
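The idea behind this result can be pictured with a toy Python sketch: both the query and the document are split into words (assuming the hyphen acts as a separator, since it is not part of charset_table), each word is reduced to a lemma, and document words whose lemma matches a query lemma are wrapped in markers. TOY_LEMMAS is a hypothetical hand-made table standing in for the real Ukrainian dictionary, the `<b>`/`</b>` markers are chosen for this sketch, and none of this is Manticore's actual implementation.

```python
import re

# Hypothetical lemma table for the words in this example only
TOY_LEMMAS = {"місті": "місто", "порту": "порт", "нашому": "наш", "знаходиться": "знаходитися"}


def lemma(word):
    word = word.lower()
    return TOY_LEMMAS.get(word, word)


def toy_highlight(doc, query, before="<b>", after="</b>"):
    """Wrap every document word whose lemma matches a query lemma.

    Returns None unless all query lemmas occur in the document (implicit AND).
    """
    query_lemmas = {lemma(w) for w in re.findall(r"\w+", query)}
    doc_lemmas = {lemma(w) for w in re.findall(r"\w+", doc)}
    if not query_lemmas <= doc_lemmas:
        return None

    def mark(m):
        word = m.group(0)
        return before + word + after if lemma(word) in query_lemmas else word

    # Non-word characters such as '-' are separators and stay in place
    return re.sub(r"\w+", mark, doc)


print(toy_highlight("У нашому місті-порту знаходиться красивий маяк", "місто-порт"))
# У нашому <b>місті</b>-<b>порту</b> знаходиться красивий маяк
```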