Index Settings Optimization

Step 1 of 5

Morphology Settings

Let's connect to Manticore and explore text processing settings.

mysql -P9306 -h0

Create a table without morphology:

DROP TABLE IF EXISTS products_basic;

CREATE TABLE products_basic(title text, description text);

Create a table with English stemming:

DROP TABLE IF EXISTS products_stem;

CREATE TABLE products_stem(title text, description text) morphology='stem_en';

Insert same data into both:

INSERT INTO products_basic(title, description) VALUES
('Running Shoes', 'Lightweight shoes for runners and running enthusiasts'),
('Programming Books', 'Books about programming and programmers');

INSERT INTO products_stem(title, description) VALUES
('Running Shoes', 'Lightweight shoes for runners and running enthusiasts'),
('Programming Books', 'Books about programming and programmers');

Compare search results:

SELECT * FROM products_basic WHERE MATCH('run');

SELECT * FROM products_stem WHERE MATCH('run');

With morphology, "run" matches "running" and "runners".

Available morphology options:

stem_en, stem_ru, stem_de... (language stemmers)
lemmatize_en, lemmatize_ru... (lemmatizers)
soundex, metaphone (phonetic)

SHOW TABLE products_stem SETTINGS;

Stopwords and Word Length

Stopwords are common words excluded from indexing to save space.

Create a table with stopwords:

DROP TABLE IF EXISTS articles;

CREATE TABLE articles(title text, content text) stopwords='en' min_word_len='3';

Insert content:

INSERT INTO articles(title, content) VALUES
('The Art of Programming', 'This is a comprehensive guide to the art of programming in various languages'),
('Introduction to AI', 'An introduction to artificial intelligence and machine learning');

Search for stopwords (they won't match):

SELECT * FROM articles WHERE MATCH('the');

SELECT * FROM articles WHERE MATCH('is');

But regular words work:

SELECT * FROM articles WHERE MATCH('programming');

min_word_len='3' skips words shorter than 3 characters:

SELECT * FROM articles WHERE MATCH('AI');

Short words like "AI" (2 chars) are not indexed.

View settings:

SHOW TABLE articles SETTINGS;

Create table with custom min_word_len:

DROP TABLE IF EXISTS logs;

CREATE TABLE logs(message text) min_word_len='2';

Now 2-character words are indexed. Let's verify:

INSERT INTO logs(message) VALUES('IP address 10.0.0.1 is blocked');

SELECT * FROM logs WHERE MATCH('IP');

With min_word_len='2', the 2-character word "IP" is indexed and searchable. Compare this with the articles table where min_word_len='3' and "AI" was not found.

HTML Stripping

For web content, HTML stripping removes tags before indexing.

Create a table with HTML stripping:

DROP TABLE IF EXISTS webpages;

CREATE TABLE webpages(title text, body text) html_strip='1';

Insert HTML content:

INSERT INTO webpages(title, body) VALUES
('Sample Page', '<html><body><h1>Welcome</h1><p>This is a <strong>sample</strong> paragraph with <a href="link">links</a>.</p></body></html>'),
('Another Page', '<div class="content"><p>Text with <em>emphasis</em> and <code>code</code> blocks.</p></div>');

Search finds text without HTML tags. Phrase search proves tags are fully removed — in the original HTML, <strong> sits between "sample" and "paragraph", but after stripping they are adjacent:

SELECT * FROM webpages WHERE MATCH('welcome');

SELECT * FROM webpages WHERE MATCH('"sample paragraph"');

The phrase search works because HTML tags are stripped before indexing, keeping the words next to each other.

For more control, use html_remove_elements to exclude specific tags:

DROP TABLE IF EXISTS docs;

CREATE TABLE docs(title text, content text) html_strip='1' html_remove_elements='script,style';

This removes script and style blocks entirely (not just their tags).

Index specific HTML attributes:

DROP TABLE IF EXISTS pages;

CREATE TABLE pages(title text, body text) html_strip='1' html_index_attrs='img=alt,title';

This indexes alt and title attributes from img tags.

Memory Settings for Real-Time Tables

Real-time tables have memory settings that affect performance.

Create a table with custom RAM limit:

DROP TABLE IF EXISTS events;

CREATE TABLE events(title text, data text, event_time bigint) rt_mem_limit='64M';

rt_mem_limit controls when RAM chunk flushes to disk (default 128M).

Insert data:

INSERT INTO events(title, data, event_time) VALUES
('Event 1', 'First event data', 1704067200),
('Event 2', 'Second event data', 1704153600),
('Event 3', 'Third event data', 1704240000);

Check table status:

SHOW TABLE events STATUS;

Key fields to notice:

ram_bytes — how much RAM the table currently uses for the RAM chunk
disk_chunks — number of disk segments (0 means all data is still in RAM)
ram_chunk_segments_count — segments in the RAM chunk

With only a few inserts, data stays in RAM. When ram_bytes exceeds rt_mem_limit, the RAM chunk flushes to disk, creating a new disk_chunk.

For high-write workloads, increase rt_mem_limit:

DROP TABLE IF EXISTS logs_high;

CREATE TABLE logs_high(message text, level string) rt_mem_limit='256M';

Larger RAM chunks mean:

Fewer disk flushes
Better write throughput
More memory usage

For low-memory systems, decrease it:

DROP TABLE IF EXISTS logs_low;

CREATE TABLE logs_low(message text, level string) rt_mem_limit='32M';

View all table settings:

SHOW TABLE events SETTINGS;

Compare rt_mem_limit across tables:

SHOW TABLE logs_high SETTINGS;

SHOW TABLE logs_low SETTINGS;

Storage Engine Selection

Choose between row-wise and columnar storage based on workload.

Row-wise (default) - better for full document retrieval:

DROP TABLE IF EXISTS products_row;

CREATE TABLE products_row(name text, price float, stock int) engine='rowwise';

Columnar - better for analytics and aggregations:

DROP TABLE IF EXISTS products_col;

CREATE TABLE products_col(name text, price float, stock int) engine='columnar';

Insert identical data:

INSERT INTO products_row(name, price, stock) VALUES ('Item A', 99.99, 100), ('Item B', 149.99, 50);

INSERT INTO products_col(name, price, stock) VALUES ('Item A', 99.99, 100), ('Item B', 149.99, 50);

Both return the same results, but with large datasets columnar storage is more efficient for aggregations because it reads only the needed columns:

SELECT AVG(price), SUM(stock) FROM products_row;

SELECT AVG(price), SUM(stock) FROM products_col;

Check storage details — compare storage_type in the output:

SHOW TABLE products_row STATUS;

SHOW TABLE products_col STATUS;

Mixed approach - specific columns columnar:

DROP TABLE IF EXISTS hybrid;

CREATE TABLE hybrid(name text, category string, metrics float engine='columnar');

Here only the metrics attribute uses columnar storage, while category stays row-wise.

DESCRIBE hybrid;

In the Properties column, you should see columnar next to metrics, while category has no such property — it stays row-wise.

Guidelines:

Use rowwise for OLTP (fetch whole records)
Use columnar for OLAP (aggregations, few columns)
Use columnar for large datasets with memory constraints
Use per-attribute engine='columnar' for selective optimization

Index Settings Optimization

Index Settings Optimization

Index Settings Optimization

Congratulations! You've completed the course!

Congratulations!

>>