>>
Index Settings Optimization
Morphology Settings
Let's connect to Manticore and explore text processing settings.
mysql -P9306 -h0
Create a table without morphology:
DROP TABLE IF EXISTS products_basic;
CREATE TABLE products_basic(title text, description text);
Create a table with English stemming:
DROP TABLE IF EXISTS products_stem;
CREATE TABLE products_stem(title text, description text) morphology='stem_en';
Insert same data into both:
INSERT INTO products_basic(title, description) VALUES
('Running Shoes', 'Lightweight shoes for runners and running enthusiasts'),
('Programming Books', 'Books about programming and programmers');
INSERT INTO products_stem(title, description) VALUES
('Running Shoes', 'Lightweight shoes for runners and running enthusiasts'),
('Programming Books', 'Books about programming and programmers');
Compare search results:
SELECT * FROM products_basic WHERE MATCH('run');
SELECT * FROM products_stem WHERE MATCH('run');
With morphology, "run" matches "running" and "runners".
Available morphology options:
- stem_en, stem_ru, stem_de... (language stemmers)
- lemmatize_en, lemmatize_ru... (lemmatizers)
- soundex, metaphone (phonetic)
SHOW TABLE products_stem SETTINGS;
Stopwords and Word Length
Stopwords are common words excluded from indexing to save space.
Create a table with stopwords:
DROP TABLE IF EXISTS articles;
CREATE TABLE articles(title text, content text) stopwords='en' min_word_len='3';
Insert content:
INSERT INTO articles(title, content) VALUES
('The Art of Programming', 'This is a comprehensive guide to the art of programming in various languages'),
('Introduction to AI', 'An introduction to artificial intelligence and machine learning');
Search for stopwords (they won't match):
SELECT * FROM articles WHERE MATCH('the');
SELECT * FROM articles WHERE MATCH('is');
But regular words work:
SELECT * FROM articles WHERE MATCH('programming');
min_word_len='3' skips words shorter than 3 characters:
SELECT * FROM articles WHERE MATCH('AI');
Short words like "AI" (2 chars) are not indexed.
View settings:
SHOW TABLE articles SETTINGS;
Create table with custom min_word_len:
DROP TABLE IF EXISTS logs;
CREATE TABLE logs(message text) min_word_len='2';
Now 2-character words are indexed. Let's verify:
INSERT INTO logs(message) VALUES('IP address 10.0.0.1 is blocked');
SELECT * FROM logs WHERE MATCH('IP');
With min_word_len='2', the 2-character word "IP" is indexed and searchable. Compare this with the articles table where min_word_len='3' and "AI" was not found.
HTML Stripping
For web content, HTML stripping removes tags before indexing.
Create a table with HTML stripping:
DROP TABLE IF EXISTS webpages;
CREATE TABLE webpages(title text, body text) html_strip='1';
Insert HTML content:
INSERT INTO webpages(title, body) VALUES
('Sample Page', '<html><body><h1>Welcome</h1><p>This is a <strong>sample</strong> paragraph with <a href="link">links</a>.</p></body></html>'),
('Another Page', '<div class="content"><p>Text with <em>emphasis</em> and <code>code</code> blocks.</p></div>');
Search finds text without HTML tags. Phrase search proves tags are fully removed — in the original HTML, <strong> sits between "sample" and "paragraph", but after stripping they are adjacent:
SELECT * FROM webpages WHERE MATCH('welcome');
SELECT * FROM webpages WHERE MATCH('"sample paragraph"');
The phrase search works because HTML tags are stripped before indexing, keeping the words next to each other.
For more control, use html_remove_elements to exclude specific tags:
DROP TABLE IF EXISTS docs;
CREATE TABLE docs(title text, content text) html_strip='1' html_remove_elements='script,style';
This removes script and style blocks entirely (not just their tags).
Index specific HTML attributes:
DROP TABLE IF EXISTS pages;
CREATE TABLE pages(title text, body text) html_strip='1' html_index_attrs='img=alt,title';
This indexes alt and title attributes from img tags.
Memory Settings for Real-Time Tables
Real-time tables have memory settings that affect performance.
Create a table with custom RAM limit:
DROP TABLE IF EXISTS events;
CREATE TABLE events(title text, data text, event_time bigint) rt_mem_limit='64M';
rt_mem_limit controls when RAM chunk flushes to disk (default 128M).
Insert data:
INSERT INTO events(title, data, event_time) VALUES
('Event 1', 'First event data', 1704067200),
('Event 2', 'Second event data', 1704153600),
('Event 3', 'Third event data', 1704240000);
Check table status:
SHOW TABLE events STATUS;
Key fields to notice:
- ram_bytes — how much RAM the table currently uses for the RAM chunk
- disk_chunks — number of disk segments (0 means all data is still in RAM)
- ram_chunk_segments_count — segments in the RAM chunk
With only a few inserts, data stays in RAM. When ram_bytes exceeds rt_mem_limit, the RAM chunk flushes to disk, creating a new disk_chunk.
For high-write workloads, increase rt_mem_limit:
DROP TABLE IF EXISTS logs_high;
CREATE TABLE logs_high(message text, level string) rt_mem_limit='256M';
Larger RAM chunks mean:
- Fewer disk flushes
- Better write throughput
- More memory usage
For low-memory systems, decrease it:
DROP TABLE IF EXISTS logs_low;
CREATE TABLE logs_low(message text, level string) rt_mem_limit='32M';
View all table settings:
SHOW TABLE events SETTINGS;
Compare rt_mem_limit across tables:
SHOW TABLE logs_high SETTINGS;
SHOW TABLE logs_low SETTINGS;
Storage Engine Selection
Choose between row-wise and columnar storage based on workload.
Row-wise (default) - better for full document retrieval:
DROP TABLE IF EXISTS products_row;
CREATE TABLE products_row(name text, price float, stock int) engine='rowwise';
Columnar - better for analytics and aggregations:
DROP TABLE IF EXISTS products_col;
CREATE TABLE products_col(name text, price float, stock int) engine='columnar';
Insert identical data:
INSERT INTO products_row(name, price, stock) VALUES ('Item A', 99.99, 100), ('Item B', 149.99, 50);
INSERT INTO products_col(name, price, stock) VALUES ('Item A', 99.99, 100), ('Item B', 149.99, 50);
Both return the same results, but with large datasets columnar storage is more efficient for aggregations because it reads only the needed columns:
SELECT AVG(price), SUM(stock) FROM products_row;
SELECT AVG(price), SUM(stock) FROM products_col;
Check storage details — compare storage_type in the output:
SHOW TABLE products_row STATUS;
SHOW TABLE products_col STATUS;
Mixed approach - specific columns columnar:
DROP TABLE IF EXISTS hybrid;
CREATE TABLE hybrid(name text, category string, metrics float engine='columnar');
Here only the metrics attribute uses columnar storage, while category stays row-wise.
DESCRIBE hybrid;
In the Properties column, you should see columnar next to metrics, while category has no such property — it stays row-wise.
Guidelines:
- Use rowwise for OLTP (fetch whole records)
- Use columnar for OLAP (aggregations, few columns)
- Use columnar for large datasets with memory constraints
- Use per-attribute engine='columnar' for selective optimization