OPTIMIZE and Index Merging

Step 1 of 5

Understanding Disk Chunks

Let's connect to Manticore and understand how real-time tables store data.

mysql -P9306 -h0

Real-time tables have two storage components:

RAM chunk: holds recent data in memory
Disk chunks: flushed data stored on disk

Create a table and add data:

DROP TABLE IF EXISTS articles;

CREATE TABLE articles(title text, content text, views int);

Insert some documents:

INSERT INTO articles(title, content, views) VALUES
('First Article', 'This is the first article content', 100),
('Second Article', 'This is the second article content', 200),
('Third Article', 'This is the third article content', 150);

Check table status to see chunks:

SHOW TABLE articles STATUS;

Look for:

ram_chunk: current memory buffer
disk_chunks: number of disk segments

Force flush RAM to disk:

FLUSH RAMCHUNK articles;

Check status again:

SHOW TABLE articles STATUS;

Now you should see disk chunks. Each flush creates a new disk chunk.

Creating Multiple Disk Chunks

Let's create multiple disk chunks to see why optimization matters.

Insert more data and flush multiple times:

INSERT INTO articles(title, content, views) VALUES
('Article Four', 'Content for article four', 250),
('Article Five', 'Content for article five', 300);

FLUSH RAMCHUNK articles;

INSERT INTO articles(title, content, views) VALUES
('Article Six', 'Content for article six', 175),
('Article Seven', 'Content for article seven', 225);

FLUSH RAMCHUNK articles;

INSERT INTO articles(title, content, views) VALUES
('Article Eight', 'Content for article eight', 400),
('Article Nine', 'Content for article nine', 125);

FLUSH RAMCHUNK articles;

Check disk chunks:

SHOW TABLE articles STATUS;

You should see multiple disk_chunks now. Too many chunks slow down queries because Manticore must search each one.

Delete some documents:

DELETE FROM articles WHERE views < 150;

Deleted documents still occupy space until optimization merges and cleans the chunks.

SHOW TABLE articles STATUS;

Using OPTIMIZE TABLE

OPTIMIZE TABLE merges disk chunks and removes deleted documents.

The basic syntax is:

OPTIMIZE TABLE table_name [OPTION cutoff=N, sync=1];

By default, OPTIMIZE merges chunks until their count is below optimize_cutoff (default: CPU cores * 2). Since our table has only 4 chunks which is already below the threshold, we need to specify cutoff explicitly.

Let's merge to 2 chunks:

OPTIMIZE TABLE articles OPTION cutoff=2, sync=1;

The sync=1 option makes the command wait until optimization completes before returning.

SHOW TABLE articles STATUS;

Notice that disk_chunks dropped from 4 to 2, killed_documents is now 0, and disk_bytes decreased.

For complete merge into a single chunk:

OPTIMIZE TABLE articles OPTION cutoff=1, sync=1;

SHOW TABLE articles STATUS;

Now all data is in one disk chunk. Both deleted documents and redundant chunk overhead have been removed.

Auto-Optimization Settings

Manticore 4+ includes automatic optimization. Let's understand the settings.

Check current server settings:

SHOW VARIABLES LIKE 'auto_optimize';

Auto-optimize runs in background based on thresholds.

Create a new table with custom optimize_cutoff:

DROP TABLE IF EXISTS logs;

CREATE TABLE logs(message text, level string, code int) optimize_cutoff='4';

This table will auto-optimize to maintain maximum 4 disk chunks.

Insert and flush multiple times to trigger auto-optimize:

INSERT INTO logs(message, level, code) VALUES ('Log entry 1', 'info', 200);

FLUSH RAMCHUNK logs;

INSERT INTO logs(message, level, code) VALUES ('Log entry 2', 'warning', 300);

FLUSH RAMCHUNK logs;

INSERT INTO logs(message, level, code) VALUES ('Log entry 3', 'error', 500);

FLUSH RAMCHUNK logs;

SHOW TABLE logs STATUS;

You can also set optimize_cutoff at server level in manticore.conf:

searchd {
    auto_optimize = 1
    optimize_cutoff = 4
}

View ongoing optimization:

SHOW THREADS;

Performance Tuning

Let's explore optimization-related settings and monitoring.

Check optimization settings:

SHOW VARIABLES LIKE '%optim%';

auto_optimize: enables automatic background optimization (default: 1)
optimize_cutoff: target number of disk chunks (default: CPU cores * 2)

For I/O throttling, Manticore provides config-only directives in manticore.conf:

searchd {
    rt_merge_iops = 40
    rt_merge_maxiosize = 1M
}

These prevent optimization from overwhelming disk I/O on busy systems:

rt_merge_iops: max I/O operations per second during merge
rt_merge_maxiosize: max size of each I/O operation

Monitor optimization progress:

SHOW THREADS OPTION format=all;

Look for threads with "optimizing" state.

Create another table to demonstrate:

DROP TABLE IF EXISTS events;

CREATE TABLE events(title text, category string, priority int);

INSERT INTO events(title, category, priority) VALUES
('Event A', 'system', 1), ('Event B', 'user', 2), ('Event C', 'system', 3);

FLUSH RAMCHUNK events;

INSERT INTO events(title, category, priority) VALUES
('Event D', 'network', 2), ('Event E', 'user', 1), ('Event F', 'system', 2);

FLUSH RAMCHUNK events;

Optimize to a single chunk:

OPTIMIZE TABLE events OPTION cutoff=1, sync=1;

SHOW TABLE events STATUS;

Notice that disk_chunks is now 1. Without cutoff, OPTIMIZE uses the default threshold (CPU cores * 2), so if the table already has fewer chunks it won't merge further.

Best practices:

Let auto_optimize handle routine maintenance
Use manual OPTIMIZE for major cleanups
Set cutoff based on CPU cores (default: cores * 2)
Monitor with SHOW TABLE STATUS and SHOW THREADS

OPTIMIZE and Index Merging

OPTIMIZE and Index Merging

OPTIMIZE and Index Merging

Congratulations! You've completed the course!

>>