Terminal Terminal | Web Web
Home  //  Play

Distributed tables

Difficulty: Beginner
Estimated Time: 15 minutes

Distributed Tables in Manticore Search

In this course, you will learn how to create and use distributed tables for scaling searches.

What you'll learn:

  • Understand distributed table concepts
  • Create distributed tables with local tables
  • Configure remote agents for cross-server search
  • Set up high availability with agent mirrors
  • Handle timeouts and retry settings

Distributed tables enable searching across multiple tables and servers with a single query.

Distributed tables

Step 1 of 5

Understanding Distributed Tables

Let's connect to Manticore and explore distributed tables.

mysql -P9306 -h0

A distributed table is a virtual table that:

  • Contains no data itself
  • Routes queries to local or remote tables
  • Merges results from multiple sources
  • Provides single query interface for multiple tables

First, create some local tables to distribute:

CREATE TABLE products_us(title text, price float, region string);

CREATE TABLE products_eu(title text, price float, region string);

Insert data into each:

INSERT INTO products_us(title, price, region) VALUES ('Laptop Pro', 999.99, 'US'), ('Wireless Mouse', 29.99, 'US'), ('Monitor 4K', 399.99, 'US');

INSERT INTO products_eu(title, price, region) VALUES ('Laptop Standard', 899.99, 'EU'), ('Keyboard Wireless', 49.99, 'EU'), ('Webcam HD', 79.99, 'EU');

Verify data:

SELECT * FROM products_us;

SELECT * FROM products_eu;

Creating Local Distributed Tables

Create a distributed table combining local tables:

CREATE TABLE products_all type='distributed' local='products_us' local='products_eu';

Now query all products with one query:

SELECT * FROM products_all;

The distributed table returns results from both underlying tables.

Full-text search across all tables:

SELECT * FROM products_all WHERE MATCH('laptop');

Aggregations work across distributed tables:

SELECT region, COUNT(*) as cnt FROM products_all GROUP BY region;

SELECT AVG(price) as avg_price FROM products_all;

Filtering:

SELECT * FROM products_all WHERE price < 100;

The distributed table provides a unified view of data spread across multiple tables.

View table structure:

SHOW CREATE TABLE products_all;

Add more local tables to existing distributed table:

CREATE TABLE products_asia(title text, price float, region string);

INSERT INTO products_asia(title, price, region) VALUES ('Tablet Pro', 599.99, 'Asia'), ('Earbuds Wireless', 149.99, 'Asia');

Remote Agents

Distributed tables can query remote Manticore servers using agents.

Agent syntax: agent = hostname:port:table_name

Example distributed table with remote agents (config file):

table products_global {
    type = distributed
    local = products_local
    agent = server1:9312:products
    agent = server2:9312:products
}

Note: Remote agents must be configured in the Manticore config file - they cannot be created via SQL CREATE TABLE. The SQL method only supports local tables.

Agent settings control connection behavior (config file):

table products_dist {
    type = distributed
    agent = server1:9312:products
    agent_connect_timeout = 1000
    agent_query_timeout = 3000
    agent_retry_count = 2
}

Key timeout settings:

  • agent_connect_timeout: connection establishment (ms)
  • agent_query_timeout: query execution timeout (ms)
  • agent_retry_count: retry attempts on failure

For this demo, let's create a distributed table combining all regions:

CREATE TABLE dist_demo type='distributed' local='products_us' local='products_eu' local='products_asia';

Query all regions with one statement:

SELECT * FROM dist_demo;

View the distributed table structure:

SHOW CREATE TABLE dist_demo;

High Availability with Agent Mirrors

Agent mirrors provide failover - if one server fails, queries go to the mirror.

Mirror syntax (config file):

table products_ha {
    type = distributed
    agent = server1:9312|server1_backup:9312:products
    ha_strategy = roundrobin
}

The pipe (|) separates primary and mirror agents.

HA strategies:

  • random: randomly select from mirrors
  • roundrobin: cycle through mirrors
  • nodeads: skip known dead mirrors
  • noerrors: skip mirrors with recent errors

Example with multiple mirrors:

table products_failover {
    type = distributed
    agent = srv1:9312|srv2:9312|srv3:9312:products
    ha_strategy = noerrors
    agent_retry_count = 3
}

Persistent connections improve performance:

table products_persistent {
    type = distributed
    agent = server1:9312:products
    agent_persistent = 1
}

Check agent status:

SHOW AGENT STATUS;

This shows connection state and error counts for remote agents.

Let's verify the distributed table works correctly by querying data from all regions:

SELECT region, COUNT(*) as cnt FROM dist_demo GROUP BY region;

This query demonstrates that distributed tables support aggregations (GROUP BY, COUNT) across multiple local tables, returning combined results from all underlying tables.

Practical Patterns and Best Practices

Let's explore common distributed table patterns.

Sharding by region (data partitioning):

SELECT * FROM dist_demo WHERE region = 'US';

Each query only hits relevant shards if you filter by shard key.

Load balancing read queries:

Create multiple replicas and distribute reads:

table products_lb {
    type = distributed
    agent = replica1:9312|replica2:9312|replica3:9312:products
    ha_strategy = roundrobin
}

Time-based sharding (logs example):

table logs_all {
    type = distributed
    local = logs_2024_01
    local = logs_2024_02
    local = logs_2024_03
}

Check distributed table configuration:

SHOW CREATE TABLE dist_demo;

SHOW TABLES;

Monitor query distribution:

SHOW STATUS LIKE 'dist%';

Best practices:

  • Use local tables when possible (faster)
  • Set appropriate timeouts for remote agents
  • Use mirrors for high availability
  • Monitor agent status regularly
  • Consider data locality when sharding
  • Use persistent connections for frequent queries

Clean up demo tables:

DROP TABLE dist_demo;

DROP TABLE products_all;