>>
Distributed tables
Understanding Distributed Tables
Let's connect to Manticore and explore distributed tables.
mysql -P9306 -h0
A distributed table is a virtual table that:
- Contains no data itself
- Routes queries to local or remote tables
- Merges results from multiple sources
- Provides single query interface for multiple tables
First, create some local tables to distribute:
CREATE TABLE products_us(title text, price float, region string);
CREATE TABLE products_eu(title text, price float, region string);
Insert data into each:
INSERT INTO products_us(title, price, region) VALUES ('Laptop Pro', 999.99, 'US'), ('Wireless Mouse', 29.99, 'US'), ('Monitor 4K', 399.99, 'US');
INSERT INTO products_eu(title, price, region) VALUES ('Laptop Standard', 899.99, 'EU'), ('Keyboard Wireless', 49.99, 'EU'), ('Webcam HD', 79.99, 'EU');
Verify data:
SELECT * FROM products_us;
SELECT * FROM products_eu;
Creating Local Distributed Tables
Create a distributed table combining local tables:
CREATE TABLE products_all type='distributed' local='products_us' local='products_eu';
Now query all products with one query:
SELECT * FROM products_all;
The distributed table returns results from both underlying tables.
Full-text search across all tables:
SELECT * FROM products_all WHERE MATCH('laptop');
Aggregations work across distributed tables:
SELECT region, COUNT(*) as cnt FROM products_all GROUP BY region;
SELECT AVG(price) as avg_price FROM products_all;
Filtering:
SELECT * FROM products_all WHERE price < 100;
The distributed table provides a unified view of data spread across multiple tables.
View table structure:
SHOW CREATE TABLE products_all;
Add more local tables to existing distributed table:
CREATE TABLE products_asia(title text, price float, region string);
INSERT INTO products_asia(title, price, region) VALUES ('Tablet Pro', 599.99, 'Asia'), ('Earbuds Wireless', 149.99, 'Asia');
Remote Agents
Distributed tables can query remote Manticore servers using agents.
Agent syntax: agent = hostname:port:table_name
Example distributed table with remote agents (config file):
table products_global {
type = distributed
local = products_local
agent = server1:9312:products
agent = server2:9312:products
}
Note: Remote agents must be configured in the Manticore config file - they cannot be created via SQL CREATE TABLE. The SQL method only supports local tables.
Agent settings control connection behavior (config file):
table products_dist {
type = distributed
agent = server1:9312:products
agent_connect_timeout = 1000
agent_query_timeout = 3000
agent_retry_count = 2
}
Key timeout settings:
- agent_connect_timeout: connection establishment (ms)
- agent_query_timeout: query execution timeout (ms)
- agent_retry_count: retry attempts on failure
For this demo, let's create a distributed table combining all regions:
CREATE TABLE dist_demo type='distributed' local='products_us' local='products_eu' local='products_asia';
Query all regions with one statement:
SELECT * FROM dist_demo;
View the distributed table structure:
SHOW CREATE TABLE dist_demo;
High Availability with Agent Mirrors
Agent mirrors provide failover - if one server fails, queries go to the mirror.
Mirror syntax (config file):
table products_ha {
type = distributed
agent = server1:9312|server1_backup:9312:products
ha_strategy = roundrobin
}
The pipe (|) separates primary and mirror agents.
HA strategies:
- random: randomly select from mirrors
- roundrobin: cycle through mirrors
- nodeads: skip known dead mirrors
- noerrors: skip mirrors with recent errors
Example with multiple mirrors:
table products_failover {
type = distributed
agent = srv1:9312|srv2:9312|srv3:9312:products
ha_strategy = noerrors
agent_retry_count = 3
}
Persistent connections improve performance:
table products_persistent {
type = distributed
agent = server1:9312:products
agent_persistent = 1
}
Check agent status:
SHOW AGENT STATUS;
This shows connection state and error counts for remote agents.
Let's verify the distributed table works correctly by querying data from all regions:
SELECT region, COUNT(*) as cnt FROM dist_demo GROUP BY region;
This query demonstrates that distributed tables support aggregations (GROUP BY, COUNT) across multiple local tables, returning combined results from all underlying tables.
Practical Patterns and Best Practices
Let's explore common distributed table patterns.
Sharding by region (data partitioning):
SELECT * FROM dist_demo WHERE region = 'US';
Each query only hits relevant shards if you filter by shard key.
Load balancing read queries:
Create multiple replicas and distribute reads:
table products_lb {
type = distributed
agent = replica1:9312|replica2:9312|replica3:9312:products
ha_strategy = roundrobin
}
Time-based sharding (logs example):
table logs_all {
type = distributed
local = logs_2024_01
local = logs_2024_02
local = logs_2024_03
}
Check distributed table configuration:
SHOW CREATE TABLE dist_demo;
SHOW TABLES;
Monitor query distribution:
SHOW STATUS LIKE 'dist%';
Best practices:
- Use local tables when possible (faster)
- Set appropriate timeouts for remote agents
- Use mirrors for high availability
- Monitor agent status regularly
- Consider data locality when sharding
- Use persistent connections for frequent queries
Clean up demo tables:
DROP TABLE dist_demo;
DROP TABLE products_all;