Manticore Columnar Library for Analytics
Row-wise vs Columnar Storage
Let's connect to Manticore and understand the difference between storage types.
mysql -P9306 -h0
Traditional row-wise storage keeps all attributes of a record together. Columnar storage stores each attribute's values contiguously, column by column, which makes scanning a few columns across many rows efficient for analytics.
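As a plain-Python illustration of the two layouts (a conceptual sketch, not Manticore's actual on-disk format): a row store keeps each record's fields together, while a column store keeps each field's values together, so a single-column scan touches far less data:

```python
# Row-wise: one tuple per record; reading just "amount" still visits whole rows.
rows = [
    ("Laptop", "North", 1299.99),
    ("Phone", "South", 899.99),
    ("Tablet", "East", 599.99),
]
total_rowwise = sum(row[2] for row in rows)  # touches every field of every row

# Columnar: one contiguous list per attribute; summing "amount" reads one list.
columns = {
    "product": ["Laptop", "Phone", "Tablet"],
    "region": ["North", "South", "East"],
    "amount": [1299.99, 899.99, 599.99],
}
total_columnar = sum(columns["amount"])  # touches only the "amount" column

assert total_rowwise == total_columnar
```

Both layouts hold the same data; the difference is which bytes a query has to read.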
Create a row-wise table (default):
CREATE TABLE sales_rowwise(product text, region string, amount float, quantity int, sale_date timestamp);
Create a columnar table:
CREATE TABLE sales_columnar(product text, region string, amount float, quantity int, sale_date timestamp) engine='columnar';
Insert identical data into both:
INSERT INTO sales_rowwise(product, region, amount, quantity, sale_date) VALUES
('Laptop', 'North', 1299.99, 5, 1704067200),
('Phone', 'South', 899.99, 10, 1704153600),
('Tablet', 'East', 599.99, 8, 1704240000),
('Monitor', 'West', 349.99, 15, 1704326400),
('Keyboard', 'North', 79.99, 50, 1704412800);
INSERT INTO sales_columnar(product, region, amount, quantity, sale_date) VALUES
('Laptop', 'North', 1299.99, 5, 1704067200),
('Phone', 'South', 899.99, 10, 1704153600),
('Tablet', 'East', 599.99, 8, 1704240000),
('Monitor', 'West', 349.99, 15, 1704326400),
('Keyboard', 'North', 79.99, 50, 1704412800);
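The sale_date values above are Unix timestamps; a quick Python check shows they fall on consecutive days starting 2024-01-01 UTC:

```python
from datetime import datetime, timezone

# Decode the Unix timestamps used in the INSERTs above.
for ts in (1704067200, 1704153600, 1704240000):
    print(ts, datetime.fromtimestamp(ts, tz=timezone.utc).date())

# 1704067200 is 2024-01-01 00:00:00 UTC; each later value adds 86400 s (one day).
```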
View table settings to confirm storage type:
SHOW TABLE sales_columnar SETTINGS;
Selective Columnar Attributes
You can mix row-wise and columnar storage in one table by specifying engine='columnar' for individual attributes.
Create a table with only specific columns in columnar storage:
CREATE TABLE orders(title text, customer string, total float engine='columnar', items int engine='columnar', status string);
Here, total and items use columnar storage (good for aggregations), while customer and status use the default row-wise storage.
Insert sample data:
INSERT INTO orders(title, customer, total, items, status) VALUES
('Order 001', 'John Smith', 150.50, 3, 'completed'),
('Order 002', 'Jane Doe', 299.99, 5, 'pending'),
('Order 003', 'Bob Wilson', 75.00, 2, 'completed'),
('Order 004', 'Alice Brown', 450.00, 8, 'shipped'),
('Order 005', 'Charlie Davis', 125.75, 4, 'completed');
Verify the configuration by describing the table:
DESCRIBE orders;
You can also do the reverse: set the whole table as columnar and keep specific attributes row-wise:
CREATE TABLE logs(message text, level string engine='rowwise', code int, response_time float) engine='columnar';
Here all attributes are stored in columnar format by default, except level which remains row-wise.
Analytical Queries on Columnar Data
Columnar storage excels at aggregation queries that scan many rows.
Add more data to the columnar table:
INSERT INTO sales_columnar(product, region, amount, quantity, sale_date) VALUES
('Mouse', 'North', 29.99, 100, 1704499200),
('Webcam', 'South', 89.99, 25, 1704585600),
('Headset', 'East', 149.99, 30, 1704672000),
('SSD', 'West', 199.99, 40, 1704758400),
('RAM', 'North', 79.99, 60, 1704844800),
('Cable', 'South', 9.99, 200, 1704931200),
('Stand', 'East', 49.99, 45, 1705017600),
('Hub', 'West', 39.99, 80, 1705104000);
Run aggregation queries:
Total sales by region:
SELECT region, SUM(amount) as total_sales, SUM(quantity) as total_qty FROM sales_columnar GROUP BY region;
Average order value:
SELECT AVG(amount) as avg_sale, MIN(amount) as min_sale, MAX(amount) as max_sale FROM sales_columnar;
Count by region:
SELECT region, COUNT(*) as orders FROM sales_columnar GROUP BY region ORDER BY orders DESC;
These queries benefit from columnar storage because they scan specific columns across many rows.
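Conceptually, the GROUP BY query above only needs the region and amount columns; a sketch in plain Python over the 13 rows inserted so far:

```python
from collections import defaultdict

# The only two columns the query needs, in insert order (13 rows so far).
region = ["North", "South", "East", "West", "North",
          "North", "South", "East", "West", "North", "South", "East", "West"]
amount = [1299.99, 899.99, 599.99, 349.99, 79.99,
          29.99, 89.99, 149.99, 199.99, 79.99, 9.99, 49.99, 39.99]

# Equivalent of: SELECT region, SUM(amount) FROM sales_columnar GROUP BY region
totals = defaultdict(float)
for r, a in zip(region, amount):
    totals[r] += a

print(dict(totals))
```

The product, quantity and sale_date columns are never read, which is exactly what the columnar engine exploits.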
Check query statistics after running the aggregations:
SHOW META;
Performance Comparison
Columnar storage uses less memory than row-wise for large datasets because data is stored in compressed blocks of 65536 entries.
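Rough arithmetic on the block size (a back-of-envelope sketch; actual on-disk sizes depend on compression):

```python
import math

BLOCK_ENTRIES = 65536  # values per columnar block

def blocks_needed(row_count: int) -> int:
    """Number of 65536-entry blocks one column needs for row_count rows."""
    return math.ceil(row_count / BLOCK_ENTRIES)

# A 10-million-row table needs ~153 blocks per column; only the blocks of
# the columns a query touches are read and decompressed on demand.
print(blocks_needed(10_000_000))  # -> 153
```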
Check table status to see storage details:
SHOW TABLE sales_rowwise STATUS;
SHOW TABLE sales_columnar STATUS;
Look at disk_bytes and ram_bytes in the output. With only a few rows the values may vary, but with large datasets columnar storage typically uses less RAM because data is stored in compressed blocks and read from disk on demand.
Let's run the same aggregation query on both tables and check execution details.
On the row-wise table:
SELECT region, SUM(amount) as total_sales FROM sales_rowwise GROUP BY region;
SHOW META;
On the columnar table:
SELECT region, SUM(amount) as total_sales FROM sales_columnar GROUP BY region;
SHOW META;
With only a few rows the two SHOW META outputs look essentially the same. With millions of rows, however, columnar storage gives better performance for aggregation queries because it reads only the columns the query needs rather than entire rows.
Note that JSON attributes are not supported by columnar storage. If your table needs JSON fields, they will automatically use row-wise storage regardless of the engine setting:
CREATE TABLE products(title text, price float, metadata json) engine='columnar';
DESCRIBE products;
In the Properties column, price shows columnar while metadata has no such property — it stays row-wise since JSON is not supported by the columnar engine.
When to Use Columnar Storage
Let's summarize with practical guidelines.
Columnar storage is best for:
- Large datasets with many rows
- Analytical queries (SUM, AVG, COUNT, GROUP BY)
- Queries that access few columns but many rows
- Memory-constrained environments
Row-wise storage is better for:
- Fetching complete records frequently
- Small to medium datasets
- OLTP-style workloads
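To see the difference at scale you need many rows; here is a hypothetical Python helper (the function name and column choices are assumptions, only the table and column names match sales_columnar above) that emits one batched INSERT you could pipe into the mysql client:

```python
import random

def bulk_insert_sql(n: int, table: str = "sales_columnar") -> str:
    """Build one multi-row INSERT statement for load testing (illustrative)."""
    random.seed(42)  # reproducible sample data
    regions = ["North", "South", "East", "West"]
    rows = []
    for i in range(n):
        rows.append("('Item %d', '%s', %.2f, %d, %d)" % (
            i, random.choice(regions),
            random.uniform(5, 2000),        # amount
            random.randint(1, 100),         # quantity
            1704067200 + i * 60))           # one sale per minute from 2024-01-01
    return ("INSERT INTO %s(product, region, amount, quantity, sale_date) VALUES\n"
            % table) + ",\n".join(rows) + ";"

sql = bulk_insert_sql(1000)
print(sql[:120])
```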
Create a reporting table optimized for analytics:
CREATE TABLE metrics(name text, server string, cpu float, mem float, disk float, ts bigint) engine='columnar';
Insert metric data:
INSERT INTO metrics(name, server, cpu, mem, disk, ts) VALUES
('web-1', 'prod-web-01', 45.5, 62.3, 55.0, 1704067200),
('web-2', 'prod-web-02', 78.2, 81.5, 60.0, 1704067200),
('db-1', 'prod-db-01', 32.1, 45.8, 72.0, 1704067200),
('cache-1', 'prod-cache-01', 15.3, 92.1, 25.0, 1704067200),
('web-1', 'prod-web-01', 52.8, 65.0, 55.5, 1704153600),
('web-2', 'prod-web-02', 81.0, 83.2, 61.0, 1704153600);
Run typical analytics queries:
SELECT server, AVG(cpu) as avg_cpu, AVG(mem) as avg_mem FROM metrics GROUP BY server;
SELECT server, MAX(cpu) as peak_cpu FROM metrics GROUP BY server HAVING peak_cpu > 50;
SELECT COUNT(*) as samples, AVG(cpu) as avg_cpu FROM metrics WHERE ts >= 1704067200;
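The HAVING clause in the second query filters after aggregation, not before; mirrored in plain Python over the six metric rows:

```python
# (server, cpu) pairs from the six rows inserted above.
samples = [
    ("prod-web-01", 45.5), ("prod-web-02", 78.2), ("prod-db-01", 32.1),
    ("prod-cache-01", 15.3), ("prod-web-01", 52.8), ("prod-web-02", 81.0),
]

# Equivalent of: SELECT server, MAX(cpu) AS peak_cpu FROM metrics
#                GROUP BY server HAVING peak_cpu > 50
peaks = {}
for server, cpu in samples:
    peaks[server] = max(peaks.get(server, 0.0), cpu)
hot = {s: p for s, p in peaks.items() if p > 50}

print(hot)
```

First every group's peak is computed, then groups whose peak is 50 or below are dropped.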