>>
Auto schema mechanism
Introduction
The auto schema mechanism in Manticore Search automatically creates tables when you insert data into a table that doesn't yet exist. Instead of manually defining a schema with CREATE TABLE, Manticore infers the column types from the values you provide.
This feature requires Manticore Buddy and is enabled by default. To disable it, add auto_schema = 0 to the Searchd section of your Manticore configuration file. To re-enable it, set auto_schema = 1 or remove the setting entirely.
You can use auto schema with both the SQL and HTTP protocols.
Basic usage
Let's connect to Manticore via the MySQL client:
mysql -P9306 -h0
Now, let's insert data into a table that doesn't exist yet:
INSERT INTO products(title, price, description) VALUES('Laptop', 999.99, 'A powerful laptop for professionals');
The table was created automatically. Let's verify its schema:
DESCRIBE products;
As you can see, Manticore inferred the types:
- title and description → text (indexed and stored)
- price → float
Let's check the data:
SELECT * FROM products;
Now let's insert more rows into the same table:
INSERT INTO products(title, price, description) VALUES('Smartphone', 599.99, 'Latest model with great camera');
INSERT INTO products(title, price, description) VALUES('Tablet', 349.50, 'Lightweight and portable');
SELECT * FROM products;
Let's try another example. We'll insert a row with different data types into a new table:
INSERT INTO users(name, age, email, active) VALUES('Alice', 30, 'alice@example.com', 1);
DESCRIBE users;
Here Manticore inferred:
- name → text
- age → uint (unsigned integer)
- email → string (email addresses are detected as strings, not text)
- active → uint
SELECT * FROM users;
Data type inference
The auto schema mechanism detects value types automatically. Let's explore how different data types are inferred:
DROP TABLE IF EXISTS t; INSERT INTO t(i, f, t, s, j, b, m, mb) VALUES(123, 1.2, 'text here', 'test@mail.com', '{"a": 123}', 1099511627776, (1,2), (1099511627776,1099511627777));
DESCRIBE t;
The types were inferred as follows:
- 123 → uint (unsigned integer)
- 1.2 → float
- 'text here' → text (full-text indexed and stored)
- 'test@mail.com' → string (emails are detected as strings, not text)
- '{"a": 123}' → json
- 1099511627776 → bigint (value exceeds 32-bit range)
- (1,2) → mva (multi-value attribute)
- (1099511627776,1099511627777) → mva64 (64-bit multi-value attribute)
Let's verify the data was inserted correctly:
SELECT * FROM t;
Email detection
By default, all text values in the VALUES clause are considered to be of the text type:
DROP TABLE IF EXISTS t; INSERT INTO t(f) VALUES('regular text'); DESCRIBE t;
The exception is values that represent valid email addresses, which are treated as the string type:
DROP TABLE IF EXISTS t; INSERT INTO t(f) VALUES('user@example.com'); DESCRIBE t;
Notice that 'regular text' was detected as text, while 'user@example.com' was detected as string.
Auto data type conversion
When performing batch inserts, Manticore automatically promotes types to accommodate all values in a field. The following compatible conversions are applied:
- mva → mva64
- uint → bigint → float
- string → text
Compatible type promotion
DROP TABLE IF EXISTS t; INSERT INTO t(f, b) VALUES(123, 1099511627776), (1.2, 10); DESCRIBE t;
Here f was promoted to float (to accommodate both 123 and 1.2), and b was promoted to bigint (to accommodate both 1099511627776 and 10).
DROP TABLE IF EXISTS t; INSERT INTO t(mb) VALUES((1,2)),((1099511627776,1099511627777)); DESCRIBE t;
The mb column was promoted from mva to mva64 to accommodate large values.
DROP TABLE IF EXISTS t; INSERT INTO t(t) VALUES('test@mail.com'),('regular text'); DESCRIBE t;
The t column was promoted from string (email) to text to accommodate both values.
Incompatible types and HTTP
Incompatible types
If values in the same field have incompatible types, Manticore will return an error:
DROP TABLE IF EXISTS t; INSERT INTO t(f) VALUES(123),('text');
The values 123 (integer) and 'text' (text) are incompatible and cannot be reconciled into a single column type. This is the expected behavior.
Now let's exit the MySQL client and explore auto schema via the HTTP protocol:
quit
Auto schema via HTTP
The auto schema mechanism also works with HTTP JSON requests. Let's insert a document into a non-existent table:
curl -s localhost:9308/insert -d '
{"index":"http_test","id":1,
"doc":{"title":"Smartphone with great camera",
"price":599.99,
"specs":{"ram":8,"storage":256}}}
' | jq
curl -s localhost:9308/cli -d "DESC http_test"
The schema was automatically inferred from the JSON document:
- title → text
- price → float
- specs → json
Timestamp detection via HTTP
When using the HTTP JSON protocol, Manticore can automatically detect ISO 8601 date formats and store them as timestamp:
curl -s localhost:9308/insert -d '
{"index":"events","id":1,
"doc":{"name":"signup",
"created_at":"2024-01-15T10:30:00"}}
' | jq
curl -s localhost:9308/cli -d "DESC events"
The created_at field is stored as timestamp. Note that this automatic timestamp detection works only via the HTTP JSON protocol and supports ISO 8601 formats:
%Y-%m-%dT%H:%M:%E*S%Z (e.g. 2024-01-15T10:30:00.123Z) %Y-%m-%dT%H:%M:%S%Z (e.g. 2024-01-15T10:30:00Z) %Y-%m-%dT%H:%M:%E*S (e.g. 2024-01-15T10:30:00.123) %Y-%m-%dT%H:%M:%s (e.g. 2024-01-15T10:30:00) %Y-%m-%dT%H:%M (e.g. 2024-01-15T10:30) %Y-%m-%dT%H (e.g. 2024-01-15T10)
Elasticsearch-like endpoints
You can also use the Elasticsearch-like _create endpoint:
curl -s localhost:9308/cli -d "DROP TABLE IF EXISTS products2"
curl -s localhost:9308/products2/_create/1 -d ' { "title": "Red Bag", "price": 19.85 } ' | jq
curl -s localhost:9308/cli -d "DESC products2"
The Elasticsearch-like /_bulk endpoint supports auto schema as well:
curl -s localhost:9308/cli -d "DROP TABLE IF EXISTS products3"
curl -s localhost:9308/_bulk -H 'Content-Type: application/x-ndjson' -d '
{ "index" : { "_index" : "products3" } }
{ "title" : "Yellow Bag", "price": 12 }
{ "index" : { "_index" : "products3" } }
{ "title" : "Blue Bag", "price": 15.5 }
' | jq
curl -s localhost:9308/cli -d "SELECT * FROM products3"
Limitations
Keep in mind the following limitations of the auto schema mechanism:
-
float_vector not supported: The float_vector type cannot be auto-inferred. You must explicitly create a table with float_vector columns for KNN search.
-
/bulk endpoint not supported: The Manticore-native /bulk HTTP endpoint does not support auto schema. Use the Elasticsearch-like /_bulk endpoint or SQL instead.
-
Precision loss: Type promotion from uint to bigint to float may cause precision loss for large numbers.
-
Timestamp detection: Automatic timestamp detection is only available via the HTTP JSON protocol. Via SQL, date strings are stored as text.
-
Buddy required: The auto schema mechanism requires Manticore Buddy to be running.