KNN Vector Search with Manticore
Introduction to Vector Search
Vector search (also known as KNN - K-Nearest Neighbors) allows you to find similar items based on their vector representations (embeddings). This is essential for:
- Semantic search (finding documents by meaning, not just keywords)
- Recommendation systems
- Image similarity search
Manticore Search supports vector search via the float_vector data type, indexed with the HNSW (Hierarchical Navigable Small World) algorithm.
Creating a Vector Table
First, let's connect to Manticore using the MySQL client:
mysql -P9306 -h0
Now let's create a table with vector support:
CREATE TABLE products (id BIGINT, title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' KNN_DIMS='4' HNSW_SIMILARITY='L2');
Key parameters:
- KNN_TYPE='hnsw' - the indexing algorithm (currently only HNSW is supported)
- KNN_DIMS='4' - vector dimensionality (number of elements in each vector)
- HNSW_SIMILARITY='L2' - distance metric (L2, IP, or COSINE)
Let's verify the table was created:
DESCRIBE products;
Inserting Vector Data
Now let's insert some products with their vector embeddings. In real applications, these vectors would be generated by an embedding model (like sentence-transformers).
For this tutorial, we'll use simple 4-dimensional vectors:
INSERT INTO products VALUES (1, 'red leather bag', (0.653, 0.192, 0.018, 0.340));
INSERT INTO products VALUES (2, 'white canvas bag', (-0.149, 0.748, 0.092, -0.095));
INSERT INTO products VALUES (3, 'blue denim backpack', (0.286, -0.032, 0.067, 0.033));
INSERT INTO products VALUES (4, 'black laptop bag', (0.512, 0.156, -0.201, 0.445));
INSERT INTO products VALUES (5, 'green travel bag', (0.125, 0.890, 0.045, -0.167));
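In practice you would compute the embeddings with a model and generate these statements programmatically rather than typing them by hand. A minimal Python sketch of such a generator (the helper name is hypothetical; the vectors are the tutorial's placeholder values, not real embeddings):

```python
def insert_statement(table, doc_id, title, vector):
    """Format a Manticore INSERT for a row with a float_vector column."""
    vec = ", ".join(f"{x:g}" for x in vector)
    # Escape single quotes in the title by doubling them, SQL-style.
    safe_title = title.replace("'", "''")
    return f"INSERT INTO {table} VALUES ({doc_id}, '{safe_title}', ({vec}));"

print(insert_statement("products", 1, "red leather bag", [0.653, 0.192, 0.018, 0.340]))
```

You could feed the resulting statements to the server through the same MySQL client used above.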
Let's verify our data:
SELECT * FROM products;
Notice that similar items (like bags) might have vectors that are close to each other in the vector space.
Performing KNN Searches
Now let's search for similar products using KNN. We'll find the nearest neighbors to a query vector.
Basic KNN Search
SELECT id, title, knn_dist() FROM products WHERE knn(embedding, 3, (0.653, 0.192, 0.018, 0.340));
This query finds the 3 products most similar to the vector (0.653, 0.192, 0.018, 0.340).
The knn_dist() function returns the distance between each result and the query vector. Lower distance means higher similarity.
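To build intuition for these values, the L2 case can be reproduced by hand. A minimal Python sketch, assuming (as the distances shown later in this tutorial suggest) that knn_dist() reports the squared Euclidean distance for L2:

```python
def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

query = (0.653, 0.192, 0.018, 0.340)
doc1 = (0.653, 0.192, 0.018, 0.340)   # product 1: identical to the query
doc3 = (0.286, -0.032, 0.067, 0.033)  # product 3: blue denim backpack

print(l2_squared(query, doc1))  # 0.0 -- an exact match
print(l2_squared(query, doc3))  # roughly 0.28 -- farther away
```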
Finding Similar Documents by ID
You can also find documents similar to an existing document by its ID:
SELECT id, title, knn_dist() FROM products WHERE knn(embedding, 3, 1);
This finds the 3 products most similar to the product with id=1 (red leather bag).
Changing the Number of Results
To get more or fewer results, change the second parameter in knn():
SELECT id, title, knn_dist() FROM products WHERE knn(embedding, 5, (0.5, 0.2, 0.1, 0.3));
This returns the 5 nearest neighbors instead of 3.
Combining KNN with Filters
One of the powerful features of Manticore is the ability to combine vector search with traditional filters and full-text search.
KNN + Full-Text Search
Find similar products that also contain the word "bag":
SELECT id, title, knn_dist() FROM products WHERE knn(embedding, 5, (0.5, 0.2, 0.1, 0.3)) AND MATCH('bag');
KNN + Attribute Filter
Find similar products with id less than 4:
SELECT id, title, knn_dist() FROM products WHERE knn(embedding, 5, (0.5, 0.2, 0.1, 0.3)) AND id < 4;
Combining Multiple Filters
You can combine KNN with both full-text search and attribute filters:
SELECT id, title, knn_dist() FROM products WHERE knn(embedding, 5, (0.5, 0.2, 0.1, 0.3)) AND MATCH('bag') AND id > 1;
This hybrid approach is very powerful for building recommendation systems where you want semantic similarity but also need to apply business rules (filters).
Understanding Distance Metrics
Manticore supports different distance metrics. Let's see how L2 and COSINE differ in practice.
Creating Tables with Different Metrics
Create a table with COSINE similarity:
CREATE TABLE products_cosine (id BIGINT, title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' KNN_DIMS='4' HNSW_SIMILARITY='COSINE');
Insert test data: two vectors pointing in the same direction but with different lengths, plus one pointing in a different direction:
INSERT INTO products_cosine VALUES (1, 'short vector', (0.1, 0.1, 0.1, 0.1)), (2, 'long vector', (0.9, 0.9, 0.9, 0.9)), (3, 'different direction', (0.9, -0.1, 0.2, -0.3));
Insert the same data into our L2 table:
INSERT INTO products VALUES (6, 'short vector', (0.1, 0.1, 0.1, 0.1)), (7, 'long vector', (0.9, 0.9, 0.9, 0.9)), (8, 'different direction', (0.9, -0.1, 0.2, -0.3));
Comparing L2 vs COSINE
Search with L2 (Euclidean distance):
SELECT id, title, knn_dist() FROM products WHERE knn(embedding, 3, (0.5, 0.5, 0.5, 0.5)) AND id > 5;
Search with COSINE:
SELECT id, title, knn_dist() FROM products_cosine WHERE knn(embedding, 3, (0.5, 0.5, 0.5, 0.5));
Notice the difference:
- L2: "short vector" and "long vector" get the same distance (0.64) because L2 measures actual distance in space, and both sit 0.4 away from the query in every dimension
- COSINE: both vectors have distance 0 because they point in the same direction as the query; COSINE ignores magnitude and considers only the angle
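Both observations can be checked without a server. A small pure-Python sketch of the two metrics (an assumption here, consistent with the numbers above: knn_dist() returns the squared distance for L2, and 1 minus cosine similarity for COSINE):

```python
import math

def l2_squared(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for same direction, up to 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

query = (0.5, 0.5, 0.5, 0.5)
short_vec = (0.1, 0.1, 0.1, 0.1)
long_vec = (0.9, 0.9, 0.9, 0.9)

# L2: both vectors are 0.4 away per dimension, so both get the same distance.
print(l2_squared(query, short_vec), l2_squared(query, long_vec))  # ~0.64 each
# COSINE: both point exactly where the query points, so both get ~0.
print(cosine_distance(query, short_vec), cosine_distance(query, long_vec))
```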
When to Use Which
- L2: When vector magnitude matters (e.g., intensity, quantity)
- COSINE: For text embeddings where direction represents meaning (most NLP applications)
- IP: For normalized vectors in recommendation systems
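On the last point: for unit-length vectors the inner product equals cosine similarity, so a common pattern is to normalize embeddings once at indexing time and then use the cheaper IP metric at query time. A minimal sketch of that identity:

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = [0.9, -0.1, 0.2, -0.3]
b = [0.5, 0.5, 0.5, 0.5]

# Inner product of the normalized vectors...
ip = dot(normalize(a), normalize(b))
# ...equals the cosine similarity of the originals.
cos = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
print(ip, cos)
```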