>>
Regular Expressions in Searches
Setting Up for Regex Search
Let's connect to Manticore and create a table with regex support.
mysql -P9306 -h0
The REGEX operator inside MATCH requires min_infix_len or min_prefix_len and dict=keywords (which is the default).
Create a table with infix support:
CREATE TABLE products(title text, sku string, description text) min_infix_len='2' dict='keywords';
Insert sample data with various patterns:
INSERT INTO products(title, sku, description) VALUES
('iPhone 15 Pro Max', 'APPL-IP15PM-256', 'Latest Apple smartphone with A17 chip'),
('Samsung Galaxy S24', 'SAMS-GS24-128', 'Android flagship with AI features'),
('MacBook Pro 16', 'APPL-MBP16-512', 'Professional laptop with M3 chip'),
('Dell XPS 15', 'DELL-XPS15-256', 'Premium Windows ultrabook'),
('iPad Air 5', 'APPL-IPA5-64', 'Versatile tablet with M1 chip'),
('Sony WH-1000XM5', 'SONY-WH1KXM5', 'Noise cancelling headphones'),
('Pixel 8 Pro', 'GOOG-PX8P-256', 'Google phone with AI camera'),
('Surface Pro 9', 'MSFT-SP9-256', 'Windows tablet and laptop hybrid'),
('GPU Graphics Card', 'NVDA-RTX4090', 'High performance graphics card with AI chips and ray tracing');
Verify data:
SELECT * FROM products;
Basic REGEX Operator
The REGEX operator uses RE2 syntax for pattern matching in full-text fields.
Search for words ending with specific pattern:
SELECT * FROM products WHERE MATCH('REGEX(/.*phone/)');
This finds documents containing tokens ending with "phone": "iphone" from the title, "smartphone" from the description, and "phone" itself.
Search for words starting with a pattern:
SELECT * FROM products WHERE MATCH('REGEX(/pro.*/)');
This finds documents with tokens starting with "pro", such as "pro" and "professional".
Search for words containing numbers:
SELECT * FROM products WHERE MATCH('REGEX(/.*\\d+.*/)');
Note: patterns match whole tokens, so use .* for partial matching.
Match exact word length:
SELECT * FROM products WHERE MATCH('REGEX(/^.{5}$/)');
Finds 5-character tokens.
REGEX in MATCH works on lowercased tokens (Manticore lowercases all text during indexing). Always use lowercase in your patterns:
SELECT * FROM products WHERE MATCH('REGEX(/apple/)');
This finds "Apple" because it's stored as "apple" in the index. Uppercase won't match:
SELECT * FROM products WHERE MATCH('REGEX(/Apple/)');
No results — the index only has "apple", not "Apple".
Advanced Regex Patterns
Let's explore more complex patterns.
Match words with specific character classes:
SELECT * FROM products WHERE MATCH('REGEX(/[a-z]+[0-9]+/)');
Finds tokens with letters followed by numbers (like m3, a17).
Alternation - match one of several patterns:
SELECT * FROM products WHERE MATCH('REGEX(/(laptop|tablet|phone)/)');
Quantifiers for flexible matching — ? means "zero or one":
SELECT * FROM products WHERE MATCH('REGEX(/chips?/)');
This matches both "chip" (s is optional) and "chips". The ? quantifier lets you match singular and plural forms with one pattern.
Match words with repeated characters:
SELECT * FROM products WHERE MATCH('REGEX(/.*oo.*/)');
Finds words containing "oo" (like "google" and "macbook").
Combine with regular search terms:
SELECT * FROM products WHERE MATCH('apple REGEX(/.*[0-9]+/)');
This finds documents with "apple" AND tokens containing numbers.
Negative lookahead isn't supported in RE2, but you can use character classes:
SELECT * FROM products WHERE MATCH('REGEX(/[^0-9]+/)');
Matches tokens without digits.
REGEX() Function for String Attributes
The REGEX() function applies regex matching on string and JSON attributes, outside of full-text search.
The syntax is:
REGEX(attribute, 'pattern')
It returns 1 if the pattern matches, 0 otherwise. In a WHERE clause it works as a boolean filter directly. It uses RE2 syntax just like the REGEX operator in MATCH.
Filter by SKU pattern - find all Apple products:
SELECT * FROM products WHERE REGEX(sku, '^APPL');
This finds all products whose SKU starts with "APPL".
Match SKUs ending with a specific storage size:
SELECT * FROM products WHERE REGEX(sku, '-256$');
Finds products with 256GB storage (SKU ending in -256).
Combine REGEX() with full-text search:
SELECT * FROM products WHERE MATCH('laptop') AND REGEX(sku, '^DELL');
Multiple vendor prefixes using alternation:
SELECT * FROM products WHERE REGEX(sku, '^(APPL|GOOG|MSFT)');
Finds products from Apple, Google, or Microsoft.
Match SKUs ending with exactly 3 digits:
SELECT sku, title FROM products WHERE REGEX(sku, '[0-9]{3}$');
For case-insensitive matching on attributes, use the (?i) flag:
SELECT * FROM products WHERE REGEX(sku, '(?i)^appl');
Note: unlike the REGEX operator in MATCH (which works on indexed tokens lowercased by charset_table), REGEX() function matches against the original attribute value, so case matters unless you use (?i).
REGEX vs Wildcards
REGEX scans the entire dictionary to find matching tokens, while wildcards use the infix index directly. Let's compare them using SHOW META.
REGEX search:
SELECT * FROM products WHERE MATCH('REGEX(/.*phone/)');
SHOW META;
Wildcard search:
SELECT * FROM products WHERE MATCH('*phone*');
SHOW META;
Compare the keyword[0] field — REGEX shows the raw pattern while wildcard shows the expanded term. On large datasets, wildcard is significantly faster because it leverages the infix index instead of scanning all dictionary entries.
Also note: REGEX /.*phone/ matches tokens ending with "phone" (like "smartphone", "iphone", "phone"), while wildcard *phone* also matches "headphones" since it finds "phone" as a substring anywhere in the token.
Best practices:
-
Anchor patterns when possible:
SELECT * FROM products WHERE MATCH('REGEX(/^pro.*/)'); -
Use specific character classes instead of .*:
SELECT * FROM products WHERE MATCH('REGEX(/[a-z]{3,6}[0-9]+/)'); -
Combine with other terms to reduce scan scope:
SELECT * FROM products WHERE MATCH('apple REGEX(/.*[0-9]+/)'); -
For simple prefix/suffix matching, prefer wildcard syntax:
SELECT * FROM products WHERE MATCH('pro*');
SELECT * FROM products WHERE MATCH('*phone');
Use REGEX when wildcards can't express your pattern (alternation, quantifiers, character classes).