>>
Manticore Search - Main+delta schema
Initial setup
For this course we are going to use a MySQL source (https://manual.manticoresearch.com/Adding_data_from_external_storages/Fetching_from_databases/Introduction)
Beside the table holding documents we are going to index we are also going to use a helper table which will keep track of the moments of the main and delta indexing times.
The delta table must capture any changes in the documents table:
- new documents
- modified documents
- deleted documents
For the first two it's enough to have a timestamp column in the documents table. For properly syncing the deleted documents from the table to the table we'll use an additional column, telling us whether the document is marked as deleted. It means the application must not delete a document, but should mark it as deleted (otherwise it will be impossible to figure out that the doc is not available any more and it will remain in the table until it's fully rebuilt). If it's a problem in your case - consider real time tables instead, they're much more suited to cases when the data changes intensively. Also the documents we want to make searchable will be filtered by the "deleted" flag (we want only those that are not marked as deleted).
Here's an example of the main table from which we want to build a Manticore Search table:
CREATE TABLE documents ( id INT UNSIGNED NOT NULL AUTO_INCREMENT, title VARCHAR(50) NOT NULL, updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, deleted TINYINT UNSIGNED NOT NULL, PRIMARY KEY (id) );Note the updated_at and deleted columns that are important in our case.
The additional helper table will have 2 columns: one for a table name, another for a timestamp:
CREATE TABLE deltabreaker ( index_name VARCHAR(50) NOT NULL, created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (index_name) );
Preparing the Main Table
What's next? We could just run the following query in the source config:
SELECT * FROM documentsHowever, this is not good as it misses some important things:
- we don't want deleted documents in the main table, so we want to add "WHERE deleted=0"
- we want to make sure we know the time until which we can say that all the documents before that time are for sure in the table. To do that even before we fetch anything from the table, we can save the current time in the helper table (note sql_query_pre before) and then use that time in the main query (see sql_query)
- it's possible without sql_query_range (which is primarily used for splitting too big query into multiple smaller ones, read the doc https://manual.manticoresearch.com/Adding_data_from_external_storages/Fetching_from_databases/Ranged_queries#sql_query_range for details), but sql_query_range mostly doesn't hurt and makes things easier
So what we have in the end is:
    sql_query_pre = REPLACE INTO deltabreaker SET index_name = 'main', created_at = NOW()
    sql_query =  SELECT id, title, UNIX_TIMESTAMP(updated_at) AS updated FROM documents WHERE deleted=0 AND  updated_at  >=FROM_UNIXTIME($start) AND updated_at  <= FROM_UNIXTIME($end)
    sql_query_range  = SELECT ( SELECT UNIX_TIMESTAMP(MIN(updated_at)) FROM documents) min, ( SELECT UNIX_TIMESTAMP(created_at)-1 FROM deltabreaker WHERE index_name='main') max
    sql_query_post_index = REPLACE INTO deltabreaker set index_name = 'delta', created_at = (SELECT created_at FROM deltabreaker t WHERE index_name='main')
The pre query will mark the time of indexing. All documents having at most the timestamp of current time will be indexed by the Main table. In the post table query we're changing the timestamp of the delta table.
Let's build the Main table. To populate it, we use the indexer tool. When using indexer, the generated table files must be made with permissions that allow searchd to access them. In case of Linux official packages searchd runs under manticore user, so indexer must also run under manticore user:
sudo -u manticore indexer main
If you are running 'searchd' differently, you might be able to omit the sudo -u manticore part, just make sure that the user under which your searchd instance is running has read/write permissions to the tables you generate.
Enter Manticore:
mysql -P9306 -h0
Get the new table online:
RELOAD TABLES;
And check the main table:
SELECT * FROM main;
We see that we have 9 documents.
Making changes in the source
First, let's exit Manticore and go to MySQL
exit;
mysql test -u user --password=pass123
And check the documents table:
SELECT * FROM documents;
We see we have 9, same as in the Manticore table.
Let's perform some changes: we are going to add a new row, modify an existing one and mark one as deleted:
INSERT INTO documents VALUES(10,'Document Ten',NOW(),0);
UPDATE documents SET title='Altered document Six' WHERE id=6;
UPDATE documents SET deleted=1 WHERE id=5;exit;
Creating the Delta table
The source of the delta will inherit the Main source (Note "source delta : main") with a few overrides:
source delta : main
{
    sql_query_pre =
    sql_query_range = SELECT ( SELECT UNIX_TIMESTAMP(created_at) FROM deltabreaker WHERE index_name='delta') min, UNIX_TIMESTAMP() max
    sql_query_killlist = SELECT id FROM documents WHERE updated_at >=  (SELECT created_at FROM deltabreaker WHERE index_name='delta')
}
The 'sql_query_pre=' means to suppress the inherited pre query from main, as we don't want to perform any update to the helper table (as we already prepared all for the delta table in sql_query_post_index in the Main source, see step 2). We see the range will get rows with the timestamp between the marked time for the delta and current time. You may ask yourself why 2 entries are used in the helper as right now they will hold the same timestamp. The reason is that for the Main table we set a timestamp before it starts indexing. In case the Main indexation failed the next run of the Delta would use the new timestamp as a starting point, missing changes between last good indexing and the failed one. That's why we have a separate entry for the delta table and populate it after the Main table is successfully built (remember sql_query_post_index - https://manual.manticoresearch.com/Adding_data_from_external_storages/Fetching_from_databases/Execution_of_fetch_queries#sql_query_post_index).
The delta table configuration also needs to include the killlist_target, which will tell to use the killlist to suppress affected ids in the Main table:
table delta
{
    path = /var/lib/manticore/delta
    source = delta
    killlist_target = main:kl
}
Read more about killlist_target - https://manual.manticoresearch.com/Creating_an_index/Local_indexes/Plain_and_real-time_index_settings#killlist_target
Let's now index the Delta. To populate the table we use the indexer tool. When using indexer, the generated table files must be made with permissions that allow searchd(Manticore Search server) to access them. In case of Linux official packages searchd runs under manticore user, so indexer must also run under manticore user:
sudo -u manticore indexer delta
If you are running searchd differently you might be able to omit the sudo -u manticore part, just make sure that the user under which your searchd instance is running has read/write permissions to the tables you generate.
Enter Manticore, reload the tables:
mysql -P9306 -h0
RELOAD TABLES;
All together
Let's look at the Delta table:
SELECT * FROM delta;
The Delta includes the new and updated documents.
And the Main:
SELECT * FROM main;
We see in the Main table both the changed and deleted documents are missing.
To search on both table we can use a local distributed table over the both:
table mytable
{
    type = distributed
    local = main
    local = delta
}
SELECT * FROM mytable;
Searching on the both give us the new and updated documents as well as not showing the deleted document (due to the sql_query_killlist and killlist_target, see previous step).
