Manticore replication failover

Difficulty: Beginner
Estimated Time: 10 minutes

Manticore Search: replication failover scenario

In this course we follow a simple scenario in which a node in a cluster goes down, and we see how to bring that node back into the cluster.

Fire up the cluster

In this course we will use 3 Manticore instances that replicate a PQ (percolate query) index.

Let's connect to one of the instances:

mysql -P9306 -h0

and create a new cluster on it:

CREATE CLUSTER posts;

We add the pq index to the cluster:

ALTER CLUSTER posts ADD pq;exit;

Now let's connect to the second instance:

mysql -P9307 -h0

And join the cluster:

join cluster posts '' as nodes;exit;

And we do the same for the third instance:

mysql -P9308 -h0

join cluster posts '' as nodes;

Let's insert some data into our pq index.

INSERT INTO posts:pq VALUES('value 1');

INSERT INTO posts:pq VALUES('value 2');

INSERT INTO posts:pq VALUES('value 3');exit;

Crashing an instance

To crash an instance we can simply send a signal to its process. Note that pkill sends SIGTERM by default; pass -KILL to send a KILL signal instead. We'll choose the first instance, but first let's see which stored queries it currently holds:

mysql -P 9306 -h0

SELECT * FROM pq;exit;

Let's crash the instance:

pkill -F /var/run/manticore/searchd1.pid
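Before adding more data, it can be useful to confirm that the process behind the pid file has really exited. A minimal sketch, assuming a hypothetical helper named is_running (it is not part of Manticore) and a user that is allowed to signal the searchd process:

```shell
# Hypothetical helper, not part of Manticore: succeeds if the PID stored
# in the given pid file belongs to a live process.
is_running() {
  local pidfile="$1" pid
  [ -r "$pidfile" ] || return 1          # no readable pid file
  pid=$(cat "$pidfile")
  [ -n "$pid" ] || return 1              # empty pid file
  kill -0 "$pid" 2>/dev/null             # signal 0 only tests for existence
}

# Example (path taken from the pkill command above):
#   is_running /var/run/manticore/searchd1.pid || echo "first instance is down"
```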

While the first instance is down, we add more queries to the cluster:

mysql -P 9307 -h0

INSERT INTO posts:pq VALUES('value 4');

INSERT INTO posts:pq VALUES('value 5');

INSERT INTO posts:pq VALUES('value 6');

SELECT * FROM pq;exit;

Rejoining the cluster

In the previous step we stopped one of the instances and added new data to the PQ index. When the crashed instance is started again it's expected to reconnect to the cluster and sync with the existing nodes.

Start the instance:

/usr/bin/searchd --config /etc/sphinxsearch/sphinx1.conf

mysql -P 9306 -h0

Let's look at the pq index:

SELECT * FROM pq;exit;

We see the instance received the new queries that were added while it was down.
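The same check can be scripted by comparing the rows returned by two nodes. A minimal sketch, assuming a hypothetical helper named same_rows; it only compares text, so row order is normalised with sort before the comparison:

```shell
# Hypothetical helper: succeeds if two result sets contain the same lines,
# regardless of the order in which each node returned them.
same_rows() {
  [ "$(printf '%s\n' "$1" | sort)" = "$(printf '%s\n' "$2" | sort)" ]
}

# Example with the mysql client (-N skips the header row, -e runs one statement):
#   a=$(mysql -P9306 -h0 -N -e 'SELECT * FROM pq')
#   b=$(mysql -P9307 -h0 -N -e 'SELECT * FROM pq')
#   same_rows "$a" "$b" && echo "node 1 has caught up"
```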

Updating cluster active nodes list

When a node joins a cluster, it receives a list of the nodes used for replication from the node it connected to. A second list contains the nodes it should try to connect to when it needs to reconnect.

If we go to the second instance:

mysql -P 9307 -h0

We can see the lists with the SHOW STATUS command:

SHOW STATUS LIKE '%posts_node%';

The '_set' list holds the active nodes that will be used to reconnect. As we can see, only the first node is in this list. This means that if both the first and the second node go down and the second node comes back up, it won't be able to rejoin the cluster, because the third node is missing from its list of active cluster nodes.

To ensure a node can reconnect to the cluster no matter which other node is down, all nodes should be aware of the active nodes of the cluster.
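The reasoning above can be modelled in a few lines of shell. This is only a simplified sketch of the rule "a restarting node needs at least one reachable node in its list", not a description of how Manticore implements reconnection; can_rejoin and the names node1 to node3 are hypothetical placeholders for real node addresses:

```shell
# can_rejoin NODES ALIVE
# NODES: comma-separated list the restarting node would try to reconnect to.
# ALIVE: comma-separated list of nodes that are currently up.
# Succeeds if at least one entry of NODES is also in ALIVE.
can_rejoin() {
  local nodes="$1" alive=",$2," node
  local IFS=','
  for node in $nodes; do                 # unquoted on purpose: split on commas
    [ -n "$node" ] || continue
    case "$alive" in
      *",$node,"*) return 0 ;;
    esac
  done
  return 1
}

# Stale list: node2 only knows node1, but only node3 is still up.
can_rejoin "node1" "node3" || echo "node2 cannot rejoin"
# Complete list: node2 knows every node, so node3 alone is enough.
can_rejoin "node1,node2,node3" "node3" && echo "node2 can rejoin"
```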

To do that we only need to run the following ALTER command on one of the nodes:

ALTER CLUSTER posts UPDATE nodes;exit;

Let's check the third node:

mysql -P 9308 -h0

SHOW STATUS LIKE '%posts_node%';

As can be seen, the active node list now contains all three nodes of the cluster.
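This final check can also be automated. A minimal sketch, assuming the status output is fetched non-interactively as tab-separated name/value lines and that the '_set' value is a comma-separated list of node addresses; missing_from_set and the names node1 to node3 are hypothetical:

```shell
# Reads "name<TAB>value" status lines on stdin and prints every expected
# node (given as arguments) that is absent from the first *_set list.
missing_from_set() {
  local list node
  list=$(awk -F '\t' '$1 ~ /_set$/ { print $2; exit }')
  for node in "$@"; do
    case ",$list," in
      *",$node,"*) ;;                    # present, nothing to report
      *) printf '%s\n' "$node" ;;
    esac
  done
}

# Example against a live node (replace node1..node3 with the real addresses):
#   mysql -P9308 -h0 -N -e "SHOW STATUS LIKE '%posts_node%'" | missing_from_set node1 node2 node3
```

An empty result means every expected node is present in the list.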