
Manticore replication failover

Difficulty: Beginner
Estimated Time: 10 minutes

Manticore Search: replication failover scenario

In this course we follow a simple scenario in which a node in a cluster goes down, and we see how to add it back to the cluster.

Fire up a cluster

In this course we will use three Manticore instances to replicate an RT index.

Let's connect to one of the instances:

mysql -P9306 -h0

Create a test index:

CREATE TABLE testrt (title text, content text, gid uint);

Then create a new cluster on this instance:

CREATE CLUSTER posts;

Let's now add the RT index to the cluster:

ALTER CLUSTER posts ADD testrt;exit;

Now let's connect to the second instance:

mysql -P9307 -h0

And join the cluster through the first node:

join cluster posts '127.0.0.1:9312' as nodes;exit;

And let's do the same for the third instance:

mysql -P9308 -h0

join cluster posts '127.0.0.1:9312' as nodes;

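Let's now insert some data into our testrt index. Because the index belongs to a cluster, write statements must address it as cluster:index, here posts:testrt: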

INSERT INTO posts:testrt VALUES(1,'List of HP business laptops','Elitebook Probook',10);

INSERT INTO posts:testrt VALUES(2,'List of Dell business laptops','Latitude Precision Vostro',10);

INSERT INTO posts:testrt VALUES(3,'List of Dell gaming laptops','Inspirion Alienware',20);exit;
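Since the index is replicated, each of the three instances should now hold the same documents. A minimal sketch of how that could be checked from the shell, assuming the ports used in this course (9306, 9307, 9308) and the standard mysql client options -N, -B and -e; the function names count_docs and check_replicated are hypothetical helpers, not part of Manticore:

```shell
# Hypothetical helper: print the number of documents in testrt on one instance.
# -N drops the column header, -B gives plain batch output, -e runs one statement.
count_docs() {
    mysql -h0 -P"$1" -N -B -e "SELECT COUNT(*) FROM testrt"
}

# Compare the document count on all three instances used in this course.
check_replicated() {
    expected=$(count_docs 9306) || return 1
    for port in 9307 9308; do
        actual=$(count_docs "$port") || return 1
        if [ "$actual" != "$expected" ]; then
            echo "port $port has $actual documents, expected $expected"
            return 1
        fi
    done
    echo "all instances report $expected documents"
}
```

Running check_replicated from the shell after the inserts should report the same count for every instance if replication worked. Comparing counts is only a quick sanity check; it does not compare document contents.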

Crashing an instance

To crash an instance, we can simply send it a KILL signal. We'll do that to the first instance, but first let's see what it has in the testrt index:

mysql -P 9306 -h0

SELECT * FROM testrt;exit;

Let's now crash the instance:

pkill -9 -F /var/run/manticore/searchd1.pid
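Before moving on, it may help to confirm that the process is really gone. The sketch below is one way to do that, assuming the PID file path from the command above; is_running is a hypothetical helper name. It relies on kill -0, which delivers no signal and only reports whether the process can be signalled:

```shell
# is_running PIDFILE
# Succeeds if the PID stored in PIDFILE belongs to a live process.
# A missing or unreadable PID file is treated as "not running".
is_running() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    # kill -0 sends nothing; it only tests whether the process exists
    # and whether we are allowed to signal it.
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}
```

For example, is_running /var/run/manticore/searchd1.pid || echo "first instance is down". Note that kill -0 also fails when the process belongs to another user, so the check should run as the same user as searchd or as root.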

While the first instance is down, let's add more documents to the cluster:

mysql -P 9307 -h0

INSERT INTO posts:testrt VALUES(4,'Lenovo laptops list','Yoga IdeaPad',30);

INSERT INTO posts:testrt VALUES(5,'List of ASUS ultrabooks and laptops','Zenbook Vivobook',30);

INSERT INTO posts:testrt VALUES(6,'List of Acer gaming laptops','Predator Helios Nitro',45);

SELECT * FROM testrt;exit;

Rejoining the cluster

In the previous step we killed one of the instances without a clean shutdown and then added more data to the RT index. When the crashed instance is started again, it is expected to reconnect to the cluster and sync up with the existing nodes.

Start the instance and connect to it:

/usr/bin/searchd --config /etc/manticoresearch/manticore1.conf

mysql -P 9306 -h0

Let's look at the RT index:

SELECT * FROM testrt;exit;

We see the instance has received the new documents that were added while it was down.
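Right after a restart the node may still be catching up, so a script would normally wait before querying it. The following is a rough sketch of such a wait loop. It assumes the per-cluster status variable cluster_posts_node_state reports the value synced once the node has caught up, and that mysql -N -B prints name and value separated by a tab; node_state and wait_synced are hypothetical helper names:

```shell
# Print the replication state of cluster "posts" as seen by one instance.
# SHOW STATUS returns "name<TAB>value"; cut keeps the value column.
node_state() {
    mysql -h0 -P"$1" -N -B -e "SHOW STATUS LIKE 'cluster_posts_node_state'" | cut -f2
}

# wait_synced PORT [ATTEMPTS]
# Poll once per second until the instance reports "synced" or attempts run out.
wait_synced() {
    port=$1
    attempts=${2:-10}
    while [ "$attempts" -gt 0 ]; do
        if [ "$(node_state "$port" 2>/dev/null)" = "synced" ]; then
            echo "instance on port $port is synced"
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    echo "instance on port $port did not reach synced state"
    return 1
}
```

In this course that would be wait_synced 9306, run after starting searchd and before the SELECT. The default of ten attempts is arbitrary and can be raised for larger indexes.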

Updating cluster active nodes list

When a node joins a cluster, it receives from the node it connected to a list of the nodes used for replication. It also keeps a second list: the nodes it should try to connect to if it has to rejoin the cluster.

If we go to the second instance:

mysql -P 9307 -h0

We can see both lists with the SHOW STATUS command:

SHOW STATUS LIKE '%posts_node%';

The '_set' list contains the nodes that will be used to reconnect. As we can see, only the first node is in this list. This means that if both the first and the second nodes go down and the second node then comes back up, it won't be able to rejoin the cluster, because the third node is missing from its list.
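The reasoning above can be modelled with a toy check: a restarting node can rejoin only if at least one address in its '_set' list belongs to a node that is currently up. This is an illustration of the logic, not how Manticore implements it. The function name can_rejoin is hypothetical, and the addresses 127.0.0.1:9322 and 127.0.0.1:9332 for the second and third nodes are assumed for the example; only 127.0.0.1:9312 appears in this course:

```shell
# can_rejoin NODES_SET ALIVE
# Both arguments are comma-separated address lists.
# Succeeds if at least one address from NODES_SET is also in ALIVE.
can_rejoin() {
    old_ifs=$IFS
    IFS=,                      # split the lists on commas
    for candidate in $1; do
        for alive in $2; do
            if [ "$candidate" = "$alive" ]; then
                IFS=$old_ifs
                echo "can rejoin via $candidate"
                return 0
            fi
        done
    done
    IFS=$old_ifs
    echo "no reachable node in the list"
    return 1
}
```

With only the third node alive, can_rejoin '127.0.0.1:9312' '127.0.0.1:9332' fails, which is the situation described above. Once the list holds all three addresses, can_rejoin '127.0.0.1:9312,127.0.0.1:9322,127.0.0.1:9332' '127.0.0.1:9332' succeeds through the third node.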

To ensure a node can reconnect to the cluster even when another node is down, every node should be aware of all the active nodes of the cluster.

To do that, we only need to run the ALTER command below on one of the nodes:

ALTER CLUSTER posts UPDATE nodes;exit;

Let's check the third node:

mysql -P 9308 -h0

SHOW STATUS LIKE '%posts_node%';

As we can see, the active node list now contains all three nodes of the cluster.