Manticore replication failover

Difficulty: Beginner
Estimated Time: 10 minutes

Manticore Search - replication failover scenario

In this course we follow two simple scenarios:

  • when a node in the cluster goes down and we add it back to the cluster
  • when all nodes go down or are stopped and we reboot the cluster

Step 1 of 5

Fire up a cluster

In this course we will use 3 Manticore instances for replicating an RT table.

Let's connect to one of the instances:

mysql -P9306 -h0

create a test table:

CREATE TABLE testrt (title text, content text, gid uint);

and create a new cluster on this node:

CREATE CLUSTER posts;

Let's now add an RT table to the cluster:

ALTER CLUSTER posts ADD testrt;exit;
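
If you want to make sure the table is now part of the cluster, you can reconnect and look at the cluster status counters (an optional check; we'll use 'SHOW STATUS' in more detail later in this course):

mysql -P9306 -h0

SHOW STATUS LIKE 'cluster%';exit;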

Now let's connect to the second instance:

mysql -P9307 -h0

And join the cluster:

JOIN CLUSTER posts '127.0.0.1:9312' AS nodes;exit;

And let's do the same for the third instance:

mysql -P9308 -h0

JOIN CLUSTER posts '127.0.0.1:9312' AS nodes;

Let's now insert some data into our testrt table.

INSERT INTO posts:testrt VALUES(1,'List of HP business laptops','Elitebook Probook',10);

INSERT INTO posts:testrt VALUES(2,'List of Dell business laptops','Latitude Precision Vostro',10);

INSERT INTO posts:testrt VALUES(3,'List of Dell gaming laptops','Inspirion Alienware',20);exit;
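
If you want to confirm straight away that the documents were replicated, you can query the table on another instance, for example the first one (an optional check; we'll look at this data again in a later step):

mysql -P9306 -h0

SELECT * FROM testrt;exit;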

Step 2 of 5

Updating the cluster's active node list

When a node joins a cluster, it receives the list of nodes used for replication from the node it connects to. A second list contains the nodes it should try to connect to if it ever needs to reconnect.

If we go to the second instance:

mysql -P9307 -h0

we can see its node lists with a 'SHOW STATUS' command:

SHOW STATUS LIKE '%posts_node%';
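
The output should look something like this (the exact addresses and ports depend on your setup, so treat this as an illustration only):

+--------------------------+--------------------------------------------+
| Counter                  | Value                                      |
+--------------------------+--------------------------------------------+
| cluster_posts_nodes_set  | 127.0.0.1:9312                             |
| cluster_posts_nodes_view | 127.0.0.1:9312,127.0.0.1:9315:replication  |
+--------------------------+--------------------------------------------+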

The '_set' list contains the active nodes that will be used for reconnecting. As we can see, only the first node is in it. This means that if the first and the second nodes go down (leaving only the third node alive) and the second node then comes back up, it won't be able to rejoin the cluster, because the third node is not in its list of active nodes.

To ensure a node can reconnect to the cluster regardless of which other nodes are down, we should make all the nodes aware of the cluster's active nodes.

To do that, we only need to run the 'ALTER' command on one of the nodes, as follows:

ALTER CLUSTER posts UPDATE nodes;exit;
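
If you now run the same 'SHOW STATUS' on any of the nodes, the '_set' list should contain all the nodes of the cluster (an optional check):

mysql -P9307 -h0

SHOW STATUS LIKE '%posts_node%';exit;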

Step 3 of 5

Crashing an instance

To crash an instance, we can simply send it a KILL signal. We'll do that to the first instance, but first let's see what we currently have in its 'testrt' table:

mysql -P9306 -h0

SELECT * FROM testrt;exit;

Let's now crash the instance:

pkill -F /var/run/manticore/searchd1.pid
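
To make sure the instance is really down, you can try connecting to it; the attempt should now fail with a connection error:

mysql -P9306 -h0 -e "SHOW STATUS;"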

While the first instance is down, let's add more documents to the cluster:

mysql -P9307 -h0

INSERT INTO posts:testrt VALUES(4,'Lenovo laptops list','Yoga IdeaPad',30);

INSERT INTO posts:testrt VALUES(5,'List of ASUS ultrabooks and laptops','Zenbook Vivobook',30);

INSERT INTO posts:testrt VALUES(6,'List of Acer gaming laptops','Predator Helios Nitro',45);

SELECT * FROM testrt;exit;

Step 4 of 5

Rejoining the cluster

In the previous step we killed one of the instances uncleanly and then added more data to the RT table. When the crashed instance is started again, it's expected to reconnect to the cluster and sync up with the existing nodes.

Let's start the instance:

/usr/bin/searchd --config /etc/manticoresearch/manticore1.conf

mysql -P9306 -h0

and check if the node is connected to the cluster:

SHOW STATUS LIKE '%posts_node%';exit;
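
Besides the node list, we can check that the data has caught up: the documents inserted while the node was down should now be on the first instance too (an optional check):

mysql -P9306 -h0

SELECT * FROM testrt;exit;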

Step 5 of 5

Rebooting the cluster

If all nodes go down for some reason, whether unexpectedly or as planned, bringing the cluster back is easy, but it requires a bit of attention.

To reinstate a cluster, it needs to be bootstrapped from its most advanced node, which should, in general, be the last node that went off. The information we need for this is stored in the 'grastate.dat' file in the 'data_dir' folder. There are two variables to look at: 'safe_to_bootstrap', which should be '1' on the last node to have exited the cluster, and 'seqno', which should be equal to the highest sequence number among the nodes.
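
For reference, a 'grastate.dat' file looks roughly like this (the uuid and seqno values below are made up for illustration and will differ on your machines):

# GALERA saved state
version: 2.1
uuid:    220dcdcb-1629-11e4-add3-aec059ad439e
seqno:   6
safe_to_bootstrap: 1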

That node must be started with the '--new-cluster' option, which tells it to perform a cluster bootstrap.

First, let's stop all the nodes:

/usr/bin/searchd --config /etc/manticoresearch/manticore1.conf --stopwait

/usr/bin/searchd --config /etc/manticoresearch/manticore3.conf --stopwait

/usr/bin/searchd --config /etc/manticoresearch/manticore2.conf --stopwait

If we take a look at the 'grastate.dat' files, we'll see that the last node we stopped (number two) meets both of the required conditions:

cat /var/lib/manticore/1/grastate.dat

cat /var/lib/manticore/2/grastate.dat

cat /var/lib/manticore/3/grastate.dat
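
A small shell loop makes the comparison easier (this assumes the data directories shown above):

for d in /var/lib/manticore/1 /var/lib/manticore/2 /var/lib/manticore/3; do echo "== $d"; grep -E 'seqno|safe_to_bootstrap' $d/grastate.dat; done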

We start node number two with '--new-cluster' and the other two nodes in the normal way:

/usr/bin/searchd --config /etc/manticoresearch/manticore2.conf --new-cluster

/usr/bin/searchd --config /etc/manticoresearch/manticore3.conf

/usr/bin/searchd --config /etc/manticoresearch/manticore1.conf

If we log in to node one, we can see the cluster is back and synced:

mysql -P9306 -h0

SHOW STATUS LIKE '%posts_node%';
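
And, as a final optional check, confirm that all six documents survived the reboot:

SELECT * FROM testrt;exit;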