Master-replica: manual failover
Example on GitHub: manual_leader
This tutorial shows how to configure and work with a replica set with manual failover.
Before starting this tutorial:
Install the tt utility.
Create a tt environment in the current directory by executing the tt init command.
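For example, if you are starting from scratch in a new directory, the commands might look like this (the myapp directory name is only an example):

$ mkdir myapp && cd myapp
$ tt init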
Inside the instances.enabled directory of the created tt environment, create the manual_leader directory.

Inside instances.enabled/manual_leader, create the instances.yml and config.yaml files:

instances.yml specifies instances to run in the current environment and should look like this:

instance001:
instance002:
The config.yaml file is intended to store a replica set configuration.
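If you prefer the shell, the directory and empty configuration files described above can be created like this (a sketch, assuming the commands are run from the environment root):

$ mkdir -p instances.enabled/manual_leader
$ cd instances.enabled/manual_leader
$ touch instances.yml config.yaml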
This section describes how to configure a replica set in config.yaml.
Define a replica set topology inside the groups section:
- The leader option sets instance001 as a replica set leader.
- The iproto.listen option specifies an address used to listen for incoming requests and allows replicas to communicate with each other.
groups:
group001:
replicasets:
replicaset001:
leader: instance001
instances:
instance001:
iproto:
listen:
- uri: '127.0.0.1:3301'
instance002:
iproto:
listen:
- uri: '127.0.0.1:3302'
In the credentials section, create the replicator user with the replication role:
credentials:
users:
replicator:
password: 'topsecret'
roles: [replication]
Set iproto.advertise.peer to advertise the current instance to other replica set members:
iproto:
advertise:
peer:
login: replicator
The resulting replica set configuration should look as follows:
credentials:
users:
replicator:
password: 'topsecret'
roles: [replication]
iproto:
advertise:
peer:
login: replicator
replication:
failover: manual
groups:
group001:
replicasets:
replicaset001:
leader: instance001
instances:
instance001:
iproto:
listen:
- uri: '127.0.0.1:3301'
instance002:
iproto:
listen:
- uri: '127.0.0.1:3302'
After configuring a replica set, execute the tt start command from the tt environment directory:
$ tt start manual_leader
   • Starting an instance [manual_leader:instance001]...
   • Starting an instance [manual_leader:instance002]...
Check that instances are in the RUNNING status using the tt status command:

$ tt status manual_leader
INSTANCE                    STATUS   PID
manual_leader:instance001   RUNNING  15272
manual_leader:instance002   RUNNING  15273
Connect to instance001 using tt connect:

$ tt connect manual_leader:instance001
   • Connecting to the instance...
   • Connected to manual_leader:instance001
Make sure that the instance is in the running state by executing box.info.status:

manual_leader:instance001> box.info.status
---
- running
...
Check that the instance is writable using box.info.ro:

manual_leader:instance001> box.info.ro
---
- false
...
Execute box.info.replication to check the replica set status. For instance002, upstream.status and downstream.status should be follow.

manual_leader:instance001> box.info.replication
---
- 1:
    id: 1
    uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
    lsn: 7
    name: instance001
  2:
    id: 2
    uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
    lsn: 0
    upstream:
      status: follow
      idle: 0.3893879999996
      peer: replicator@127.0.0.1:3302
      lag: 0.00028800964355469
    name: instance002
    downstream:
      status: follow
      idle: 0.37777199999982
      vclock: {1: 7}
      lag: 0
...
To see the diagrams that illustrate how the upstream and downstream connections look, refer to Monitoring a replica set.
To check that a replica (instance002) gets all updates from the master, follow the steps below:
On instance001, create a space and add data as described in CRUD operation examples (a minimal sketch of these steps is given after the note below).

Open the second terminal, connect to instance002 using tt connect, and use the select operation to make sure data is replicated.

Check that box.info.vclock values are the same on both instances:

instance001:

manual_leader:instance001> box.info.vclock
---
- {1: 21}
...

instance002:

manual_leader:instance002> box.info.vclock
---
- {1: 21}
...
Note

Note that a vclock value might include the 0 component (for example, {0: 2, 1: 21}) that is related to local space operations and might differ for different instances in a replica set.
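A minimal sketch of creating a space on instance001 and checking it on instance002 might look as follows. The bands space, its format, and the sample tuples are illustrative only and are not part of this tutorial's configuration:

-- On instance001 (the master): create a space with a primary index and insert data.
manual_leader:instance001> box.schema.space.create('bands', {format = {{name = 'id', type = 'unsigned'}, {name = 'band_name', type = 'string'}, {name = 'year', type = 'unsigned'}}})
manual_leader:instance001> box.space.bands:create_index('primary', {parts = {'id'}})
manual_leader:instance001> box.space.bands:insert{1, 'Roxette', 1986}
manual_leader:instance001> box.space.bands:insert{2, 'Scorpions', 1965}

-- On instance002 (the replica): read the data to confirm it was replicated.
manual_leader:instance002> box.space.bands:select{}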
This section describes how to add a new replica to a replica set.
Add instance003 to the instances.yml file:

instance001:
instance002:
instance003:
Add instance003 with the specified iproto.listen option to the config.yaml file:

groups:
  group001:
    replicasets:
      replicaset001:
        leader: instance001
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
          instance003:
            iproto:
              listen:
              - uri: '127.0.0.1:3303'
Open the third terminal to work with a new instance. Start instance003 using tt start:

$ tt start manual_leader:instance003
   • Starting an instance [manual_leader:instance003]...
Check the replica set status using tt status:

$ tt status manual_leader
INSTANCE                    STATUS   PID
manual_leader:instance001   RUNNING  15272
manual_leader:instance002   RUNNING  15273
manual_leader:instance003   RUNNING  15551
After adding instance003 to the configuration and starting it, you need to reload configurations on all instances. This is required to allow instance001 and instance002 to get data from the new instance in case it becomes a master.
Connect to instance003 using tt connect:

$ tt connect manual_leader:instance003
   • Connecting to the instance...
   • Connected to manual_leader:instance003
Reload configurations on all three instances using the reload() function provided by the config module:

instance001:

manual_leader:instance001> require('config'):reload()
---
...

instance002:

manual_leader:instance002> require('config'):reload()
---
...

instance003:

manual_leader:instance003> require('config'):reload()
---
...
Execute box.info.replication to check the replica set status. Make sure that upstream.status and downstream.status are follow for instance003.

manual_leader:instance001> box.info.replication
---
- 1:
    id: 1
    uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
    lsn: 21
    name: instance001
  2:
    id: 2
    uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
    lsn: 0
    upstream:
      status: follow
      idle: 0.052655000000414
      peer: replicator@127.0.0.1:3302
      lag: 0.00010204315185547
    name: instance002
    downstream:
      status: follow
      idle: 0.09503500000028
      vclock: {1: 21}
      lag: 0.00026917457580566
  3:
    id: 3
    uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
    lsn: 0
    upstream:
      status: follow
      idle: 0.77522099999987
      peer: replicator@127.0.0.1:3303
      lag: 0.0001838207244873
    name: instance003
    downstream:
      status: follow
      idle: 0.33186100000012
      vclock: {1: 21}
      lag: 0
...
This section shows how to perform manual failover and change a replica set leader.
In the config.yaml file, change the replica set leader from instance001 to null:

replicaset001:
  leader: null
Reload configurations on all three instances using config:reload() and check that instances are in read-only mode. The example below shows how to do this for instance001:

manual_leader:instance001> require('config'):reload()
---
...
manual_leader:instance001> box.info.ro
---
- true
...
manual_leader:instance001> box.info.ro_reason
---
- config
...
Make sure that box.info.vclock values are the same on all instances:
instance001:

manual_leader:instance001> box.info.vclock
---
- {1: 21}
...

instance002:

manual_leader:instance002> box.info.vclock
---
- {1: 21}
...

instance003:

manual_leader:instance003> box.info.vclock
---
- {1: 21}
...
Change the replica set leader in config.yaml to instance002:

replicaset001:
  leader: instance002
Reload configuration on all instances using config:reload().
Make sure that instance002 is the new master:

manual_leader:instance002> box.info.ro
---
- false
...
Check replication status using box.info.replication.
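On the new master, the output should again show the follow state for the other instances' upstream and downstream entries. An abbreviated, illustrative sketch (real output contains more fields, such as uuid, lsn, idle, and lag):

manual_leader:instance002> box.info.replication
---
- 1:
    id: 1
    name: instance001
    upstream:
      status: follow
    downstream:
      status: follow
  2:
    id: 2
    name: instance002
  3:
    id: 3
    name: instance003
    upstream:
      status: follow
    downstream:
      status: follow
...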
This section describes the process of removing an instance from a replica set.
Before removing an instance, make sure it is in read-only mode. If the instance is a master, perform manual failover.
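For example, you can verify this on the instance you are going to remove (a sketch; true is expected for a read-only replica):

manual_leader:instance003> box.info.ro
---
- true
...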
Clear the iproto option for instance003 by setting its value to {}:

instance003:
  iproto: {}
Reload configurations on instance001 and instance002:

instance001:

manual_leader:instance001> require('config'):reload()
---
...

instance002:

manual_leader:instance002> require('config'):reload()
---
...
Check that the upstream section is missing for instance003 by executing box.info.replication[3]:

manual_leader:instance001> box.info.replication[3]
---
- id: 3
  uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
  lsn: 0
  downstream:
    status: follow
    idle: 0.4588760000006
    vclock: {1: 21}
    lag: 0
  name: instance003
...
Stop instance003 using the tt stop command:

$ tt stop manual_leader:instance003
   • The Instance manual_leader:instance003 (PID = 15551) has been terminated.
Check that downstream.status is stopped for instance003:

manual_leader:instance001> box.info.replication[3]
---
- id: 3
  uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
  lsn: 0
  downstream:
    status: stopped
    message: 'unexpected EOF when reading from socket, called on fd 27, aka 127.0.0.1:3301, peer of 127.0.0.1:54185: Broken pipe'
    system_message: Broken pipe
  name: instance003
...
Remove instance003 from the instances.yml file:

instance001:
instance002:
Remove instance003 from config.yaml:

instances:
  instance001:
    iproto:
      listen:
      - uri: '127.0.0.1:3301'
  instance002:
    iproto:
      listen:
      - uri: '127.0.0.1:3302'
Reload configurations on instance001 and instance002:

instance001:

manual_leader:instance001> require('config'):reload()
---
...

instance002:

manual_leader:instance002> require('config'):reload()
---
...
To remove an instance from the replica set permanently, it should be removed from the box.space._cluster system space:
Select all the tuples in the box.space._cluster system space:

manual_leader:instance002> box.space._cluster:select{}
---
- - [1, '9bb111c2-3ff5-36a7-00f4-2b9a573ea660', 'instance001']
  - [2, '4cfa6e3c-625e-b027-00a7-29b2f2182f23', 'instance002']
  - [3, '9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6', 'instance003']
...
Delete the tuple corresponding to instance003:

manual_leader:instance002> box.space._cluster:delete(3)
---
- [3, '9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6', 'instance003']
...
Execute box.info.replication to check the health status:

manual_leader:instance002> box.info.replication
---
- 1:
    id: 1
    uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
    lsn: 21
    upstream:
      status: follow
      idle: 0.73316000000159
      peer: replicator@127.0.0.1:3301
      lag: 0.00016212463378906
    name: instance001
    downstream:
      status: follow
      idle: 0.7269320000014
      vclock: {2: 1, 1: 21}
      lag: 0.00083398818969727
  2:
    id: 2
    uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
    lsn: 1
    name: instance002
...