Master-master | Tarantool

Master-master

Example on GitHub: master_master

This tutorial shows how to configure and work with a master-master replica set.

Before starting this tutorial:

  1. Install the tt utility.

  2. Create a tt environment in the current directory by executing the tt init command.

  3. Inside the instances.enabled directory of the created tt environment, create the master_master directory.

  4. Inside instances.enabled/master_master, create the instances.yml and config.yaml files:

    • instances.yml specifies instances to run in the current environment and should look like this:

      instance001:
      instance002:
      
    • The config.yaml file is intended to store a replica set configuration.

This section describes how to configure a replica set in config.yaml.

First, set the replication.failover option to off:

replication:
  failover: off

Define a replica set topology inside the groups section:

  • The database.mode option should be set to rw to make instances work in read-write mode.
  • The iproto.listen option specifies an address used to listen for incoming requests and allows replicas to communicate with each other.
groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            database:
              mode: rw
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            database:
              mode: rw
            iproto:
              listen:
              - uri: '127.0.0.1:3302'

In the credentials section, create the replicator user with the replication role:

credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]

Set iproto.advertise.peer to advertise the current instance to other replica set members:

iproto:
  advertise:
    peer:
      login: replicator

The resulting replica set configuration should look as follows:

credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]

iproto:
  advertise:
    peer:
      login: replicator

replication:
  failover: off

groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            database:
              mode: rw
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            database:
              mode: rw
            iproto:
              listen:
              - uri: '127.0.0.1:3302'

  1. After configuring a replica set, execute the tt start command from the tt environment directory:

    $ tt start master_master
       • Starting an instance [master_master:instance001]...
       • Starting an instance [master_master:instance002]...
    
  2. Check that instances are in the RUNNING status using the tt status command:

    $ tt status master_master
    INSTANCE                      STATUS      PID
    master_master:instance001     RUNNING     30818
    master_master:instance002     RUNNING     30819
    

  1. Connect to both instances using tt connect. Below is the example for instance001:

    $ tt connect master_master:instance001
       • Connecting to the instance...
       • Connected to master_master:instance001
    
  2. Check that both instances are writable using box.info.ro:

    • instance001:

      master_master:instance001> box.info.ro
      ---
      - false
      ...
      
    • instance002:

      master_master:instance002> box.info.ro
      ---
      - false
      ...
      
  3. Execute box.info.replication to check a replica set status. For instance002, upstream.status and downstream.status should be follow.

    master_master:instance001> box.info.replication
    ---
    - 1:
        id: 1
        uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
        lsn: 7
        upstream:
          status: follow
          idle: 0.21281599999929
          peer: replicator@127.0.0.1:3302
          lag: 0.00031614303588867
        name: instance002
        downstream:
          status: follow
          idle: 0.21800899999653
          vclock: {1: 7}
          lag: 0
      2:
        id: 2
        uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
        lsn: 0
        name: instance001
    ...
    

    To see the diagrams that illustrate how the upstream and downstream connections look, refer to Monitoring a replica set.

Note

Note that a vclock value might include the 0 component that is related to local space operations and might differ for different instances in a replica set.

To check that both instances get updates from each other, follow the steps below:

  1. On instance001, create a space, format it, and create a primary index:

    box.schema.space.create('bands')
    box.space.bands:format({
        { name = 'id', type = 'unsigned' },
        { name = 'band_name', type = 'string' },
        { name = 'year', type = 'unsigned' }
    })
    box.space.bands:create_index('primary', { parts = { 'id' } })
    

    Then, add sample data to this space:

    box.space.bands:insert { 1, 'Roxette', 1986 }
    box.space.bands:insert { 2, 'Scorpions', 1965 }
    
  2. On instance002, use the select operation to make sure data is replicated:

    master_master:instance002> box.space.bands:select()
    ---
    - - [1, 'Roxette', 1986]
      - [2, 'Scorpions', 1965]
    ...
    
  3. Add more data to the created space on instance002:

    box.space.bands:insert { 3, 'Ace of Base', 1987 }
    box.space.bands:insert { 4, 'The Beatles', 1960 }
    
  4. Get back to instance001 and use select to make sure new records are replicated.

  5. Check that box.info.vclock values are the same on both instances:

    • instance001:

      master_master:instance001> box.info.vclock
      ---
      - {2: 5, 1: 9}
      ...
      
    • instance002:

      master_master:instance002> box.info.vclock
      ---
      - {2: 5, 1: 9}
      ...
      

Note

To learn how to fix and prevent replication conflicts using trigger functions, see Resolving replication conflicts.

To insert conflicting records to instance001 and instance002, follow the steps below:

  1. Stop instance001 using the tt stop command:

    $ tt stop master_master:instance001
    
  2. On instance002, insert a new record:

    box.space.bands:insert { 5, 'incorrect data', 0 }
    
  3. Stop instance002 using tt stop:

    $ tt stop master_master:instance002
    
  4. Start instance001 back:

    $ tt start master_master:instance001
    
  5. Connect to instance001 and insert a record that should conflict with a record already inserted on instance002:

    box.space.bands:insert { 5, 'Pink Floyd', 1965 }
    
  6. Start instance002 back:

    $ tt start master_master:instance002
    

    Then, check box.info.replication on instance001. upstream.status should be stopped because of the Duplicate key exists error:

    master_master:instance001> box.info.replication
    ---
    - 1:
        id: 1
        uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
        lsn: 9
        upstream:
          peer: replicator@127.0.0.1:3302
          lag: 143.52251672745
          status: stopped
          idle: 3.9462469999999
          message: Duplicate key exists in unique index "primary" in space "bands" with
            old tuple - [5, "Pink Floyd", 1965] and new tuple - [5, "incorrect data", 0]
        name: instance002
        downstream:
          status: stopped
          message: 'unexpected EOF when reading from socket, called on fd 12, aka 127.0.0.1:3301,
            peer of 127.0.0.1:59258: Broken pipe'
          system_message: Broken pipe
      2:
        id: 2
        uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
        lsn: 6
        name: instance001
    ...
    

    The diagram below illustrates how the upstream and downstream connections look like:

    replication status on a new master

To resolve a replication conflict, instance002 should get the correct data from instance001 first. To achieve this, instance002 should be rebootstrapped:

  1. In the config.yaml file, change database.mode of instance002 to ro:

    instance002:
      database:
        mode: ro
    
  2. Reload configurations on both instances using the reload() function provided by the config module:

    • instance001:

      master_master:instance001> require('config'):reload()
      ---
      ...
      
    • instance002:

      master_master:instance002> require('config'):reload()
      ---
      ...
      
  3. Delete write-ahead logs and snapshots stored in the var/lib/instance002 directory.

    Note

    var/lib is the default directory used by tt to store write-ahead logs and snapshots. Learn more from Configuration.

  4. Restart instance002 using the tt restart command:

    $ tt restart master_master:instance002
    
  5. Connect to instance002 and make sure it received the correct data from instance001:

    master_master:instance002> box.space.bands:select()
    ---
    - - [1, 'Roxette', 1986]
      - [2, 'Scorpions', 1965]
      - [3, 'Ace of Base', 1987]
      - [4, 'The Beatles', 1960]
      - [5, 'Pink Floyd', 1965]
    ...
    

After reseeding a replica, you need to resolve a replication conflict that keeps replication stopped:

  1. Execute box.info.replication on instance001. upstream.status is still stopped:

    master_master:instance001> box.info.replication
    ---
    - 1:
        id: 1
        uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
        lsn: 9
        upstream:
          peer: replicator@127.0.0.1:3302
          lag: 143.52251672745
          status: stopped
          idle: 1309.943383
          message: Duplicate key exists in unique index "primary" in space "bands" with
            old tuple - [5, "Pink Floyd", 1965] and new tuple - [5, "incorrect data",
            0]
        name: instance002
        downstream:
          status: follow
          idle: 0.47881799999959
          vclock: {2: 6, 1: 9}
          lag: 0
      2:
        id: 2
        uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
        lsn: 6
        name: instance001
    ...
    

    The diagram below illustrates how the upstream and downstream connections look like:

    replication status after reseeding a replica
  2. In the config.yaml file, clear the iproto option for instance001 by setting its value to {} to disconnect this instance from instance002. Set database.mode to ro:

    instance001:
      database:
        mode: ro
      iproto: {}
    
  3. Reload configuration on instance001 only:

    master_master:instance001> require('config'):reload()
    ---
    ...
    
  4. Change database.mode values back to rw for both instances and restore iproto.listen for instance001:

    instance001:
      database:
        mode: rw
      iproto:
        listen:
        - uri: '127.0.0.1:3301'
    instance002:
      database:
        mode: rw
      iproto:
        listen:
        - uri: '127.0.0.1:3302'
    
  5. Reload configurations on both instances one more time:

    • instance001:

      master_master:instance001> require('config'):reload()
      ---
      ...
      
    • instance002:

      master_master:instance002> require('config'):reload()
      ---
      ...
      
  6. Check box.info.replication. upstream.status be follow now.

    master_master:instance001> box.info.replication
    ---
    - 1:
        id: 1
        uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
        lsn: 9
        upstream:
          status: follow
          idle: 0.21281300000192
          peer: replicator@127.0.0.1:3302
          lag: 0.00031113624572754
        name: instance002
        downstream:
          status: follow
          idle: 0.035179000002245
          vclock: {2: 6, 1: 9}
          lag: 0
      2:
        id: 2
        uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
        lsn: 6
        name: instance001
    ...
    

The process of adding instances to a replica set and removing them is similar for all failover modes. Learn how to do this from the Master-replica: manual failover tutorial:

Before removing an instance from a replica set with replication.failover set to off, make sure this instance is in read-only mode.

Found what you were looking for?
Feedback