Jepsen-testing RabbitMQ [with python] – Part 2

This is the second post about my efforts to reproduce the Jepsen RabbitMQ test (using Python). In the first one, cutting the network in half in the same way every time failed to reproduce any data loss. Here I’ll try different partitioning schemes.

Random blockade partitions

First, let’s try the blockade’s random partitions:

#!/bin/bash
for i in `seq 5`
do
  echo 'creating partition'
  sudo blockade partition --random
  sudo blockade status
  sleep 60
  echo 'healing partition'
  sudo blockade join
  sudo blockade status
  sleep 60
done

This is implemented as a nemesis in blockade_random_partitions-rabbitmq-test.py.
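I won’t paste the whole script; here is a minimal sketch of what such a nemesis loop might look like, assuming it simply shells out to the blockade CLI the same way the bash loop above does (the helper names are mine, not necessarily what the actual file does):

import subprocess
import time

def blockade(*args):
    # shell out to the blockade CLI (assumes it is on the PATH and we run as root)
    subprocess.check_call(['blockade'] + list(args))

def random_partition_nemesis(rounds=5, period=60):
    # alternate between a random partition and a healed cluster
    for _ in range(rounds):
        print('creating partition')
        blockade('partition', '--random')
        time.sleep(period)
        print('healing partition')
        blockade('join')
        time.sleep(period)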

I ran the test, and yet again no messages were lost:

[WARNING] (MainThread) RECEIVED: 25000, unexpected: 19. [15042, 20042, 41, 10041, 5042, 10590, 20609, 604, 15630, 16155, 21134, 1134, 11709, 21709, 22240, 2496, 7498, 12279, 17532], LOST MESSAGES 0, []

Jepsen random halves

Let’s try to cut the network into random halves like in the original Jepsen test:

Meanwhile, the nemesis cuts the network into random halves every sixty seconds, then repairs it for sixty seconds, and so on.

This is implemented in random-majority-rabbitmq-test.py.
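Only a sketch again: a nemesis that yields random halves could look roughly like the generator below, following the same convention as the nemesis shown further down (each yielded list is a set of comma-joined node groups handed to blockade partition; a single group containing every node means the partition is healed). The function name and details are mine, not necessarily what random-majority-rabbitmq-test.py actually does:

import random

def random_halves(nodes):
    # split the cluster into two random halves, then heal, forever
    t = list(nodes)
    while True:
        random.shuffle(t)
        half = len(t) // 2 + 1   # the bigger half keeps the majority
        # e.g. ['n1,n3,n5', 'n2,n4'] -> blockade partition n1,n3,n5 n2,n4
        yield [','.join(t[:half]), ','.join(t[half:])]
        # heal: everybody back in a single partition
        yield [','.join(sorted(t))]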

I ran this test, and yet again no messages were lost.

Baliant Pato’s partitions

In an excellent post, Baliant Pato explained how he managed to reproduce the data loss. Let’s try his partitions (in a 3-node cluster: rabbit1, rabbit2, rabbit3, with pause_minority):

  1. Publisher A connecting to rabbit1
  2. partition rabbit3 away from rabbit2 and rabbit1 (now rabbit3 is stale pretty quickly, he’s gonna be the future “stale node”, >evil laughter<.)
  3. partition rabbit1 away from rabbit2 and rabbit3 (at this point the cluster is stopped, the final two minorities are in sync but dead)
  4. partition rabbit2 away from rabbit1 and rabbit3
  5. heal rabbit3 (cluster is still stopped)
  6. heal rabbit2

For this test I need 3 instead of 5 RabbitMQ nodes (make a copy of docker-compose.yml and then sudo docker-compose -f docker-compose-3_nodes.yml up). I’ll be using 3 producers instead of 2, one per RabbitMQ node, and a single queue on the default exchange, mirrored to all the nodes ("ha-mode":"all").
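For reference, the "ha-mode":"all" mirroring is a policy; one way to set it is through the management HTTP API. A hedged sketch (the policy name, pattern and guest credentials are my assumptions, the management plugin has to be enabled, and the test itself may set the policy differently):

import requests

def mirror_all_queues(host, user='guest', password='guest'):
    # PUT /api/policies/<vhost>/<name> on the management plugin; vhost '/' is encoded as %2f
    resp = requests.put(
        'http://%s:15672/api/policies/%%2f/ha-all' % host,
        json={'pattern': '.*', 'definition': {'ha-mode': 'all'}, 'apply-to': 'queues'},
        auth=(user, password))
    resp.raise_for_status()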

The partitioning scheme is:

  1. sudo blockade partition n1,n2 n3
  2. sudo blockade partition n1 n2 n3
  3. sudo blockade partition n1 n2,n3
  4. sudo blockade join

Implemented as a nemesis in Baliant_Pato_partitioning-rabbitmq-test.py:

t = sorted(nodes)

# partition rabbit3 away from rabbit2 and rabbit1
yield ['%s,%s' % (t[0], t[1]), t[2]]
# partition rabbit1 away from rabbit2 (rabbit3 already away)
yield t[:]
# join rabbit3 with rabbit2:
yield [t[0], '%s,%s' % (t[1], t[2])]
# finally join (all nodes belong to the same partition)
while True:
    yield [','.join(t)]

Running the test:

$ sudo python src/Baliant_Pato_partitioning-rabbitmq-test.py 3
...
[ERROR] (rabbit 5674) failed to send 10041, cannot connect to rabbit, stopping the test
[INFO] (rabbit 5672) sent: 785, failed: 1, total: 5000
[ERROR] (rabbit 5673) failed to send 5783, cannot connect to rabbit, stopping the test
[INFO] (rabbit 5673) sent: 783, failed: 1, total: 5000
[INFO] (MainThread) 1609 messages sent, 3 failed
[WARNING] (MainThread) RECEIVED: 1612, unexpected: 0. [], LOST MESSAGES 0, []

rabbit2 & rabbit3 are down and the producers cannot finish the test. Let’s run it one more time:

$ sudo python src/Baliant_Pato_partitioning-rabbitmq-test.py 3
...
[INFO] (MainThread) producers done
[INFO] (MainThread) draining the queue
[INFO] (MainThread) 15000 messages sent, 0 failed
[INFO] (MainThread) rabbitmq client at 192.168.54.136:5672
[INFO] (MainThread) Connecting to 192.168.54.136:5672
[INFO] (MainThread) Created channel=1
[INFO] (MainThread) Connecting to 192.168.54.136:5672
[INFO] (MainThread) Created channel=1
[WARNING] (MainThread) RECEIVED: 5841, unexpected: 1. [799], LOST MESSAGES 9159

This run reports 9159 lost messages and again leaves rabbit2 & rabbit3 down.

# rabbit2 logs: 
...
=WARNING REPORT==== 5-Jan-2017::08:23:18 ===
Mirrored queue 'jepsen.queue' in vhost '/': Stopping all nodes on master shutdown since no synchronised slave is available

=INFO REPORT==== 5-Jan-2017::08:23:25 ===
Stopped RabbitMQ application

=ERROR REPORT==== 5-Jan-2017::08:25:19 ===
** Node rabbit@rabbit1 not responding **
** Removing (timedout) connection **

=ERROR REPORT==== 5-Jan-2017::08:25:33 ===
Mnesia(rabbit@rabbit2): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, rabbit@rabbit3}

=INFO REPORT==== 5-Jan-2017::08:25:33 ===
Starting RabbitMQ 3.6.5 on Erlang R16B03-1
Copyright (C) 2007-2016 Pivotal Software, Inc.
Licensed under the MPL.  See http://www.rabbitmq.com/
...

Mirrored queue 'jepsen.queue' in vhost '/': Adding mirror on node rabbit@rabbit3: <32532.2772.0>

Hm, rabbit2 synchronized with rabbit3. Let’s check the cluster status on each node:

osboxes@osboxes:~/jepsen-rabbitmq$ sudo docker exec -it n3 rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit2,rabbit@rabbit3]},
 {cluster_name,<<"rabbit@rabbit1">>},
 {partitions,[]},
 {alarms,[{rabbit@rabbit2,[]},{rabbit@rabbit3,[]}]}]
osboxes@osboxes:~/jepsen-rabbitmq$ sudo docker exec -it n2 rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit3,rabbit@rabbit2]},
 {cluster_name,<<"rabbit@rabbit1">>},
 {partitions,[]},
 {alarms,[{rabbit@rabbit3,[]},{rabbit@rabbit2,[]}]}]
osboxes@osboxes:~/jepsen-rabbitmq$ sudo docker exec -it n1 rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit1]},
 {cluster_name,<<"rabbit@rabbit1">>},
 {partitions,[]},
 {alarms,[{rabbit@rabbit1,[]}]}]
osboxes@osboxes:~/jepsen-rabbitmq$ sudo docker exec -it n2 rabbitmqctl list_queues
Listing queues ...
jepsen.queue 10801

So rabbit2 & rabbit3 think rabbit1 is dead, and vice versa. If I connect to rabbit2 and drain the queue, then no messages are lost.
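Draining the queue straight from rabbit2 can be done with pika, roughly like this (a sketch, not the project’s consumer; the queue name is taken from the setup above):

import pika

def drain(host, port, queue='jepsen.queue'):
    # fetch and ack every message currently sitting in the queue, return the bodies
    conn = pika.BlockingConnection(pika.ConnectionParameters(host=host, port=port))
    channel = conn.channel()
    messages = []
    while True:
        method, properties, body = channel.basic_get(queue)
        if method is None:   # queue is empty
            break
        messages.append(body)
        channel.basic_ack(method.delivery_tag)
    conn.close()
    return messages

For example drain('192.168.54.136', 5673), assuming rabbit2 is the node mapped to host port 5673, as the producer logs suggest.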

But wasn’t RabbitMQ supposed to have detected the partition and reported it? As in:

osboxes@osboxes:~/jepsen-rabbitmq$ sudo docker exec -it n1 rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit1]},
 {cluster_name,<<"rabbit@rabbit1">>},
 {partitions,[{rabbit@rabbit1,[rabbit@rabbit2,rabbit@rabbit3]}]},
 {alarms,[{rabbit@rabbit1,[]}]}]

pause_minority is supposedly an “automatic partition handling” mechanism (excerpt from rabbitmq.com/partitions.html):

RabbitMQ also offers three ways to deal with network partitions automatically: pause-minority mode, pause-if-all-down mode and autoheal mode.

In pause-minority mode RabbitMQ will automatically pause cluster nodes which determine themselves to be in a minority (i.e. fewer or equal than half the total number of nodes) after seeing other nodes go down. … The minority nodes will pause as soon as a partition starts, and will start again when the partition ends.
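(For context: pause_minority is enabled through the cluster_partition_handling setting. In the classic rabbitmq.config format used by 3.6.x it would look like the line below; the docker images used here may of course configure it differently.)

[{rabbit, [{cluster_partition_handling, pause_minority}]}].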

If, after reading that excerpt, you assume (like I did) that no intervention is needed, then you’re in for a surprise. I ended up with the rabbit2 & rabbit3 nodes reported as down, and I had to restart them manually.
