Smashing CouchBase

You might remember year 2k for Bill Gates stepping down as CEO (and promoting Steve Balmer), American Beauty winning 5 Oscars, liberation of Lebanon after 22 years of Israeli occupation, end of RSA patent, Bill Clinton becoming the first U.S. president to visit Vietnam, and I will certainly remember it for the Bulldozer Revolution in my country which lead to the resignation of Slobodan Miloševi?

cloud / big dataThat year marks a big milestone in the world of software engineering: in a conference keynote, prof. Eric Brewer postulated the now famous and much discussed CAP theorem (btw it was proven 2 years later by Gilbert and Lynch), and paved the way for the NoSQL revolution.

Another big factor for NoSQL was getting away from the relational model (proposed way back in 1969) and ACID to achieve scale, performance. Having felt the pain of working with object-relational mappers, developing a webapp on MongoDB for the first time was such a pleasant experience.

Unfortunately (most of) the current NoSQL systems (read MongoDB, Riak, Cassandra, …) force developers to deal with messy aspects of consistency, concurrency, non-zero latencies, weird corner cases of failure scenarios, not to mention designing CRDTs, or asking some philosophical questions like how-eventual-is-eventual-consistency (there’re some promising developments like Google’s Spanner and HyperDex, but haven’t had time to have a look).

I’ve successfully managed to avoid dealing with these thorny issues for a few years now, but this fantastic series of articles called Jepsen by Kyle Kingsbury got me motivated to address these using CouchBase. His series of articles is not only eye-opening from the developer’s point of view, but it also reveals some shocking discoveries, namely some databases claim things they don’t provide (Redis being a CP system, MongoDB being safe, etc).

I’m not smart enough to prove that a database is broken by design, and even if I was it would require more time than I afford to invest, so Kyle’s experimental approach is the preferred way.  Since I don’t speak Clojure, the implementation language of his choice for Jepsen, this will be done in Java (yeah, I know, it’s not that cool anymore, but it gets the job done).

Cluster setup

The first cluster setup was just like Kyle’s using LXC, but I ran into some networking issues, one node would just deny connection attempts, so I ended up using 5 VMs on my Mac laptop.

For a detailed explanation of what and how this test is supposed to work have a look at http://aphyr.com/posts/281-call-me-maybe-carly-rae-jepsen-and-the-perils-of-network-partitions. Here’s a quick recap:

A five node cluster of Ubuntu 14.04 machines, running in VMs (VmWare or VirtualBox). VMs – 192.168.1.131 – 192.168.1.135 (named n1-n5 – add these entries to /etc/hosts on your computer and on each of the nodes themselves).

To cause a partition, Kyle makes “nodes n1 and n2 to drop IP traffic from n3, n4, and n5–essentially by running”:

In my experiments, I wanted to have ssh access all the time on all VMs so I drop a port range that covers all CouchBase ports (they should be above 1024), like this:

To resolve the partition, we’ll reset iptables chains to their defaults:

A client will be a simple app that appends integers to a single document – read & append number & use cas to update, and record if the db acknowledged the write or there was an error. At the end I’ll compare if the db and client apps agree on the state, or acknowledged writes are missing and/or unacknowledged writes are present.

A simple time-out test first

Before going with the full cluster, let’s test the code against a single node cluster running CouchBase 2.5.1: 256MB ram; default bucket is type Couchbase, 256MB ram quota, 0 replicas, 2 workers, flush enabled.

The client app will use Java SDK 1.4.3 and Oracle JDK 1.7.0_55.

CouchTimeoutTest first opens a connection to CouchBase – CouchbaseClient with the OP timeout of 10sec, sets an empty JSON document with a single array element, appends 50 numbers to the array, then makes the node drop all traffic for CouchBase ports, and tries to add 150 more elements to the array:

All operations are synchronous. The Java SDK driver docs describe how to deal with time-outs: “When a time-out occurs, most of the synchronous methods on the client return a RuntimeException showing a time-out as the root cause.”, so the appendToDocument implementation looks like this:

Test result

The client is configured to timeout after 10sec, and the partition lasts for 15s so I expect to see some timeouts and retries. In the end the document should have 200 integers in the array. Manually checking the iptables rules I see the code correctly heals the node. What do you think happened when I ran this code (using maven exec plugin, mvn test -Pcouch-timeout)?

  1. a few operations timed out and then succeeded
  2. a few operations timed out and then failed
  3. CouchbaseClient failed to reconnect so only 50 integers were written
  4. an infinite loop

Believe it or not it’s the 4th option. CouchbaseClient never timeouts or throws anything, just keeps running forever broadcasting noops to probably keep the connection alive:

The cluster is healthy, node up and running; if I spin another client it reads/writes without any problems.

What happens if the server is unavailable for some amount of time less than the configured timeout (10sec in this test)? The magic number seems to be 7, for anything longer the test deadlocks.

The SDK docs are pretty clear in that “when a time-out occurs, most of the synchronous methods on the client return a RuntimeException“.

It turned out (thanks Michael) the problem was the test code – getting the doc was outside any try/catch and somehow the exceptions were getting swallowed (not even showing in the log?!). The fix is:

couch_timeout_test_bugfix

With this change, the test runs as expected, a few ops time out but eventually all succeed:

The next article will explore how CouchBase behaves when this test runs against a 5 node cluster.

Tagged on: , ,