
Tasting first CouchBase

Abstract

In this post I’ll tell you about installing and using CouchBase in a test-first manner. Wiping out all the data between tests turned out to be very slow, about 4s per test [MB-7965], and for anything but key lookups, indexes have to be used, so the elephant is still in the room. CouchBase’s async nature makes setting up testing infrastructure a bit more difficult compared to MongoDB or relational databases.

Motivation

I’ve used MongoDB for a couple of projects at work (http://pax-db.org); it’s great for read-only data, logs and analytics.

It turned out to be rather difficult to operate (puppetizing, coordinating replica set initialization, watching the oplog, …). Consistent hashing over a set of equal nodes sounds more likely to work reliably than async master-slave (and seems more efficient, since mongo secondaries are not used to serve content). Sharding plus replica sets is way more painful than elastically growing (and shrinking!) a dynamo-like system. On top of that, mongodb can and will lose data (under network partitions).

The document model seems like a good fit, so what else is out there that doesn’t suck? CouchBase.

Installing CouchBase

In a virtual box (IP 192.168.1.128), just in case…

 

Setup

(mental note: think about puppet-izing ..)

Why do I have to create a default bucket!?

The upcoming Java SDK v2.0 looks very neat, reactive and async, with Java 8 lambdas, but until it is released I need to stick with the latest stable version, v1.4.2.

Using Oracle jdk1.7.0_55.jdk on Mac OS X Mavericks.

Test first

Hello couchbase in java works fine. Encouraging, let’s try to implement a CRUD for a simple class (having a few primitive fields). Spring-data already can provide this but before automating something it’s usually a good idea to do it manually first. Plus spring-data mongodb hurt me once, but more on that in another post.

It starts with a test of course. Need to open a client first (hm, is CouchbaseClient thread-safe, has it got connection pooling under the hood, … the API docs don’t say anything, but the executorService field suggests it does have pooling; mental note: read the source code...). The v2.0 docs say the Java SDK is thread-safe (the C lib is not): “Couchbase Java and .Net SDKs have been certified as being thread-safe if you utilize thread-safety mechanisms provided by the two languages with the SDKs”. Alright, I can now sleep peacefully.

The documentation is somewhat confusing, http://docs.couchbase.com/, some sections only exist in 2.0, others in 2.1…
For example: thread-safety in sdk exists in v2.0, but not in v2.1 and v2.2. Wish there was a PDF version…

Back to the unit test: open a client in @BeforeClass, and let’s use a test bucket so I don’t mess up the production data by mistake. To have tests isolated and repeatable (as defined in F.I.R.S.T), let’s create and delete the bucket before each test, using ClusterManager#createNamedBucket (introduced in v1.1), and run the test:

Creating the CouchbaseClient on the 5th line throws an exception:

A forum thread says it’s probably a timing issue: couch is still not done creating the bucket. The mentioned links to BucketTool.java and ViewTest.java are 404s, so let’s ask my friend git:

 

Removed in ec02294, let’s go one rev before that one – 95569c4: BucketTool.java, ViewTest.java. The code looks like mine, creating a (default) bucket first then calling initClient. A dead end.

One of the blog posts mentioned something about setting timeouts, let’s see:

Nope, still the same error. Going through the couch docs, I noticed the cli tool has an option to wait:

Hm, I wonder how that is implemented...

By polling bucket-list until the bucket is created. Seems like there’s no other way but to sleep:

The bar is green, but it makes the code either brittle or slow depending on the timeout: if it’s too big the tests are slow, if it’s too small the tests can fail, and the right value probably depends on the hardware, network, setup… waitForWarmup doesn’t make a difference, since it only works once the CouchbaseClient has been created.

How about trying a few times until it succeeds?
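One way to sketch the "try a few times" idea is a small helper that calls an action until it stops throwing, pausing between attempts. The `Retry` class, its `Result` holder and the `withFixedDelay` method are hypothetical names, not part of the Couchbase SDK; the attempt count and delay are illustrative parameters.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries an action a bounded number of times with a fixed pause. */
final class Retry {

    private Retry() {}

    /** The value the action produced and how many attempts it took. */
    static final class Result<T> {
        final T value;
        final int attempts;

        Result(T value, int attempts) {
            this.value = value;
            this.attempts = attempts;
        }
    }

    static <T> Result<T> withFixedDelay(Callable<T> action, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return new Result<T>(action.call(), attempt);
            } catch (Exception e) {
                last = e; // remember the failure, then try again
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                    throw new IllegalStateException("interrupted while waiting to retry", e);
                }
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts", last);
    }
}
```

In the test setup the action would wrap the `CouchbaseClient` constructor call, and `attempts` is what a log message about the number of tries could report. The trade-off is the same as with a plain sleep, only bounded: the worst case costs `maxAttempts` times the delay.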

Green bar again, took 4s, with the message: “managed to open the client in 5 attempts”. I swear 500 was just a lucky guess, you can’t make this shit up :-)

Let’s add one more test to see how Fast this is:

[Screenshot: unit test console log]

Oops, the F just went out the window: it’s super slow, taking 8 seconds without any operations against the database.

Ok, how about creating the bucket once per class and then calling flush/flushBucket to delete all data:

[Screenshot: TestConnTime2]

Nope, now it took almost 30 seconds (note: clusterManager.flushBucket(TEST_BUCKET) does not make any difference):

Hm, surely I’m not the first one to try this, so let’s see how others are doing it. Is there any GitHub repo that does this? https://github.com/search?q=createNamedBucket&ref=cmdform&type=Code finds only 3 in Java as of writing this post, none doing unit testing…

Maybe the repos are private… Let’s search the forums. There’s this post from Mike, a CouchBase engineer:

I can see that after trying flush you then tried to do the recommended thing and delete and recreate the bucket, but your running into a few issues with runtime exceptions. The only time I have run into this issue is when I have been creating and deleting buckets really fast. For example, when we switched our unit tests off of flush and moved to deleting and recreating buckets we were doing this process many times per second.

By “really fast” Mike obviously means way more than “many times per second”, but I’d happily take even once or twice per second. ViewTest.java used to create buckets once per class, but never used flush:

What options do I have now? Instead of creating/deleting buckets or using flush(), I think I’ll have to manually delete all items in the bucket. This turned out to be fast enough.

With the test setup in place, let’s spin up that fail/pass/refactor wheel.

CRUD class

I usually start with this test when implementing a CRUD:
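A minimal sketch of that kind of first test, under loud assumptions: `BlogPostRepository`, `InMemoryBlogPostRepository` and `SaveThenCountTest` are hypothetical names, the in-memory map only stands in for a Couchbase-backed implementation, and a plain `AssertionError` replaces the JUnit assertion so the snippet runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical document class with a few primitive fields. */
final class BlogPost {
    final String id;
    final String title;
    final int views;

    BlogPost(String id, String title, int views) {
        this.id = id;
        this.title = title;
        this.views = views;
    }
}

/** Hypothetical CRUD contract; the real implementation would talk to Couchbase. */
interface BlogPostRepository {
    void save(BlogPost post);
    long count();
}

/** In-memory stand-in so the shape of the test can run without a server. */
final class InMemoryBlogPostRepository implements BlogPostRepository {
    private final Map<String, BlogPost> docs = new HashMap<String, BlogPost>();

    public void save(BlogPost post) {
        docs.put(post.id, post);
    }

    public long count() {
        return docs.size();
    }
}

final class SaveThenCountTest {
    private SaveThenCountTest() {}

    /** Saving one new document must raise the count by exactly one. */
    static void saveIncrementsCount(BlogPostRepository repo) {
        long before = repo.count();
        repo.save(new BlogPost("post-1", "Tasting first CouchBase", 0));
        long after = repo.count();
        if (after != before + 1) {
            throw new AssertionError("expected " + (before + 1) + " documents but found " + after);
        }
    }
}
```

Writing the test against the interface means the same check can later be pointed at the Couchbase-backed repository once save() and count() exist.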

If you object that testing save() using count() isn’t true to TDD’s principles, let’s say I’m just being pragmatic. Can you find a case where this logic fails to catch a bug?

I like to start with an empty table/collection/bucket, run the test, and don’t clean up afterwards, so it’s easy to manually verify if the data is really there.

How do I count the number of items in a bucket? countdistinct-keys-using-mapreduce and basic-couchbase-querying-for-sql-people explain.

Have to make a view for that… which means I need to figure out how to test views first. See couchbase-101-create-views-mapreduce and couchbase-101-create-views-mapreduce-from-your-java-application.

Boils down to creating a DesignDocument, adding a View and saving the design doc. Sounds simple.

Red bar, move on to implement createViews(). As of SDK 1.4, ViewResponse exposes the total number of rows in the view. Since we’ll be storing all kinds of different documents (classes, think BlogPost, Comment, …) in a single bucket, views have to make sure they only index the intended document types, and emit document ids:
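A sketch of such a type-filtering map function, built as a Java string so it can be handed to whatever creates the view. `ViewFunctions` and `mapFunctionForType` are hypothetical names, and the sketch assumes each JSON document carries its class name in a `type` field.

```java
/** Hypothetical helper that builds the JavaScript map function for a per-type view. */
final class ViewFunctions {

    private ViewFunctions() {}

    /**
     * Returns a map function that indexes only documents whose "type" field
     * (an assumed naming convention) equals the given type, emitting the document id.
     */
    static String mapFunctionForType(String type) {
        // Only plain identifiers are accepted so nothing can break out of the string literal.
        if (!type.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("unexpected type name: " + type);
        }
        return "function (doc, meta) {\n"
             + "  if (doc.type && doc.type == \"" + type + "\") {\n"
             + "    emit(meta.id, null);\n"
             + "  }\n"
             + "}";
    }
}
```

The resulting string would go into the view that is added to the design document. Emitting `null` as the value keeps the index small, since only the row count and the ids are needed here.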

Green bar, let’s implement the count() method:

Run the test again, unexpectedly it fails again in client.query():

The view exists (client.getView() returns a non-null object), but the query fails. Googling again: http://stackoverflow.com/q/24306216. Ahhh, timing... I don’t mind async workflows, but an async workflow without callbacks is annoying. No answers on the SO question, more googling… new-views-are-immediately-available

The view is created, but then the server needs to index all existing docs and that takes some time, which is funny since here the bucket is empty. Anyhow, it’s not instantaneous, so I need to wait. Back to looping, this time adapting the exponential backoff code sample given at Couchbase+Java+Client+Library (one more interesting thing to test). Run the test again and now it passes.
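For illustration, exponential backoff can be sketched as below. This is not the sample from the Couchbase documentation; `Backoff`, `delayMillis` and `retry` are hypothetical names, and the action would be the view query that fails until indexing has caught up.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries an action, doubling the pause after every failure. */
final class Backoff {

    private Backoff() {}

    /** Pause after failed attempt number `attempt` (0-based): initial * 2^attempt, capped at maxMillis. */
    static long delayMillis(int attempt, long initialMillis, long maxMillis) {
        long delay = initialMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    static <T> T retry(Callable<T> action, int maxAttempts, long initialMillis, long maxMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // e.g. the view is not queryable yet
            }
            if (attempt < maxAttempts - 1) {
                try {
                    Thread.sleep(delayMillis(attempt, initialMillis, maxMillis));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                    throw new IllegalStateException("interrupted during backoff", e);
                }
            }
        }
        throw new IllegalStateException("still failing after " + maxAttempts + " attempts", last);
    }
}
```

Compared with a fixed pause, the doubling delay retries quickly when the server is nearly ready and backs off when it is not, while the cap keeps a single wait from growing without bound.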

Implementing save()… first JSON encoding (using Gson). Hm, need to add this type field, which is not present as a field in the class. StackOverflow proves useful again.

Hm, this needs testing.

Fails with the description field. Debugger to the rescue: looks like some encoding thing. http://stackoverflow.com/a/11361318/306042: GsonBuilder#disableHtmlEscaping() solved the problem.

Back to save(), need to properly code around future.get(), handling exceptions and thread interrupts... and it works.
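Coding around future.get() could look roughly like this. It is a sketch that assumes the future returned by the client's write operation implements `java.util.concurrent.Future<Boolean>`; `Futures` and `awaitSuccess` are hypothetical names, and treating a timeout as a failed save is a design choice, not something the SDK prescribes.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for waiting on write-style futures. */
final class Futures {

    private Futures() {}

    /** Returns true only if the future completed in time and reported success. */
    static boolean awaitSuccess(Future<Boolean> future, long timeout, TimeUnit unit) {
        try {
            return Boolean.TRUE.equals(future.get(timeout, unit));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag instead of swallowing it
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            // The operation itself failed; surface the underlying cause.
            throw new IllegalStateException("save failed", e.getCause());
        } catch (TimeoutException e) {
            future.cancel(true); // do not leave the operation pending
            return false;
        }
    }
}
```

A save() method could then return the result of `awaitSuccess` for the future it gets back from the client, which keeps the interrupt and timeout handling in one place.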

Going to the admin console http://192.168.1.128:8091/index.html#sec=buckets, and browsing the test bucket, I can see one document:

 

Conclusion: how much I miss a rich query API...

Compared to MongoDB’s collection#save() & collection#count() and its rich query API, CouchBase requires more work (including reasoning about getting stale data, see Query#setStale(Stale)). Maybe spring-data can help generate CRUD repos, and ElasticSearch can provide a rich query API (until N1QL is ready...).

  • http://www.nitschinger.at Michael Nitschinger

    Hi Milan,

    first thanks very much for the honest feedback and spending the time to write a blog post like this. I’m the maintainer of the Java SDK and always looking for stuff like this.

    second, we’re aware of the suboptimal flush performance and the maybe tedious process during testing. We’re trying to improve that process in the 2.0 SDKs with more natural flows and observables. (for example the flush in the new SDK creates a marker document, then flushes and then polls until its gone), so not much work to do on your side other than wait until the observable is completed. We’re trying to get to the same behaviour for design document management so it will get easier. In parallel, the server team is looking into decreasing the time it takes for flush and all related components on the server side.

    Btw, the BucketTool is still there on the release14 branch (because master is now 2.0 land) and here you can see the warmup check: https://github.com/couchbase/couchbase-java-client/blob/release14/src/test/java/com/couchbase/client/BucketTool.java#L153

    If you have anything you’d like to see improved, please raise a ticket on JIRA directly (http://www.couchbase.com/issues/browse/JCBC) and we’ll take it from there.

    Thanks again for the post!
    Michael

    • mbsimonovic

      Hey Michael, these are minor issues, I’m sure you’ll eventually fix all of ’em, especially now with the new grant (congrats 😉 keep up the good work, can’t wait to see SDK 2.0 out