Mutant Mongo: A lesson in immutability

31 October 2013

As anyone who’s read Joshua Bloch’s Effective Java will know, immutable classes are generally considered to be a good thing. It’s easy to read such arguments and agree, but it’s always useful to see real-world examples that back them up. And, as you may have already guessed, we recently came across such a case.

Background

At work we use MongoDB for most datasets held in Atlas at the moment, and for resilience and scalability we have a replica set in place.

Application servers that handle reads from the API use a secondary database, while writes happen on a primary database. Adding POST functionality to the API complicated matters a little, as API write operations need to be performed against the primary. It’s not just the write operation itself that needs the primary, but also the associated reads, such as those for equivalence.

How may I direct your query?

The standard Mongo Java driver, which we use, allows you to specify a ReadPreference at various levels: connection, database, collection and, finally, query.

In the case of POSTs we wanted to control the behaviour at the DBCollection level, which would allow us to override the default of reading from a secondary in particular cases. Our code went something along the lines of this:

All well and good. Or so we thought. I was using mongotop to look into something unrelated and noticed a large number of reads on that there lookup collection hitting the primary database. On closer inspection, running mongotop on a secondary revealed no reads from lookup. Curious.

The documentation for setReadPreference on DBCollection states that it

Sets the read preference for this collection.

Now we read that as setting the read preference for this instance of DBCollection, and because we were getting the collection fresh from a DB instance, we thought it would only affect this instance of DBCollection and therefore use of the lookup collection.

That’s correct, but with a massive caveat. After a little digging, it turns out that said instance is shared. In DBApiLayer we have:

Ah! A cache of DBCollection objects. Re-reading the Javadoc for setReadPreference, and knowing this code, I now interpret it somewhat differently.
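To make the effect concrete, here is a minimal, self-contained sketch of the behaviour described above. It is not the driver’s code: ModelDb and ModelCollection are hypothetical stand-ins, and read preferences are modelled as plain strings. It assumes only what the post describes, namely that collection handles are cached by name and that the read preference is mutable state on the handle.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SharedHandleDemo {
    public static void main(String[] args) {
        ModelDb db = new ModelDb();

        // The POST code path asks for "its own" handle and changes the preference.
        ModelCollection forWrites = db.getCollection("lookup");
        forWrites.setReadPreference("primary");

        // An unrelated read path asks for the same collection later.
        ModelCollection forReads = db.getCollection("lookup");
        System.out.println(forReads == forWrites);         // same cached instance
        System.out.println(forReads.getReadPreference());  // the change has leaked
    }
}

/** Hypothetical stand-in for a mutable collection handle; not the driver's DBCollection. */
final class ModelCollection {
    private final String name;
    private volatile String readPreference; // mutable state seen by every holder of this handle

    ModelCollection(String name, String defaultReadPreference) {
        this.name = name;
        this.readPreference = defaultReadPreference;
    }

    String getName() { return name; }
    String getReadPreference() { return readPreference; }
    void setReadPreference(String readPreference) { this.readPreference = readPreference; }
}

/** Hypothetical stand-in for a database object that caches collection handles by name. */
final class ModelDb {
    private final Map<String, ModelCollection> cache = new ConcurrentHashMap<>();

    ModelCollection getCollection(String name) {
        // Every request for the same name returns the same instance.
        return cache.computeIfAbsent(name, n -> new ModelCollection(n, "secondary"));
    }
}
```

In this model, the second caller never touched the read preference, yet its reads would now be directed at the primary, which matches what mongotop showed us.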

The fix was easy: we’ve pushed the setting of read preference all the way down to the query level. It does break encapsulation a little; the MongoEquivalenceStore class now needs to know all about which database it’s reading from, something that was previously hidden from it.
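The shape of that fix can be sketched with the same kind of toy model. Again, this is an illustration under assumptions rather than our actual code or the driver’s API: LookupCollection and LookupQuery are hypothetical names, and preferences are plain strings. The point is that a query object is created per call, so an override set on it cannot leak into the shared collection handle.

```java
final class QueryLevelDemo {
    public static void main(String[] args) {
        LookupCollection lookup = new LookupCollection("lookup", "secondary");

        // The POST path overrides the preference for this one query only.
        LookupQuery postRead = lookup.find().setReadPreference("primary");
        // Ordinary API reads inherit the collection default.
        LookupQuery apiRead = lookup.find();

        System.out.println(postRead.effectiveReadPreference());
        System.out.println(apiRead.effectiveReadPreference());
    }
}

/** Hypothetical stand-in for a shared collection handle with a fixed default. */
final class LookupCollection {
    private final String name;
    private final String defaultReadPreference;

    LookupCollection(String name, String defaultReadPreference) {
        this.name = name;
        this.defaultReadPreference = defaultReadPreference;
    }

    String getName() { return name; }
    String getReadPreference() { return defaultReadPreference; }

    /** Each call creates a fresh query object, so per-query settings are never shared. */
    LookupQuery find() { return new LookupQuery(this); }
}

/** Hypothetical per-query object carrying an optional read preference override. */
final class LookupQuery {
    private final LookupCollection collection;
    private String readPreference; // null means "inherit from the collection"

    LookupQuery(LookupCollection collection) { this.collection = collection; }

    LookupQuery setReadPreference(String readPreference) {
        this.readPreference = readPreference;
        return this;
    }

    String effectiveReadPreference() {
        return readPreference != null ? readPreference : collection.getReadPreference();
    }
}
```

The cost is the one noted above: the caller building the query has to know which preference it wants, rather than being handed a collection that already encodes it.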

No change, please

The documentation could be updated to reflect this object pooling, but that should really be a hidden implementation detail. The real fix would be to make instances of DBCollection immutable. I then wouldn’t be surprised by the behaviour of DBCollection, as it wouldn’t change after instantiation. Separately, I don’t see why a pool of DBCollection objects (and of DB objects in the Mongo class, for that matter) is needed in the first place, as it doesn’t look expensive to create a new instance of DBCollection, but that’s a side point¹.
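As a rough illustration of what an immutable design might look like, here is a small sketch. It is a hypothetical class of my own, not a proposal for the driver’s actual API: ImmutableCollectionHandle and withReadPreference are invented names, and preferences are again plain strings. Instead of mutating the handle, changing the preference returns a new handle, so caching and sharing instances becomes harmless.

```java
final class ImmutableHandleDemo {
    public static void main(String[] args) {
        ImmutableCollectionHandle shared =
                new ImmutableCollectionHandle("lookup", "secondary");

        // Deriving a primary-reading handle leaves the shared one untouched.
        ImmutableCollectionHandle forPosts = shared.withReadPreference("primary");

        System.out.println(shared.getReadPreference());
        System.out.println(forPosts.getReadPreference());
    }
}

/** Hypothetical immutable collection handle: all fields are final and there are no setters. */
final class ImmutableCollectionHandle {
    private final String name;
    private final String readPreference;

    ImmutableCollectionHandle(String name, String readPreference) {
        this.name = name;
        this.readPreference = readPreference;
    }

    String getName() { return name; }
    String getReadPreference() { return readPreference; }

    /** Returns a new handle with the given preference; this instance never changes. */
    ImmutableCollectionHandle withReadPreference(String newReadPreference) {
        return new ImmutableCollectionHandle(name, newReadPreference);
    }
}
```

With this shape, the code that wants primary reads keeps the encapsulation it had before, and nobody else holding the shared handle can be affected.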

We’d love to produce a patch to the Mongo Java library to fix this. However, such a change would clearly not be backwards-compatible, and others may be relying on the current behaviour. We’ll be talking to the Mongo folks about how things could be improved without breaking existing code.

¹ Unfortunately, the history of the driver source doesn’t go back far enough to see if there are any clues as to why.

The banner image is a cropped version of an image by David Goehring and used under a Creative Commons license.
