Mutant Mongo: A lesson in immutability
As anyone who’s read Joshua Bloch’s Effective Java will know, immutable classes are generally considered to be a good thing. It’s easy to read such arguments and agree, but it’s always useful to see real-world examples that back them up. And, as you may have already guessed, we recently came across such a case.
Background
At work we use MongoDB for most datasets held in Atlas at the moment, and for resilience and scalability we have a replica set in place.
Application servers that handle reads from the API use a secondary database, while writes happen on a primary database. Adding POST functionality to the API complicated matters a little, as API write operations need to be performed against the primary. That applies not just to the write operation itself, but also to associated reads, such as those for equivalence.
How may I direct your query?
The standard Mongo Java driver, which we use, allows you to specify a ReadPreference at various levels: connection, database, collection and, finally, query. In the case of POSTs we wanted to control the behaviour at the DBCollection level, which would allow us to override the default of reading from a secondary in particular cases. Our code went something along these lines: get the lookup collection from the DB, call setReadPreference on it so that reads go to the primary, and then use it as normal.
All well and good. Or so we thought. I was using mongotop to look into something unrelated and noticed a large number of reads on that there lookup collection hitting the primary database. On closer inspection, running mongotop on a secondary revealed no reads from lookup. Curious.
The documentation for setReadPreference on DBCollection states that it:
“Sets the read preference for this collection.”
Now we read that as setting the read preference for this instance of DBCollection, and because we were getting the collection fresh from a DB instance, we thought it would only affect this instance of DBCollection and therefore only our use of the lookup collection.
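That’s correct, but with a massive caveat. After a little digging, it turns out that said instance is shared: DBApiLayer keeps hold of the collection objects it creates and hands the same one back on later requests for the same name.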
Ah! A cache of DBCollection objects. Re-reading the Javadoc for setReadPreference, and knowing this code, I now interpret it somewhat differently.
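To make the trap concrete, here is a minimal, self-contained sketch of the behaviour. It does not use the driver and is not the driver's source: FakeDb, FakeCollection and ReadPref are hypothetical stand-ins, and the only assumption carried over is that collection objects are cached by name and are mutable.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SharedCollectionDemo {

    /** Hypothetical stand-in for the driver's read preference. */
    enum ReadPref { PRIMARY, SECONDARY }

    /** Hypothetical stand-in for a mutable collection object. */
    static final class FakeCollection {
        private final String name;
        // Default mirrors our setup: reads go to a secondary.
        private volatile ReadPref readPref = ReadPref.SECONDARY;

        FakeCollection(String name) { this.name = name; }

        String getName() { return name; }

        void setReadPreference(ReadPref readPref) { this.readPref = readPref; }

        ReadPref getReadPreference() { return readPref; }
    }

    /** Hypothetical stand-in for a DB that caches collection objects by name. */
    static final class FakeDb {
        private final ConcurrentMap<String, FakeCollection> collections =
                new ConcurrentHashMap<>();

        FakeCollection getCollection(String name) {
            // Creates the object on first use; afterwards every caller gets the same one.
            return collections.computeIfAbsent(name, FakeCollection::new);
        }
    }

    public static void main(String[] args) {
        FakeDb db = new FakeDb();

        // Write path: looks like a fresh collection, switched to the primary.
        FakeCollection forPost = db.getCollection("lookup");
        forPost.setReadPreference(ReadPref.PRIMARY);

        // Read path, elsewhere in the application: expects the secondary default.
        FakeCollection forGet = db.getCollection("lookup");

        System.out.println(forGet == forPost);          // prints "true"
        System.out.println(forGet.getReadPreference()); // prints "PRIMARY"
    }
}
```

The second caller never touched the read preference, yet its reads would now go to the primary, which matches what mongotop was showing us.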
The fix was easy: we’ve pushed the setting of the read preference all the way down to the query level. It does break encapsulation a little; the MongoEquivalenceStore class now needs to know all about which database it’s reading from, something that was previously hidden from it.
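As a rough illustration of why the query level is safe, here is a small sketch in which the preference lives on a per-call query object rather than on the shared collection. FakeCollection, FakeQuery and ReadPref are hypothetical stand-ins, not driver classes, and the fallback rule (a query without its own preference uses the collection's) is an assumption of the model.

```java
public class QueryLevelPreferenceDemo {

    /** Hypothetical stand-in for the driver's read preference. */
    enum ReadPref { PRIMARY, SECONDARY }

    /** Hypothetical stand-in for a shared collection whose default is never mutated. */
    static final class FakeCollection {
        private final ReadPref defaultPref;

        FakeCollection(ReadPref defaultPref) { this.defaultPref = defaultPref; }

        ReadPref getReadPreference() { return defaultPref; }

        /** Each call creates a new query object, so per-query settings are not shared. */
        FakeQuery find(String key) { return new FakeQuery(this, key); }
    }

    /** Hypothetical stand-in for a single query against a collection. */
    static final class FakeQuery {
        private final FakeCollection source;
        private final String key;
        private ReadPref override; // null means "use the collection's default"

        FakeQuery(FakeCollection source, String key) {
            this.source = source;
            this.key = key;
        }

        String getKey() { return key; }

        FakeQuery setReadPreference(ReadPref pref) {
            this.override = pref;
            return this;
        }

        ReadPref effectiveReadPreference() {
            return override != null ? override : source.getReadPreference();
        }
    }

    public static void main(String[] args) {
        FakeCollection lookup = new FakeCollection(ReadPref.SECONDARY);

        // Write path: this one query is directed at the primary.
        FakeQuery postRead = lookup.find("some-id").setReadPreference(ReadPref.PRIMARY);

        // Read path: an ordinary query keeps the collection's default.
        FakeQuery getRead = lookup.find("some-id");

        System.out.println(postRead.effectiveReadPreference()); // prints "PRIMARY"
        System.out.println(getRead.effectiveReadPreference());  // prints "SECONDARY"
        System.out.println(lookup.getReadPreference());         // prints "SECONDARY"
    }
}
```

The cost is the one mentioned above: whoever builds the query has to know which preference to ask for.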
No change, please
The documentation could be updated to reflect this object pooling, but that should really be a hidden implementation detail. The real fix would be to make instances of DBCollection immutable. I then wouldn’t be surprised by the behaviour of DBCollection, as it wouldn’t change after instantiation. Separately, I don’t see why a pool of DBCollection objects (and DB objects in the Mongo class, for that matter) is needed in the first place, as it doesn’t look like it’s expensive to create a new instance of DBCollection, but that’s a side point [1].
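For what an immutable version might look like, here is one possible shape, sketched under assumptions: ImmutableCollection and its withReadPreference method are hypothetical names invented for this example, not part of the driver discussed here. Changing the preference returns a new object, so a shared or cached instance can never be altered behind another caller's back.

```java
public class ImmutableCollectionDemo {

    /** Hypothetical stand-in for the driver's read preference. */
    enum ReadPref { PRIMARY, SECONDARY }

    /** Hypothetical immutable collection handle: all fields final, no setters. */
    static final class ImmutableCollection {
        private final String name;
        private final ReadPref readPref;

        ImmutableCollection(String name, ReadPref readPref) {
            this.name = name;
            this.readPref = readPref;
        }

        String getName() { return name; }

        ReadPref getReadPreference() { return readPref; }

        /** Returns a copy with the requested preference; this instance is untouched. */
        ImmutableCollection withReadPreference(ReadPref pref) {
            return pref == readPref ? this : new ImmutableCollection(name, pref);
        }
    }

    public static void main(String[] args) {
        // Imagine this instance being cached and handed to every caller.
        ImmutableCollection shared = new ImmutableCollection("lookup", ReadPref.SECONDARY);

        // The write path derives its own view instead of mutating the shared one.
        ImmutableCollection forPost = shared.withReadPreference(ReadPref.PRIMARY);

        System.out.println(shared.getReadPreference());  // prints "SECONDARY"
        System.out.println(forPost.getReadPreference()); // prints "PRIMARY"
        System.out.println(shared == forPost);           // prints "false"
    }
}
```

With this shape a cache of collection objects becomes harmless, since nothing a caller does can change what the next caller sees.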
We’d love to produce a patch to the Mongo Java library to fix this. However, such a change would clearly not be backward-compatible, and others may be relying on the current behaviour. We’ll be talking to the Mongo folks about how things could be improved without breaking things.
[1] Unfortunately, the history of the driver source doesn’t go back far enough to see if there are any clues as to why.
The banner image is a cropped version of an image by David Goehring and used under a Creative Commons license.