Wish: configurable juju's write concern when connecting to mongod

Tue Jan 12 10:43:29 UTC 2016

On Mon, Jan 11, 2016 at 5:14 PM, Mario Splivalo <mario.splivalo at canonical.com> wrote:

> Hello.
>
> I have a customer that ended up with a broken juju database - the issue
> and possible cause are explained here: http://pad.lv/1528261
>
> I don't have enough data to verify what exactly happened, but most
> likely something like this happened:
>
> 1. jujud wrote to PRIMARY, with write concern of 'majority'
> 2. as the SECONDARYs are lagged (customers usually run them inside VMs),
> mongod returned 'all good, carry on' to jujud as soon as the changes had
> replicated to just ONE of the SECONDARYs
> 3. PRIMARY lost connectivity with the rest of the replicaset.
> 4. SECONDARYs voted for a new PRIMARY - the SECONDARY which
> hadn't had all the data replicated to it was chosen as the new PRIMARY.
>

This specific scenario was once a problem but should be fine in 2.4.6 [0].

> 5. former-PRIMARY rejoins the replicaset, and destroys (rolls back) all of
> the unreplicated changes
>
> And now we have a situation where juju thinks that the data is written to
> the database, when in fact that data doesn't exist.
>

There is, however, still a mechanism whereby parts of juju can think a
value has been written when it actually hasn't [1].

> Now, if we could tell juju to use a 'writeconcern' of 3, a situation like
> the one above wouldn't happen, as we would always be sure that all the data
> changes are written to all of the servers (assuming, of course, we run
> a replicaset with three nodes).
> In the event of one server going down, writes to mongodb would stop,

This would make it hard to be HA...

> as
> there are now only two servers to write to, and we are asking mongo to
> confirm writes to three servers. But, we are safe, data-wise, and no
> data will be lost.
>

...and I don't *think* we'd fully address the read-uncommitted problem even
if we could work around the loss of HA mongo.
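To make the HA trade-off concrete, here is a minimal sketch of the
acknowledgement arithmetic only. It does not talk to mongod, and the helper
names (majority, canAck) are hypothetical, not part of juju or any MongoDB
driver. It assumes a three-member replicaset in which every member votes.

```go
package main

import "fmt"

// majority returns how many acknowledgements a 'majority' write concern
// needs in a replicaset with n voting members (hypothetical helper).
func majority(n int) int {
	return n/2 + 1
}

// canAck reports whether a write that needs w acknowledgements can be
// satisfied when only `healthy` members (PRIMARY included) are reachable.
func canAck(w, healthy int) bool {
	return healthy >= w
}

func main() {
	const members = 3
	// Compare a fully healthy replicaset with one that has lost a member.
	for _, healthy := range []int{3, 2} {
		fmt.Printf("healthy=%d: majority (w=%d) ok=%v, w=3 ok=%v\n",
			healthy, majority(members),
			canAck(majority(members), healthy),
			canAck(3, healthy))
	}
}
```

With one member down, a write concern of 3 can no longer be satisfied, while
'majority' (2 of 3) still can. That is the loss of HA referred to above, and
it is also why dropping to a write concern of 2 would re-enable writes.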

> With the option to reconfigure jujud to use a write concern of 2, we
> could re-enable writes to the mongod, at least until we bring the
> broken SECONDARY back to life.
>
> Does this make any sense?
>

I see where you're coming from, but I don't think it'd help. AFAICS the
problem isn't that writes are acked too early; it's that concurrent
flushers can read uncommitted transaction data and record references to
those transactions, which could then end up dangling.

> I'm asking this because in real-life deployments a situation like the one
> described above is not so uncommon. Especially when jujud state servers (and their
> mongodb nodes) are inside VMs, there is a possibility that the slaves
> would lag behind the master.
>

In practice, it seems that the bad behaviour is triggered by having "too
many" concurrent txns touching the same document. The assert-address txns
referenced in the bug were particularly serious offenders there, and have
long since been fixed; but all the same, I'd like to remind all developers
who need to touch state that their txn footprints should be as dainty as
possible. This is not the first time that enthusiastically-overlapping
transactions have given us grief.

Cheers
William

[0] https://aphyr.com/posts/284-jepsen-mongodb#comment-602
[1] https://aphyr.com/posts/322-jepsen-mongodb-stale-reads

>         Mario