Feature Request: -about-to-depart hook
Stuart Bishop
stuart.bishop at canonical.com
Tue Feb 3 14:23:57 UTC 2015
On 28 January 2015 at 21:03, Mario Splivalo
<mario.splivalo at canonical.com> wrote:
> On 01/27/2015 09:52 AM, Stuart Bishop wrote:
>>> Ignoring the (most likely wrong) nomenclature of the proposed hook,
>>> what are your opinions on the matter?
>>
>> I've been working on similar issues.
>>
>> When the peer relation-departed hook is fired, the unit running it
>> knows that $REMOTE_UNIT is leaving the cluster. $REMOTE_UNIT may not
>> be alive - we may be removing a failed unit from the service.
>> $REMOTE_UNIT may be alive but uncontactable - some form of network
>> partition has occurred.
>
> $REMOTE_UNIT doesn't have to be the one leaving the cluster. If I have
> a 3-unit cluster (mongodb/0, mongodb/1, mongodb/2) and I 'juju remove
> mongodb/1', the relation-departed hook will fire on all three units.
> Moreover, it will fire twice on mongodb/1 - once for each remaining
> peer. So, from mongodb/2's perspective, $REMOTE_UNIT points to
> mongodb/1, which is indeed the unit leaving the relation. But if we
> observe the same scenario on mongodb/1, $REMOTE_UNIT there will point
> to mongodb/0, and that unit is NOT leaving the cluster. There is no
> way to know whether the hook is running on the unit that's leaving or
> on a unit that's staying.
I see, and have also struck the same problem with the Cassandra charm.
It is impossible to have juju decommission a node.
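To make the ambiguity concrete, here is a minimal sketch of a peer
relation-departed hook (the hook and relation names are illustrative):
all either side ever sees is $JUJU_REMOTE_UNIT, so the hook body is
identical whether it runs on the departing unit or on a survivor.

    #!/bin/sh
    # hooks/cluster-relation-departed (sketch)
    # Both the departing unit and the remaining units run this same
    # hook, and neither is told which role it is playing.
    juju-log "departed on ${JUJU_UNIT_NAME}, remote is ${JUJU_REMOTE_UNIT}"
    # Nothing in the hook environment distinguishes "the remote unit
    # is leaving" from "I am leaving and watching my peers drop away".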
My relation-departed hook must reset the firewall rules, since the
replication connection is unauthenticated and we cannot leave it open.
This means I cannot decommission the departing unit in the
relation-broken hook, as the remaining nodes refuse to talk to it and
it has no way of redistributing its data.
And I can't decommission the departing node in the relation-departed
hook because, as you correctly say, it is impossible to know which
unit is actually leaving the cluster and which are remaining.
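For illustration, a sketch of what my -departed hook has to do,
assuming iptables rules keyed on each peer's private address and
Cassandra's unauthenticated inter-node port; this is not the actual
charm code:

    #!/bin/sh
    # hooks/cluster-relation-departed (sketch, not the actual charm code)
    set -e
    PEER_ADDR=$(relation-get private-address "$JUJU_REMOTE_UNIT")
    # Close the unauthenticated replication port (7000 for Cassandra)
    # to the departed peer. Once this runs on the remaining nodes, the
    # departed unit can no longer stream its data away - so by
    # relation-broken time it is too late to decommission it cleanly.
    iptables -D INPUT -p tcp --dport 7000 -s "$PEER_ADDR" -j ACCEPT || true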
> But if that takes place in relation-departed, there is no way of
> knowing if you need to do a stepdown, because you don't know whether
> it's you or the remote unit that is being removed. Therefore the
> logic for removing nodes had to go into relation-broken. But, as you
> explained, if the unit goes down catastrophically, relation-broken
> will never be executed and I have a cluster that needs manual
> intervention to clean up.
Leadership might provide a workaround, as the service is guaranteed
to have exactly one leader. If a unit is running the relation-departed
hook and it is the leader, it knows it is not the one leaving the
cluster (or it would no longer be the leader) and it can perform the
decommissioning.
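As a sketch, assuming the is-leader hook tool that the leadership
feature is expected to provide, and a hypothetical decommission_unit
helper:

    #!/bin/sh
    # hooks/cluster-relation-departed (sketch of the leadership workaround)
    # is-leader prints "True" on the single leader unit, "False" elsewhere.
    if [ "$(is-leader)" = "True" ]; then
        # The leader cannot be the departing unit (it would have lost
        # leadership), so it is safe to decommission the remote unit.
        decommission_unit "$JUJU_REMOTE_UNIT"   # hypothetical helper
    fi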
But that is a messy workaround. Given that we have both struck almost
exactly the same problem, I'd surmise the same issue will occur in
pretty much all similar systems (Swift, Redis, MySQL, ...) and we need
a better solution.
I've also heard rumours of a goal state, which may give units enough
context to know what is happening. I don't know the details of this,
though.
> I'm not sure if this is possible... Once the unit has left the
> relation, juju is no longer aware of it, so there is no way of
> knowing whether -broken completed successfully. Or am I wrong here?
Hooks have no way of telling, but juju could, in the same way that you
can tell by running 'juju status'. If the unit is still running, it
might still run the -broken hook. Once the unit is destroyed, we know
it will never run the -broken hook.
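Roughly, the check an operator (or juju itself) could make, using
Mario's mongodb/1 as the example unit:

    #!/bin/sh
    # Outside the hook environment (sketch). If the unit still appears
    # in status it may yet run its -broken hook; once gone, it never will.
    if juju status --format=yaml | grep -q "mongodb/1:"; then
        echo "mongodb/1 still exists; relation-broken may still run"
    else
        echo "mongodb/1 is gone; relation-broken will never run"
    fi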
--
Stuart Bishop <stuart.bishop at canonical.com>