relation departure timing changes

Fri Aug 23 10:37:39 UTC 2013

On Fri, Aug 23, 2013 at 8:06 AM, Stuart Bishop
<stuart.bishop at canonical.com> wrote:

> On Thu, Aug 22, 2013 at 11:42 PM, William Reade
> <william.reade at canonical.com> wrote:
>
> > I'm not aware of any reason this measure might be controversial (please let
> > me know if you are); but it raises an interesting question whose answer
> > hinges on common practice across the charm community. So far, there has been
> > no practical distinction between relation providers and requirers; we're
> > considering introducing an asymmetry in the relationship, such that providers
> > signal departure early as above (but requirers continue to signal departure
> > only once they have actually departed).
>
> I need to see this in more concrete terms.
>
> So let's say we have a provider (server_service), a requirer
> (client_service), and they are related (relname).
>
> Currently, if we remove-unit server_service/0 there is a chance that
> the unit is removed while the client_service is still using it,
> causing errors. In addition, if we remove-unit client_service/0 there
> is a chance that server_service revokes access before the client unit
> is shut down, causing errors.
>

Currently, we *guarantee* that destroying server_service/0 will cause its
relationship with client_service units to be torn down before those units
become aware. At this stage, I'm narrowly proposing that we move the
synchronization point such that it's *possible* (not guaranteed) for units
of client_service to respond to server_service/0's destruction before it
materially affects them.

This guarantee also holds in the opposite direction: that is, when
client_service/0 is destroyed, units of server_service will not learn of
its departure (and hence will not revoke access) until after the unit has
finished tearing down the relation. I'm proposing that we *not* alter this
behaviour.

> With my interpretation of the proposed changes:
>
> If we 'remove-unit server_service/0':
>   1) server_service/0 is flagged as dying and relation-list stops listing
> it.
>   2) If there are any relname-relation-joined client_service hooks for
> server_service/0 pending, they are thrown away.
>   3) Wait until any relname-relation-joined hooks for server_service/0
> currently in progress are completed.
>   4) relname-relation-departed is run on all the client_service units.
> The dying server_service/0 doesn't yet know it is dying, so
> client_service can continue to use it while the departed hooks are
> being run.
>   5) Wait until all the relname-relation-departed hooks have completed
>   6) Fire relname-relation-broken on server_service/0
>   7) Once all *-relation-broken hooks have completed, stop and kill
> server_service/0
>
> However, if instead we 'remove-unit client_service/0':
>   1) If there are any relname-relation-joined server_service hooks for
> client_service/0 pending, they are thrown away.
>   2) Wait until any relname-relation-joined hooks for client_service/0
> currently in progress are completed.
>   3) Fire relname-relation-broken on client_service/0.
>   4) Wait until all *-relation-broken hooks have completed on
> client_service. At this point, it is no longer in use.
>   5) Fire all the relname-relation-departed hooks for server_service/0
> and stop and kill client_service/0
>
> Is this correct? As far as I can see, existing charms are no worse off
> with this model but there is a good chance client/server relations will
> benefit; clients are shut down before the servers cut off their access,
> and servers are shut down after clients stop making use of them.
>

It's not exactly correct -- I'll describe the details below -- but it's not
wrong enough to invalidate your conclusions... *except* that there's an
unexamined assumption that requirer/provider maps to client/server. I'm
most interested in discovering which relations (if any) *don't* follow the
model you assume, so that we don't break them thoughtlessly.

First, client departure continues to work exactly as it does today:

1) `juju destroy-unit client_service/0`.
2) client_service/0 observes that it's now dying, and:
2a) stops paying attention to the true state of server_service/*
2b) runs relname-relation-departed for each joined unit of server_service
2c) runs relname-relation-broken.
2d) informs the rest of the system that it's no longer participating in
relname.
3) server_service/* concurrently run relname-relation-departed for
client_service/0

...while server departure works *almost* the same as it does today:

1) `juju destroy-unit server_service/0`
2) server_service/0 observes that it's now dying, and:
2a) informs the rest of the system that it's no longer participating in
relname
2b) stops paying attention to the true state of client_service/*
2c) runs relname-relation-departed for each joined unit of client_service
2d) runs relname-relation-broken.
3) client_service/* concurrently run relname-relation-departed for
server_service/0

...so the difference is just that step (3) can start earlier than it would
today (immediately after (2a), rather than after (2d)).
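
For concreteness, the two sequences above can be sketched as a small Python
model. This only models the *order* of steps; the names (ANNOUNCE,
LOCAL_TEARDOWN, departure_steps, remote_departed_may_start_after) are
invented for illustration and are not juju code.

```python
# Illustrative model only: these names are invented, not part of juju.
ANNOUNCE = "inform the system that the unit has left relname"
LOCAL_TEARDOWN = [
    "stop watching the remote service's units",
    "run relname-relation-departed for each joined remote unit",
    "run relname-relation-broken",
]


def departure_steps(role, proposed=True):
    """Ordered steps taken by a dying unit; role is 'provider' or 'requirer'."""
    if role not in ("provider", "requirer"):
        raise ValueError("unknown role: %r" % (role,))
    if role == "provider" and proposed:
        # Proposed change: providers announce departure first (step 2a).
        return [ANNOUNCE] + LOCAL_TEARDOWN
    # Requirers, and providers today, announce only after tearing down.
    return LOCAL_TEARDOWN + [ANNOUNCE]


def remote_departed_may_start_after(steps):
    """Number of local steps completed before remote units can see the departure."""
    return steps.index(ANNOUNCE) + 1


for role in ("provider", "requirer"):
    steps = departure_steps(role)
    print(role, "-> remote units may react after step",
          remote_departed_may_start_after(steps), "of", len(steps))
```

In this model a provider's remote units may react after the first of four
steps, while a requirer's remote units only learn of the departure after
all four; with proposed=False the provider sequence is identical to the
requirer one, which is today's behaviour.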

It's important to realise that the *guarantees* made by the system do not
in fact become any stronger under the proposed model. If a unit of
client_service is (say) running a slow config-changed hook when (2a) comes
to pass, server_service/0 will *not* wait for that unit to handle the
departure before cutting off access. It *would* in fact be possible to wait; but
the tradeoff in play there is whether we want an unresponsive or missing
unit of client_service to be capable of blocking the shutdown of
server_service/0. I'm +1 in theory, but nervous in practice; without
implementing `destroy-unit --force`, which is not entirely trivial (largely
but not entirely because it's blocked on "" [0]), that change could lead to
deadlocked environments. If you'd like the system to make this guarantee as
well, please let me know: I can't promise anything about scheduling
decisions there, but it will be useful input into those decisions ;).
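
The tradeoff in the paragraph above can be put as a toy Python model. It is
a sketch under assumptions, not a description of how juju decides anything:
provider_may_cut_access and its arguments are hypothetical names, and the
real system has no such function.

```python
# Illustrative model only: the function and its arguments are invented.
def provider_may_cut_access(handled_departure, wait_for_requirers):
    """handled_departure maps requirer unit name -> True once it has run
    relname-relation-departed for the dying provider unit."""
    if not wait_for_requirers:
        # Proposal as it stands: requirers get a chance to react, no more.
        return True
    # Stronger guarantee: every requirer must have handled the departure.
    return all(handled_departure.values())


state = {
    "client_service/0": True,
    "client_service/1": False,  # e.g. stuck in a slow config-changed hook
}
print(provider_may_cut_access(state, wait_for_requirers=False))
print(provider_may_cut_access(state, wait_for_requirers=True))
```

With wait_for_requirers=True, a unit that never reports back keeps the
result False indefinitely, which is the deadlock risk mentioned above.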

For now, we need to figure out whether the initial proposal makes sense as
a step towards a happier system for everyone, or whether it represents a
breakage we can't afford.

Cheers
William