Automatic retries of hooks

Wed Jan 20 07:46:31 UTC 2016

On 20 January 2016 at 13:17, John Meinel <john at arbash-meinel.com> wrote:

> There are classes of failures that a charm hook itself cannot handle. The
> specific one Bogdan was working with is the fact that the machine itself is
> getting restarted while the charm is in the middle of processing a hook.
> There isn't any way the hook itself can handle that, unless you could raise
> a very specific error that indicates you should be retried (so as it notices
> it's about to die, it raises the try-me-again error).
>
> Hooks are supposed to be idempotent regardless, aren't they? So while we
> paper over transient bugs in them, doesn't it make the system more resilient
> overall?

The new update-status hook could be used to recover, as it is called
automatically at regular intervals. If the reboot really was random,
you would need to clear the error status first. But if it is triggered
by the charm, it is just a case of 'reboot(now+30s);
status_set('waiting', 'Waiting for reboot'); sys.exit(0)' and waiting
for the update-status hook to kick in.
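
Spelled out a little more, the charm-triggered case might look like the
sketch below. It is illustrative only: the function name and the
injectable 'run' parameter are made up for this example, the status-set
hook tool is called directly instead of going through charmhelpers, and
'shutdown -r +1' (one minute) stands in for the 30 second delay because
shutdown only takes whole minutes.

```python
import subprocess


def reboot_and_wait(run=subprocess.check_call, delay_minutes=1):
    """Schedule a reboot, mark the unit as waiting, and report success.

    `run` is injectable so the sequence can be exercised without
    actually rebooting anything (hypothetical helper, not a Juju API).
    """
    # Schedule the reboot a little in the future so the hook can finish first.
    run(["shutdown", "-r", "+{}".format(delay_minutes)])
    # status-set is the Juju hook tool that charmhelpers' status_set() wraps.
    run(["status-set", "waiting", "Waiting for reboot"])
    # Exit code 0: the hook ends cleanly, so the unit is not left in an error state.
    return 0
```

A hook would end with 'sys.exit(reboot_and_wait())'. Once the machine is
back, the next update-status run carries on with whatever work is left.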

This recovery happens naturally if you structure your charm to have a
single hook that does everything that needs to be done, rather than
trying to craft individual hooks to deal with specific events.
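
As a rough illustration of that structure, here is a minimal sketch of
one idempotent entry point that every hook (install, config-changed,
update-status, ...) could call. The 'state' dict and its keys are
invented for the example; a real charm would check packages, config
files and services rather than a dictionary.

```python
def reconcile(state):
    """Do whatever is still outstanding; safe to run from any hook, repeatedly.

    `state` is a hypothetical stand-in for what the charm can observe
    about the unit. Returns the list of steps actually performed.
    """
    performed = []
    if not state.get("installed"):
        state["installed"] = True  # stand-in for installing packages
        performed.append("install")
    if state.get("applied_config") != state.get("config"):
        state["applied_config"] = state.get("config")  # stand-in for writing config
        performed.append("configure")
    return performed
```

If a run is cut short after the install step, the next invocation
(for example from update-status) skips the install and only applies the
configuration. A run with nothing left to do performs no steps.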

-- 
Stuart Bishop <stuart.bishop at canonical.com>