High Availability command line interface - future plans.

Fri Nov 8 11:54:56 UTC 2013

On 8 November 2013 10:31, John Arbash Meinel <john at arbash-meinel.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 2013-11-08 14:15, roger peppe wrote:
>> On 8 November 2013 08:47, Mark Canonical Ramm-Christensen
>> <mark.ramm-christensen at canonical.com> wrote:
>>> I have a few high level thoughts on all of this, but the key
>>> thing I want to say is that we need to get a meeting setup next
>>> week for the solution to get hammered out.
>>>
>>> First, conceptually, I don't believe the user model needs to
>>> match the implementation model.  That way lies madness -- users
>>> care about the things they care about and should not have to
>>> understand how the system works to get something basic done.
>>> See:
>>> http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140
>>> for reasons why I call this madness.
>>>
>>> For that reason I think the path of adding a --jobs flag to
>>> add-machine is not a move forward.  It is exposing implementation
>>> detail to users and forcing them into a more complex conceptual
>>> model.
>>>
>>> Second, we don't have to boil the ocean all at once. An
>>> "ensure-ha" command that sets up additional server nodes is
>>> better than what we have now -- nothing.  Nate is right, the box
>>> need not be black, we could have a juju ha-status command that
>>> just shows the state of HA.  This is fundamentally different
>>> than changing the behavior and meaning of add-machines to know
>>> about juju jobs and agents and forcing folks to think about
>>> that.
>>>
>>> Third, I think it is possible to chart a course from ensure-ha
>>> as a shortcut (implemented first) to the type of syntax and
>>> feature set that Kapil is talking about.  And let's not kid
>>> ourselves, there are a bunch of new features in that proposal:
>>>
>>> * Namespaces for services
>>> * support for subordinates to state services
>>> * logging changes
>>> * lifecycle events on juju "jobs"
>>> * special casing the removal of services that would kill the
>>>   environment
>>> * special casing the stats to know about HA and warn for even
>>>   state server nodes
>>>
>>> I think we will be adding a new concept and some new syntax when
>>> we add HA to juju -- so the idea is just to make it easier for
>>> users to understand, and to allow a path forward to something
>>> like what Kapil suggests in the future.   And I'm pretty solidly
>>> convinced that there is an incremental path forward.
>>>
>>> Fourth, the spelling "ensure-ha" is probably not a very good
>>> idea, the cracks in that system (like taking a -n flag, and
>>> dealing with failed machines) are already apparent.
>>>
>>> I think something like Nick's proposal for "add-manager" would be
>>> better. Though I don't think that's quite right either.
>>>
>>> So, I propose we add one new idea for users -- a state-server.
>>>
>>> then you'd have:
>>>
>>> juju management --info
>>> juju management --add
>>> juju management --add --to 3
>>> juju management --remove-from
>>
>> This seems like a reasonable approach in principle (it's
>> essentially isomorphic to the --jobs approach AFAICS which makes me
>> happy).
>>
>> I have to say that I'm not keen on using flags to switch the basic
>> behaviour of a command. The interaction between the flags can then
>> become non-obvious (for example a --constraints flag might be
>> appropriate with --add but not --remove-from).
>>
>> Ah, but your next message seems to go along with that.
>>
>> So, to couch your proposal in terms that are consistent with the
>> rest of the juju commands, here's how I see it could look, in terms
>> of possible help output from the commands:
>>
>> usage: juju add-management [options]
>> purpose: Add Juju management functionality to a machine, or start
>> a new machine with management functionality.
>>
>> Any Juju machine can potentially participate as a Juju manager -
>> this command adds a new such manager. Note that there should
>> always be an odd number of active management machines, otherwise
>> the Juju environment is potentially vulnerable to network
>> partitioning. If a management machine fails, a new one should be
>> started to replace it.
>
> I would probably avoid putting such an emphasis on "any machine can be
> a manager machine". But that is my personal opinion. (If you want HA
> you probably want it on dedicated nodes.)
>
>>
>> options:
>> --constraints  (= )
>>     additional machine constraints. Ignored if --to is specified.
>> -e, --environment (= "local")
>>     juju environment to operate in
>> --series (= "")
>>     the Ubuntu series of the new machine. Ignored if --to is
>>     specified.
>> --to (= "")
>>     the id of the machine to add management to. If this is not
>>     specified, a new machine is provisioned.
>>
>> usage: juju remove-management [options] <machine-id>
>> purpose: Remove Juju management functionality from the machine
>> with the given id. The machine itself is not destroyed.
>>
>> Note that if there are fewer than three management machines
>> remaining, the operation of the Juju environment will be
>> vulnerable to the failure of a single machine. It is not possible
>> to remove the last management machine.
>>
>
> I would probably also remove the machine if the only thing on it was
> the management. Certainly that is how people want us to do "juju
> remove-unit".

That seems reasonable, though I think you'd still want some
way to ask for the machine to be kept around, given the costs
of provisioning in some places.
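
As a rough sketch of that decision (Python, purely illustrative): the
Machine type, the keep_machine option and the "keep"/"destroy" return
values below are all hypothetical names made up for this example, not
part of any existing juju command or API. It assumes a machine is
destroyed only when management was the only thing on it and the user
did not ask to keep it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Machine:
    # Hypothetical stand-in for a juju machine record.
    id: str
    is_manager: bool = False
    units: List[str] = field(default_factory=list)


def remove_management(machines: List[Machine], machine_id: str,
                      keep_machine: bool = False) -> str:
    """Strip management from a machine; return "keep" or "destroy"."""
    managers = [m for m in machines if m.is_manager]
    target = next((m for m in managers if m.id == machine_id), None)
    if target is None:
        raise ValueError("machine %s is not a management machine" % machine_id)
    if len(managers) == 1:
        # Matches the proposed help text: the last manager cannot go.
        raise ValueError("cannot remove the last management machine")
    target.is_manager = False
    if target.units or keep_machine:
        # Something else runs here, or the user asked to keep it.
        return "keep"
    machines.remove(target)
    return "destroy"
```

With this shape the default matches what John describes for
remove-unit, and keeping the machine is an explicit opt-in.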

>> options:
>> -e, --environment (= "local")
>>     juju environment to operate in
>>
>> As a start, we could implement only the add-management command, and
>> not implement the --to flag. That would be sufficient for our HA
>> deliverable, I believe. The other features could be added in time
>> or according to customer demand.
>
> The main problem with this is that it feels slightly too easy to add
> just 1 machine and then not actually have HA (mongo stops allowing
> writes if you have a 2-node cluster and lose one, right?)

I think we could just print a warning in that case,
although we could error too, as Gustavo suggests.
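
To make the arithmetic behind such a warning concrete, here is a
minimal sketch in Python. It assumes simple majority voting among the
management machines (which is how a MongoDB replica set elects a
primary); the function names and the warning wording are hypothetical
and only for illustration.

```python
from typing import Optional


def majority(voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a majority remains."""
    return voters - majority(voters)


def ha_warning(voters: int) -> Optional[str]:
    """Return a warning for a weak management count, else None."""
    if voters < 3:
        return ("%d management machine(s): losing one stops the "
                "environment accepting writes" % voters)
    if voters % 2 == 0:
        return ("%d management machines tolerate no more failures "
                "than %d would" % (voters, voters - 1))
    return None
```

With these definitions, 2 machines tolerate 0 failures, 3 tolerate 1,
and 4 still tolerate only 1, which is why the help text asks for an
odd number. Whether the command warns or errors is then just a matter
of what it does with a non-None result.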