Updating state on agent upgrade

Thu Sep 26 02:46:25 UTC 2013

On Thu, Sep 26, 2013 at 3:20 AM, William Reade
<william.reade at canonical.com> wrote:

>>> I think that conceptually, "capability" makes sense for some things more
>>> than job/role. In particular, "has the ability to manage firewalls" seems
>>> better expressed as a capability than as a job. However, I don't think it's
>>> really worthwhile changing code to match. A capability can be expressed as
>>> a job, even if it's *slightly* awkward. The fact that we're giving a
>>> machine-agent the job "ManageFirewall" implies that it has that capability.
>>>
>>
> It's not so much how we express the fact as *where* we express it. If the
> only way we store important *environment* properties is by tacking them
> onto a (particular, special) *machine* I think we're setting ourselves up
> for trouble.
>

I think I misunderstood you before. Are you advocating something like this?
 - When bootstrapping, store all the environment's capabilities into the
environ doc in state.
 - AddMachine/InjectMachine is called with a role (say, StateServer, or
MachineAgent; is it just a boolean flag?), rather than jobs.
 - The state package will load capabilities from mongo, and translate role
& capabilities to machine-specific jobs.
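A minimal Go sketch of that last step, assuming capabilities are stored as a simple set and a role takes one of two values. The type names, the capability key "manage-firewall" and the job names are hypothetical and illustrative only; they are not taken from the juju source.

```go
package main

import "fmt"

// Hypothetical types; these names are illustrative, not juju's.
type Role string
type Capability string
type Job string

const (
	RoleStateServer  Role = "StateServer"
	RoleMachineAgent Role = "MachineAgent"

	// Capability recorded in the environ doc at bootstrap (hypothetical key).
	CapManageFirewall Capability = "manage-firewall"

	JobHostUnits      Job = "HostUnits"
	JobManageState    Job = "ManageState"
	JobManageFirewall Job = "ManageFirewall"
)

// jobsFor derives a machine's jobs from its role and the environment's
// stored capabilities, so callers of AddMachine only pass a role.
func jobsFor(role Role, caps map[Capability]bool) []Job {
	jobs := []Job{JobHostUnits}
	if role == RoleStateServer {
		jobs = append(jobs, JobManageState)
		// Only state servers pick up environment-wide duties, and only
		// when the environment advertises the matching capability.
		if caps[CapManageFirewall] {
			jobs = append(jobs, JobManageFirewall)
		}
	}
	return jobs
}

func main() {
	caps := map[Capability]bool{CapManageFirewall: true}
	fmt.Println(jobsFor(RoleStateServer, caps))  // [HostUnits ManageState ManageFirewall]
	fmt.Println(jobsFor(RoleMachineAgent, caps)) // [HostUnits]
}
```

The point of the sketch is that the capability lives in one environment-level place, and the per-machine job list is computed from it rather than stored as the source of truth.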

>>> Yep. As discussed on IRC, this could just as well be done with the
>>> key/value map. I kind of don't like adding required things into a key/value
>>> map, but on the other hand this is bootstrap-specific, and not something
>>> the machine-agent proper cares about. Not changing the format is good, too.
>>>
>>
> I don't like the key/value map either tbh, but I think it's the least worst
> way of getting the data where we need it. I just worry that lots of other
> things may be able to get at that data too and may start to abuse it -- I
> honestly feel that all the k/v data is borderline abusive anyway, and I'd
> prefer not to entrench its use unless we really can't figure out a better
> solution. I *do* feel that k/v would be great if it were used exclusively
> to feed bootstrap-state, and was elided from the machine agent config --
> because, really, the source of truth for pretty much *everything* should be
> the API.
>

Yes, sorry if I wasn't clear about that; I was only suggesting that the
jobs be added to agent.conf for bootstrapping. Apart from that, jobs will
come through the API as usual.
If I'm reading your suggestions correctly, though, it sounds like jobs
won't go through the bootstrap agent.conf at all, but environment
capabilities would.

>>>> We don't need to add the capabilities to the config.  We could add them
>>>> to the information that the machine api gets back.  However, since the
>>>> machine agents don't know what the environment is, it takes us back to
>>>> storing the roles (jobs) in state.
>>>
>>>
> Agreed; and I remain pretty much steadfast that state is where we *should*
> be storing the capability information.
>

>>>> We need a state-side server upgrade process defined.  Enough of this
>>>> ad-hoc jiggery-pokery.
>>>>
>>>
> The jiggery-pokery seems to me to be inescapable while clients and manager
> nodes are still making uncontrolled DB connections. It's evil and it sucks
> and everyone hates it, but the alternatives seem no better (and rather more
> prone to possible corruption).
>
>
>>>> We also need a defined process for upgrades.  I'm not sure how close we
>>>> are to this right now, but I think we need something like this:
>>>>
>>>> 1) Put the API server into a state where it continues to serve requests,
>>>> but doesn't accept new connections.
>>>> 2) The tool version is updated causing all machine agents to kill
>>>> themselves.
>>>> 3) We need some form of state-side lock to allow only one state server
>>>> to modify the underlying structure, and a defined process of functions
>>>> to run to modify the state documents to the next version. [1]
>>>>
>>>> This process needs to be defined, and stable, such that we don't delete
>>>> it all when the next minor branch commit is done.
>>>>
>>>> 4) When the state servers have been upgraded, we then kick off the api
>>>> servers, which the machine agents can then connect to.
>>>>
>>>
>>> This sounds sane to me.
>>>
>>
> Yeah: that sounds like a very sane starting point. Do you see a way to get
> this work started usefully without isolating state completely?
>

Possibly not :)
Backwards-incompatible schema upgrades don't necessarily have to be handled
straight away; if we get the rest of it in place, then state-lockout can be
guaranteed when everything is behind the API. It might be feasible to do
everything in the above list, but in (3) only make the jobs changes to the machine docs.
That should be doable without colliding with other DB connections. Even if
we went the "jiggery-pokery" route, the jobs need to be updated. One
problem I see is that someone doing "juju add-machine" with older tools
could still put a machine into state with an older/incomplete set of jobs
for the new world. I'll have to think about it some more.
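One way to picture step (3) restricted to the jobs change: a single state server takes a global upgrade lock, then adds a missing job to each state-server machine document in a way that is safe to re-run. This Go sketch uses an in-memory mutex and plain structs as hypothetical stand-ins for a lock document and machine documents in mongo; upgradeLock, ensureJob and upgradeStateServers are illustrative names, not existing juju code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Machine is a hypothetical stand-in for a machine document in state.
type Machine struct {
	Id   string
	Jobs []string
}

// upgradeLock is an in-memory stand-in for a lock document in state.
type upgradeLock struct {
	mu     sync.Mutex
	holder string
}

// TryAcquire succeeds only if no other state server holds the lock;
// re-acquiring by the current holder is allowed.
func (l *upgradeLock) TryAcquire(who string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder != "" && l.holder != who {
		return false
	}
	l.holder = who
	return true
}

// Release frees the lock if who currently holds it.
func (l *upgradeLock) Release(who string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder == who {
		l.holder = ""
	}
}

// ensureJob adds job to m if it is missing and reports whether it changed
// anything; running it twice is harmless.
func ensureJob(m *Machine, job string) bool {
	for _, j := range m.Jobs {
		if j == job {
			return false
		}
	}
	m.Jobs = append(m.Jobs, job)
	return true
}

// upgradeStateServers applies the jobs change to the given state-server
// machines while holding the global lock, returning how many changed.
func upgradeStateServers(lock *upgradeLock, who string, machines []*Machine, job string) (int, error) {
	if !lock.TryAcquire(who) {
		return 0, errors.New("upgrade lock held by another state server")
	}
	defer lock.Release(who)
	changed := 0
	for _, m := range machines {
		if ensureJob(m, job) {
			changed++
		}
	}
	return changed, nil
}

func main() {
	lock := &upgradeLock{}
	machines := []*Machine{{Id: "0", Jobs: []string{"ManageState"}}}
	n, err := upgradeStateServers(lock, "machine-0", machines, "ManageFirewall")
	fmt.Println(n, err, machines[0].Jobs) // 1 <nil> [ManageState ManageFirewall]
	n, err = upgradeStateServers(lock, "machine-0", machines, "ManageFirewall")
	fmt.Println(n, err, machines[0].Jobs) // 0 <nil> [ManageState ManageFirewall]
}
```

Making the per-machine change idempotent means a state server that dies mid-upgrade can simply be replaced by another lock holder that reruns the whole pass. It does not address the older-tools add-machine race mentioned above.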

>>> Yep. After I sent the email yesterday, I began thinking that this upgrade
>>> functionality is going to be exactly what's needed for updating the state
>>> schema. I've got a few things to finish off (authenticated httpstorage is
>>> half done; still need to document manual provisioning). Pending
>>> cloud-installer work, I can start looking into this in a bit more detail.
>>>
>>
> That sounds good to me if we've got a path forward.
>
>
>>> Vague ideas at the moment:
>>> - Add a version to the state database (I suppose there'd need to be some
>>> kind of metadata document collection), to track required schema changes.
>>> - Add a state/upgrade package, which keeps a full history of
>>> point-to-point schema updates required. We iterate through version changes,
>>> applying upgrade steps one at a time. Everything must be done in a
>>> transaction, naturally.
>>>
>>
> FWIW, everything will I think have to be done in a long series of
> transactions, especially if we're considering arbitrary changes. Shouldn't
> be a big deal so long as the various changes are idempotent, I think.
>
>
>>> - One API server will (with a global lock):
>>>    * First upgrade the state database. All other code can be written to
>>> assume the current database schema.
>>>    * Invoke an EnvironUpgrader interface method, optionally implemented
>>> by an Environ. This interface defines a method for upgrading some
>>> provider-specific aspects of the environment (e.g. going through and adding
>>> jobs to all of the state-server machines). The EnvironUpgrader will
>>> similarly need to keep track of versions, and point-to-point upgrades.
>>>
>>
> Strong -1 on getting the environ involved directly in this case. Juju
> should itself be able to discover the capabilities of the environment, and
> react to them, rather than having the environment control the contents of
> state. The idea of an EnvironUpgrader *is* itself worthwhile, but it should
> be used to fix things that are part of the environment, rather than part of
> state, and I don't think the need is so pressing right now.
>
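The versioning idea quoted above (a schema version kept in a metadata document, plus a state/upgrade package holding the full history of point-to-point steps) could look roughly like this Go sketch. DB, Step, history and upgrade are hypothetical names. The version is bumped after each step rather than in the same transaction, so, as William notes, every step must be idempotent in case a run is interrupted between applying a step and recording the new version.

```go
package main

import (
	"fmt"
	"sort"
)

// DB is a hypothetical stand-in for the state database: a schema version
// (as it might be kept in a metadata document) and machine id -> jobs.
type DB struct {
	SchemaVersion int
	Machines      map[string][]string
}

// Step upgrades the schema from one version to the next. Apply must be
// idempotent, since a crash after Apply but before the version bump
// means the step runs again.
type Step struct {
	From, To int
	Apply    func(db *DB) error
}

func hasJob(jobs []string, job string) bool {
	for _, j := range jobs {
		if j == job {
			return true
		}
	}
	return false
}

// upgrade applies, in order, every step from the current version onwards.
func upgrade(db *DB, steps []Step) error {
	sort.Slice(steps, func(i, j int) bool { return steps[i].From < steps[j].From })
	for _, s := range steps {
		if s.From < db.SchemaVersion {
			continue // already applied
		}
		if s.From != db.SchemaVersion {
			return fmt.Errorf("no upgrade step from schema version %d", db.SchemaVersion)
		}
		if err := s.Apply(db); err != nil {
			return fmt.Errorf("upgrade %d->%d: %w", s.From, s.To, err)
		}
		db.SchemaVersion = s.To
	}
	return nil
}

// history is the full, never-deleted list of point-to-point steps.
var history = []Step{
	// 1->2: give every state server the ManageFirewall job.
	{1, 2, func(db *DB) error {
		for id, jobs := range db.Machines {
			if hasJob(jobs, "ManageState") && !hasJob(jobs, "ManageFirewall") {
				db.Machines[id] = append(jobs, "ManageFirewall")
			}
		}
		return nil
	}},
	// 2->3: machines recorded with no jobs at all get HostUnits.
	{2, 3, func(db *DB) error {
		for id, jobs := range db.Machines {
			if len(jobs) == 0 {
				db.Machines[id] = []string{"HostUnits"}
			}
		}
		return nil
	}},
}

func main() {
	db := &DB{SchemaVersion: 1, Machines: map[string][]string{
		"0": {"ManageState"},
		"1": {},
	}}
	if err := upgrade(db, history); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(db.SchemaVersion, db.Machines["0"], db.Machines["1"])
	// 3 [ManageState ManageFirewall] [HostUnits]
}
```

Because steps are chained by From/To rather than by release, a database several versions behind simply runs more steps in the same pass.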

How is a capability discovered? I would think at bootstrap time an
environment would need to register its capabilities into state. So then, as
new capabilities are made available (e.g. "ability to manage firewall"),
how does an environment register them on upgrade?

>>>> [1] I think the stance of only supporting upgrades to the next public
>>>> release is crackful.  Consider a user that has a working juju install
>>>> and has not needed to poke it for ages.  They are on 2.2.0.  They read
>>>> the latest juju docs that show new amazing juju-gui stuff that requires
>>>> 2.10.0.  We should not make them go 2.2 -> 2.4 -> 2.6 -> 2.8 -> 2.10 as
>>>> that is clearly a terrible end user experience.
>>>>
>>>
>>> From a user POV, that sounds pretty horrible.
>>>
>>
> I strongly agree that when we're in a position to manage upgrades sanely,
> we should by default apply all upgrades until we reach the latest version.
> The requirement to advance in .2s is only intended to last as long as the
> exposed state problem.
>
> Cheers
> William
>
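Applying every pending upgrade in one go, as suggested for the 2.2.0 to 2.10.0 case, needs release versions compared numerically: as plain strings, "2.10.0" sorts before "2.2.0". A minimal Go sketch, with a hypothetical Version type and a hard-coded list of known releases taken from the example in the thread:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// Version is a hypothetical major.minor.patch release number.
type Version struct{ Major, Minor, Patch int }

func (v Version) String() string {
	return fmt.Sprintf("%d.%d.%d", v.Major, v.Minor, v.Patch)
}

// Less compares numerically, field by field, so 2.2.0 < 2.10.0.
func (v Version) Less(o Version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	if v.Minor != o.Minor {
		return v.Minor < o.Minor
	}
	return v.Patch < o.Patch
}

// parse reads a "major.minor.patch" string.
func parse(s string) (Version, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return Version{}, fmt.Errorf("bad version %q", s)
	}
	var n [3]int
	for i, p := range parts {
		x, err := strconv.Atoi(p)
		if err != nil {
			return Version{}, fmt.Errorf("bad version %q: %w", s, err)
		}
		n[i] = x
	}
	return Version{n[0], n[1], n[2]}, nil
}

// pending returns, oldest first, every known release r with from < r <= to;
// the upgrade steps for each of these would be run in that order.
func pending(from, to Version, known []Version) []Version {
	var out []Version
	for _, r := range known {
		if from.Less(r) && !to.Less(r) {
			out = append(out, r)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Less(out[j]) })
	return out
}

func main() {
	var known []Version
	for _, s := range []string{"2.10.0", "2.8.0", "2.6.0", "2.4.0", "2.2.0"} {
		v, err := parse(s)
		if err != nil {
			panic(err)
		}
		known = append(known, v)
	}
	from, _ := parse("2.2.0")
	to, _ := parse("2.10.0")
	fmt.Println(pending(from, to, known)) // [2.4.0 2.6.0 2.8.0 2.10.0]
}
```

Combined with a per-release list of upgrade steps, this lets a 2.2.0 install reach 2.10.0 in one user-visible upgrade, with the intermediate hops handled internally.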