Help needed to debug problem with the maas provider

William Reade william.reade at canonical.com
Fri Mar 16 11:17:16 UTC 2012


On Fri, 2012-03-16 at 16:25 +1000, Julian Edwards wrote:
> I've got problems when running a "juju deploy". The client end reports
> that it has succeeded but the provisioning agent on the master node
> bails out when it tries to start another node as below.

(er, psychic debugging ahoy... but this is my best guess)

The critical line is:

juju/providers/maas/launch.py:46

You're effectively setting the instance id to resource_uri; you want to
be setting it to system_id instead. If system_id is not available in
instance_data, you'll need to either add it in maas (please), or figure
out some horrible bash snippet to pass to set_instance_id_accessor that
will allow the machine to figure it out itself at runtime (please don't,
unless you have to: it makes my brain hurt ;)).
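
To make the distinction concrete, here is a minimal sketch of the choice I
mean. It is not the actual code in launch.py: the node dict and its values
are invented for illustration, and instance_id_for is a hypothetical helper.
Only the field names system_id and resource_uri come from the discussion
above.

```python
# Hypothetical node record as the provider might receive it. The values
# are made up; only the two field names come from the discussion above.
node = {
    "system_id": "node-abc123",
    "resource_uri": "/api/1.0/nodes/node-abc123/",
}


def instance_id_for(node):
    """Return the identifier to record as the instance id for this node.

    Uses system_id, never resource_uri, and fails loudly if system_id is
    missing rather than silently falling back to something else.
    """
    try:
        return node["system_id"]
    except KeyError:
        raise ValueError("node data has no system_id; cannot derive instance id")


instance_id = instance_id_for(node)
```

Failing loudly when system_id is absent seems preferable to a fallback,
since a wrong id is exactly what causes the trouble here.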

This is because, when you bootstrap, `juju-admin initialize` pokes a bit
of state into ZK to say "it's cool, I'm already provisioned, my
instance_id is [whatever arrived]" (which sometimes needs (needed?) to
be an executable bash snippet). However, in this case, the machine's
apparent instance_id can't be found in the list of running machines
(because it's not an instance_id/system_id at all); and so the PA is
imagining the machine to have gone down, and is trying to start a new
one, totally oblivious of the fact that the machine it's trying to
replace is the one it's running on.
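
Roughly, the PA's reasoning looks like the sketch below. This is a
simplified model, not the real provisioning agent code: machines_to_start
is a hypothetical name, and the ids are invented. It compares the instance
ids recorded in ZK against the ids the provider reports as running, and
treats anything unmatched as needing a new machine.

```python
def machines_to_start(recorded, running_ids):
    """Return the machine ids whose recorded instance id is not running.

    recorded:    dict of machine id -> instance id as stored in ZK
    running_ids: set of instance ids the provider reports as running
    """
    return sorted(
        machine_id
        for machine_id, instance_id in recorded.items()
        if instance_id not in running_ids
    )


# What the provider reports (invented value).
running = {"node-abc123"}

# Machine 0 recorded with a resource_uri: never matches, so the PA
# concludes the machine is gone and wants to start a replacement.
recorded_wrong = {0: "/api/1.0/nodes/node-abc123/"}

# Machine 0 recorded with its system_id: matches, nothing to do.
recorded_right = {0: "node-abc123"}

to_start_wrong = machines_to_start(recorded_wrong, running)
to_start_right = machines_to_start(recorded_right, running)
```

With the wrong id recorded, machine 0 shows up in the list to start even
though it is the very machine the PA is running on.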

(I'm pretty sure _validate is failing because it's trying to start a
machine with an id of int(0) rather than str(0), and there's a
long-standing and annoying confusion about which data type we should
actually be using. So, that's really a pair of complementary bugs in
CloudInit._validate and Bootstrap._launch_machine, but they're actually
pretty beneficial: they prevented the PA from self-replicating [0] and
thereby DDOSing your maas environment.)
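
As an illustration of the kind of type mismatch I suspect (and only
suspect), here is a sketch of a strict validator rejecting an integer
machine id. It is not the real CloudInit._validate; validate_machine_id and
try_launch are hypothetical names, and the assumption that the validator
wants a string is my guess from above.

```python
def validate_machine_id(machine_id):
    """Accept only string machine ids (an assumed rule, for illustration)."""
    if not isinstance(machine_id, str):
        raise TypeError(
            "machine_id must be a str, got %s" % type(machine_id).__name__
        )
    return machine_id


def try_launch(machine_id):
    """Return True if validation passes and a launch would proceed."""
    try:
        validate_machine_id(machine_id)
    except TypeError:
        return False
    return True


launched_with_int = try_launch(0)    # id passed as int(0)
launched_with_str = try_launch("0")  # id passed as str(0)
```

Under that assumption, the int(0) launch is refused before any new machine
is requested, which is the accidental safety net described above.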

HTH, ping me if you need me.

Cheers
William

[0] It doesn't even understand that machine 0 should be a provisioning
node, so at least it wouldn't actually start any new PAs, and the rate
of DDOSing would be linear rather than exponential... but that feels
like scant comfort really ;).



