Re: Multi install with existing MAAS starts all services except for “IP Pending” on Glance Simplestreams Image Sync

Fri Jul 31 19:19:54 UTC 2015

So I failed at English there a bit… for the sake of the list if others
are trying to parse it:

The juju deployment host must be *trusty*, otherwise juju will attempt
to deploy the glance sync charm based on vivid, which wreaks havoc,
perhaps due to a systemd issue.

Jeff

On Fri, Jul 31, 2015 at 3:15 PM, Jeff McLamb <mclamb at gmail.com> wrote:
> Success!
>
> Sorry it took a while to get back, but just wanted to follow up and
> say I finally have a start-to-finish working Multi install! The TL;DR
> of it is that I need to have a trust-based juju deployment host
> running the latest openstack-installer from the experimental ppa. The
> trust requirement is due to the attempts to install the glance sync
> charm with from vivid if on vivid, and the experimental ppa is
> required because the stable branch does not seem to honor http-proxy
> and https-proxy command line arguments. It is also necessary that the
> juju deployment host be on the UTC timezone in order to match the
> machines deployed by juju/MAAS.
>
> The last few failed iterations were due to a physical machine failing
> to deploy, say the compute node. This issue was on my end as sometimes
> my physical servers do not boot without manual interaction due to some
> bizarre PSU voltage too low warning. If the compute node does not come
> up within a reasonable amount of time, it seems some of the scripts
> get run improperly, hence the issues with keystone users, roles, etc.
> not being available.
>
> This last time I manually made sure all servers came up and intervened
> if they tried to block.
>
> The end result is I now have a seemingly working copy of Juno (I used
> —openstack-release juno) and all interactions on the horizon dashboard
> seem to be good! I will keep messing around with it and try to deploy
> some VMs when I get a chance. I will also likely try a re-deploy of
> Kilo and see how that works.
>
> Thanks so much for the help, Mike and Adam, I really appreciate it.
> Hopefully we can feed some of this stuff back into the process to make
> it easier. I’ll follow up with more… I think the main page on
> ubuntu.com for deploying the Canonical Distribution needs some
> updating. For example, it fails to say you must install juju before
> openstack ;)
>
> Thank you than you!
>
> Jeff
>
>
> On Thu, Jul 30, 2015 at 5:01 PM, Mike McCracken
> <mike.mccracken at canonical.com> wrote:
>> It definitely looks like the initial failed compute node deployment caused
>> some problems.
>>
>> It looks like the script was being run repeatedly and failing on the
>> following command:
>>
>> keystone user-role-add --user ubuntu --role Member --tenant ubuntu
>>
>>>> No role with a name or ID of 'Member' exists.
>>
>> which is the same thing that happened when you tried it again just now.
>>
>> Then you apparently killed the install and tried again, at which point the
>> log is flooded with errors relating to it not finding the machine ID that it
>> recorded in the placement. It's pretty clear that it doesn't deal well with
>> machines where you placed a service leaving MAAS afterward.
>>
>> The setup script doesn't run again because after restarting, the
>> nova-cloud-controller service is marked as having been deployed, even though
>> the script never actually completed successfully.
>>
>> Off the top of my head I don't know what might be going on with keystone, I
>> thought the Member role was created by default.
>> Maybe the keystone unit's debug log has a clue, but at this point I'd be
>> tempted to just try again and avoid the broken machine.
>>
>> I'm sorry this has been such an ordeal, thanks for testing things out!
>> -mike
>>
>> On Thu, Jul 30, 2015 at 12:02 PM, Jeff McLamb <mclamb at gmail.com> wrote:
>>>
>>> Here is commands.log, which definitely has complaints about
>>> nova-controller-setup.sh:
>>>
>>> http://paste.ubuntu.com/11968600/
>>>
>>> And after running nova-controller-setup.sh again via juju as you
>>> mentioned:
>>>
>>> http://paste.ubuntu.com/11968544/
>>>
>>>
>>> So I guess because the compute node failed to deploy in the first
>>> place, the installer still tried to issue the nova-controller-setup.sh
>>> script but it failed without a compute node? Or is that not involved
>>> in that process? And then, when I re-commissioned and deployed the
>>> compute node, it failed to re-run the script?
>>>
>>> Thanks,
>>>
>>> Jeff
>>>
>>>
>>> On Thu, Jul 30, 2015 at 1:41 PM, Mike McCracken
>>> <mike.mccracken at canonical.com> wrote:
>>> > Hi Jeff, the ubuntu user and roles etc are created by a script that the
>>> > installer runs after deploying nova-cloud-controller.
>>> > The file ~/.cloud-install/commands.log will have any errors encountered
>>> > while trying to run that script.
>>> > You can also look at the script that would run in
>>> > ~/.cloud-install/nova-controller-setup.sh, and optionally try running it
>>> > yourself - it should be present on the nova-cloud-controller unit in
>>> > /tmp so
>>> > you can do e.g.
>>> > % juju run --unit nova-cloud-controller/0 "/tmp/nova-controller-setup.sh
>>> > <the password you used in the installer> Single"
>>> > to try it again.
>>> >
>>> > On Thu, Jul 30, 2015 at 10:14 AM, Jeff McLamb <mclamb at gmail.com> wrote:
>>> >>
>>> >> So it was easy enough to pick up from the single failed node. After
>>> >> Deleting it, re-enlisting, commissioning, etc. I was presented with a
>>> >> Ready node with a new name, etc.
>>> >>
>>> >> I went into openstack-status and simply added the Compute service that
>>> >> was missing and deployed it to this new node. After a while it was up,
>>> >> all services looked good.
>>> >>
>>> >> I issued a `juju machine remove 1` to remove the pending failed
>>> >> machine from juju that was no longer in the MAAS database — it had
>>> >> nothing running on it  obviously, so I figured it would be best to
>>> >> remove it from juju. The new machine is machine 4.
>>> >>
>>> >> Now when I try to login to horizon, I get "An error occurred
>>> >> authenticating. Please try again later.”
>>> >>
>>> >> The keystone logs suggest user ubuntu and various roles and projects
>>> >> were not created, even though openstack-installer tells me to login to
>>> >> horizon with user ubuntu and the password I gave it.
>>> >>
>>> >> Here are the keystone logs:
>>> >>
>>> >> http://paste.ubuntu.com/11967913/
>>> >>
>>> >> Here are the apache error logs on the openstack-dashboard container:
>>> >>
>>> >> http://paste.ubuntu.com/11967922/
>>> >>
>>> >> Any ideas here?
>>> >>
>>> >>
>>> >> On Thu, Jul 30, 2015 at 12:34 PM, Jeff McLamb <mclamb at gmail.com> wrote:
>>> >> > Just to give you an update where I am:
>>> >> >
>>> >> > I tried various forms still using the underlying vivid MAAS/juju
>>> >> > deployment host, tried --edit-placement, which erred out, tried
>>> >> > removing Glance Sync again, etc. all to no avail.
>>> >> >
>>> >> > Then I created a trusty VM on the MAAS host and installed the stable
>>> >> > juju and cloud-install ppa's. The problem with the stable version of
>>> >> > openstack-install is that it does not honor the http_proxy and
>>> >> > https_proxy lines passed on the command-line. I can see that they do
>>> >> > not get put into the environments.yaml file, so I ended up with the
>>> >> > same issue there as I had originally, where it could not download the
>>> >> > tools.
>>> >> >
>>> >> > So I updated the cloud-install to the experimental on the trusty juju
>>> >> > deployment VM and used the latest version, which worked fine with
>>> >> > http_proxy and https_proxy. I have played around with trying to
>>> >> > deploy
>>> >> > both juno and kilo as well.
>>> >> >
>>> >> > My latest attempt on trusty, deploying juno, has left one physical
>>> >> > node in a Failed Deployment state, which seems to have been caused
>>> >> > because it keeps saying the BMC is busy, so it can't control power. I
>>> >> > tried releasing it, which failed, so I ultimately had to Delete it
>>> >> > and
>>> >> > re-enlist, re-commission.
>>> >> >
>>> >> > Now I am at a point where the machine is back to Ready and the
>>> >> > openstack-install is still waiting on 1 last machine (the other 2
>>> >> > deployed just fine)... When something like this happens, is it
>>> >> > possible to re-deploy the last remaining host, or must I start over
>>> >> > deploying all machines again?
>>> >> >
>>> >> > Thanks,
>>> >> >
>>> >> > Jeff
>>> >> >
>>> >> >
>>> >> > On Thu, Jul 30, 2015 at 1:21 AM, Mike McCracken
>>> >> > <mike.mccracken at canonical.com> wrote:
>>> >> >>
>>> >> >>
>>> >> >> On Wed, Jul 29, 2015 at 5:30 PM, Jeff McLamb <mclamb at gmail.com>
>>> >> >> wrote:
>>> >> >>>
>>> >> >>> OK a quick look at the neutron-api/0 /var/log/neutron just shows
>>> >> >>> the
>>> >> >>> neutron-server.log as before… but since I stepped away in the past
>>> >> >>> hour it’s now at 800MB and counting! ;)
>>> >> >>>
>>> >> >>> I will play around with the relations a bit just to learn what’s
>>> >> >>> going
>>> >> >>> on, but then I will take your advice and try various alternatives
>>> >> >>> with
>>> >> >>> —edit-placement first, then finally just changing the underlying
>>> >> >>> MAAS
>>> >> >>> deployment server to trusty and see where it takes me.
>>> >> >>
>>> >> >>
>>> >> >> Sounds good
>>> >> >>
>>> >> >>>
>>> >> >>> Could also try
>>> >> >>> to install without —upstream-ppa which I imagine will install juno
>>> >> >>> instead of kilo?
>>> >> >>
>>> >> >>
>>> >> >> oh, --upstream-ppa doesn't do anything for the MAAS install path,
>>> >> >> it's
>>> >> >> only
>>> >> >> applicable to the containerized single install.
>>> >> >> It's harmless, though. On the single install, it's used to specify
>>> >> >> that
>>> >> >> version of the "openstack" package (which contains
>>> >> >> openstack-install)
>>> >> >> that
>>> >> >> will be installed on the container to run the second half of the
>>> >> >> process
>>> >> >> should come from our experimental PPA. It could use some better
>>> >> >> docs/usage
>>> >> >> string.
>>> >> >>
>>> >> >> If you're interested in trying out other openstack release versions,
>>> >> >> you
>>> >> >> want to look at --openstack-release.
>>> >> >>
>>> >> >> -mike
>>> >> >>
>>> >> >>>
>>> >> >>> Will keep you posted and continued thanks for all the help.
>>> >> >>>
>>> >> >>> Jeff
>>> >> >>>
>>> >> >>> On Wed, Jul 29, 2015 at 7:08 PM, Mike McCracken
>>> >> >>> <mike.mccracken at canonical.com> wrote:
>>> >> >>> > Jeff, based on the other logs you sent me, e.g.
>>> >> >>> > neutron-metadata-agent.log,
>>> >> >>> > it was pointed out to me that it's trying to connect to rabbitMQ
>>> >> >>> > on
>>> >> >>> > localhost, which is wrong.
>>> >> >>> > So something is failing to complete the juju relations.
>>> >> >>> > My hypothesis is that the failing vivid-series charm is messing
>>> >> >>> > up
>>> >> >>> > juju's
>>> >> >>> > relations.
>>> >> >>> > If you want to dig further, you can start looking at the
>>> >> >>> > relations
>>> >> >>> > using
>>> >> >>> > e.g. 'juju run --unit 'relation-get amqp:rabbitmq' ' (might just
>>> >> >>> > be
>>> >> >>> > 'amqp')
>>> >> >>> >
>>> >> >>> > Or if you'd like to try just redeploying without the sync charm
>>> >> >>> > using
>>> >> >>> > --edit-placement, that might get a healthy cluster going, just
>>> >> >>> > one
>>> >> >>> > without
>>> >> >>> > glance images.
>>> >> >>> > Then you could pretty easily deploy the charm manually, or just
>>> >> >>> > do
>>> >> >>> > without
>>> >> >>> > it and upload images you get from cloud-images.ubuntu.com
>>> >> >>> > manually .
>>> >> >>> >
>>> >> >>> > Sorry this is not as simple as it should be, yet :)
>>> >> >>> > -mike
>>> >> >>> >
>>> >> >>> > On Wed, Jul 29, 2015 at 4:00 PM, Mike McCracken
>>> >> >>> > <mike.mccracken at canonical.com> wrote:
>>> >> >>> >>
>>> >> >>> >> ok, so I just learned that the neutron-manage log should be in
>>> >> >>> >> the
>>> >> >>> >> neutron-api unit, so can you 'juju ssh neutron-api/0' and look
>>> >> >>> >> in
>>> >> >>> >> /var/log/neutron there?
>>> >> >>> >>
>>> >> >>> >> On Wed, Jul 29, 2015 at 3:34 PM, Jeff McLamb <mclamb at gmail.com>
>>> >> >>> >> wrote:
>>> >> >>> >>>
>>> >> >>> >>> The neutron-server.log that is 500MB+ and growing is nonstop
>>> >> >>> >>> repeated
>>> >> >>> >>> output of the following, due to a database table that does not
>>> >> >>> >>> exist:
>>> >> >>> >>>
>>> >> >>> >>> http://paste.ubuntu.com/11962679/
>>> >> >>> >>>
>>> >> >>> >>> On Wed, Jul 29, 2015 at 6:30 PM, Jeff McLamb <mclamb at gmail.com>
>>> >> >>> >>> wrote:
>>> >> >>> >>> > Hey Mike -
>>> >> >>> >>> >
>>> >> >>> >>> > OK so here is the juju status output. The quantum-gateway
>>> >> >>> >>> > doesn’t
>>> >> >>> >>> > look
>>> >> >>> >>> > too strange, but I am new. The exposed status is false, but
>>> >> >>> >>> > so
>>> >> >>> >>> > it is
>>> >> >>> >>> > for all services, and I can definitely access, say, the
>>> >> >>> >>> > dashboard,
>>> >> >>> >>> > even though it is not “exposed”. One thing of note is the
>>> >> >>> >>> > public-address lines that sometimes use the domain names,
>>> >> >>> >>> > e.g.
>>> >> >>> >>> > downright-feet.maas in this case, whereas some services use
>>> >> >>> >>> > IP
>>> >> >>> >>> > addresses. I have noticed that I cannot resolve the maas
>>> >> >>> >>> > names
>>> >> >>> >>> > from
>>> >> >>> >>> > the MAAS server (because I use the ISP’s DNS servers) but I
>>> >> >>> >>> > can
>>> >> >>> >>> > resolve them from the deployed nodes.  Here is the output:
>>> >> >>> >>> >
>>> >> >>> >>> > http://paste.ubuntu.com/11962631/
>>> >> >>> >>> >
>>> >> >>> >>> > Here is the quantum gateway replay:
>>> >> >>> >>> >
>>> >> >>> >>> > http://paste.ubuntu.com/11962644/
>>> >> >>> >>> >
>>> >> >>> >>> > Where are the neutron-manage logs? I see lots of neutron
>>> >> >>> >>> > stuff
>>> >> >>> >>> > on
>>> >> >>> >>> > various containers and nodes — the neutron-server.log is what
>>> >> >>> >>> > I
>>> >> >>> >>> > pasted
>>> >> >>> >>> > before and it is 500+MB and growing across a few nodes, but I
>>> >> >>> >>> > can’t
>>> >> >>> >>> > seem to fine neutron-manage.
>>> >> >>> >>> >
>>> >> >>> >>> > Thanks!
>>> >> >>> >>> >
>>> >> >>> >>> > Jeff
>>> >> >>> >>> >
>>> >> >>> >>> >
>>> >> >>> >>> > On Wed, Jul 29, 2015 at 5:26 PM, Mike McCracken
>>> >> >>> >>> > <mike.mccracken at canonical.com> wrote:
>>> >> >>> >>> >> Hi Jeff, I asked internally and was asked if you could share
>>> >> >>> >>> >> the
>>> >> >>> >>> >> juju
>>> >> >>> >>> >> charm
>>> >> >>> >>> >> logs from quantum-gateway and the neutron-manage logs in
>>> >> >>> >>> >> /var/log/neutron.
>>> >> >>> >>> >>
>>> >> >>> >>> >> the charm log can be replayed by using 'juju debug-log -i
>>> >> >>> >>> >> quantum-gateway/0
>>> >> >>> >>> >> --replay'
>>> >> >>> >>> >>
>>> >> >>> >>> >> On Wed, Jul 29, 2015 at 2:03 PM, Mike McCracken
>>> >> >>> >>> >> <mike.mccracken at canonical.com> wrote:
>>> >> >>> >>> >>>
>>> >> >>> >>> >>> Sorry this is so frustrating.
>>> >> >>> >>> >>> Can you check 'juju status' for this environment and see if
>>> >> >>> >>> >>> it
>>> >> >>> >>> >>> says
>>> >> >>> >>> >>> anything useful about the quantum-gateway service (aka
>>> >> >>> >>> >>> neutron,
>>> >> >>> >>> >>> the
>>> >> >>> >>> >>> juju
>>> >> >>> >>> >>> service name will be updated soon).
>>> >> >>> >>> >>>
>>> >> >>> >>> >>> -mike
>>> >> >>> >>> >>>
>>> >> >>> >>> >>> On Wed, Jul 29, 2015 at 1:15 PM, Jeff McLamb
>>> >> >>> >>> >>> <mclamb at gmail.com>
>>> >> >>> >>> >>> wrote:
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> OK, making progress now. Per your recommendation I removed
>>> >> >>> >>> >>>> and
>>> >> >>> >>> >>>> added
>>> >> >>> >>> >>>> back in the trusty sync charm manually.
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> Now, I can log in to the horizon dashboard!
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> However, several tabs result in a generic OpenStack (not
>>> >> >>> >>> >>>> Ubuntu-customized like the general dashboard pages)
>>> >> >>> >>> >>>> "Something
>>> >> >>> >>> >>>> went
>>> >> >>> >>> >>>> wrong! An unexpected error has occurred. Try refreshing
>>> >> >>> >>> >>>> the
>>> >> >>> >>> >>>> page..."
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> The tabs in question that give those results are Compute
>>> >> >>> >>> >>>> ->
>>> >> >>> >>> >>>> Access &
>>> >> >>> >>> >>>> Security, Network -> Network Topology,
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> When I go to pages like Network -> Routers, it does
>>> >> >>> >>> >>>> render,
>>> >> >>> >>> >>>> but
>>> >> >>> >>> >>>> there
>>> >> >>> >>> >>>> are error popup boxes in the page itself with:
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> Error: Unable to retrieve router list.
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> and
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> Error: Unable to retrieve a list of external networks
>>> >> >>> >>> >>>> "Connection
>>> >> >>> >>> >>>> to
>>> >> >>> >>> >>>> neutron failed: HTTPConnectionPool(host='192.168.1.45',
>>> >> >>> >>> >>>> port=9696):
>>> >> >>> >>> >>>> Max retries exceeded with url:
>>> >> >>> >>> >>>> /v2.0/networks.json?router%3Aexternal=True (Caused by
>>> >> >>> >>> >>>> <class
>>> >> >>> >>> >>>> 'httplib.BadStatusLine'>: '')”.
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> If I do a `juju ssh openstack-dashboard/0` and tail -f
>>> >> >>> >>> >>>> /var/log/apache2/error.log I get the following when
>>> >> >>> >>> >>>> accessing
>>> >> >>> >>> >>>> one
>>> >> >>> >>> >>>> of
>>> >> >>> >>> >>>> the failed pages:
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> http://paste.ubuntu.com/11961863/
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> Furthermore, looking at the neutron server logs, I see
>>> >> >>> >>> >>>> non-stop
>>> >> >>> >>> >>>> traces
>>> >> >>> >>> >>>> about the neutron.ml2_gre_allocations table not existing:
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> http://paste.ubuntu.com/11961891/
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> Getting closer, bit by bit.
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> Thanks for all the help,
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> Jeff
>>> >> >>> >>> >>>
>>> >> >>> >>> >>>
>>> >> >>> >>> >>>
>>> >> >>> >>> >>
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >
>>> >> >>
>>> >> >>
>>> >
>>> >
>>
>>