[Bug 1743249] Re: Failed Deployment after timeout trying to retrieve grub cfg
Andres Rodriguez
andreserl at ubuntu-pe.org
Mon Feb 5 21:27:15 UTC 2018
@Steve,
MAAS already has a mechanism to collapse retries into the initial request.
In this case, the rack receives the requests and makes a request to
the region. If retries arrive while the rack is still waiting for a
response from the region, these requests get "ignored" and the rack
answers only the first request. This is what the logs show after testing
with the fixed grub: grub makes multiple requests and MAAS answers
seconds after those requests, but only answers once. This is because the
requests were collapsed on the MAAS side.
If, however, the retries come in after the region has answered the rack,
then these requests will be served.
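
As a rough illustration of the collapsing behaviour described above, here is
a minimal Python sketch. It is not MAAS code: the class name
CollapsingFrontend, the fetch_from_region backend, and the (client, filename)
key are all hypothetical and chosen only to show the idea. Retries that
arrive while the first request is still waiting on the backend are dropped;
a retry that arrives after the answer has been sent is served as a new
request.

```python
import asyncio


class CollapsingFrontend:
    """Hypothetical stand-in for the rack; not the MAAS implementation."""

    def __init__(self, backend):
        self._backend = backend      # coroutine function: filename -> payload
        self._in_flight = set()      # (client, filename) pairs awaiting the backend
        self.answers = []            # responses actually sent back to clients

    async def handle(self, client, filename):
        """Return True if this request was answered, False if it was collapsed."""
        key = (client, filename)
        if key in self._in_flight:
            # A retry arrived while the first request is still pending: ignore it.
            return False
        self._in_flight.add(key)
        try:
            payload = await self._backend(filename)
        finally:
            self._in_flight.discard(key)
        self.answers.append((client, filename, payload))
        return True


async def demo():
    calls = 0

    async def fetch_from_region(filename):
        # Assumed slow backend, standing in for the rack-to-region round trip.
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.05)
        return f"contents of {filename}"

    rack = CollapsingFrontend(fetch_from_region)
    # Initial request plus two retries, all arriving before the backend replies.
    early = await asyncio.gather(
        *(rack.handle("10.0.0.5", "grub.cfg") for _ in range(3))
    )
    # A retry arriving after the answer was sent is served as a new request.
    late = await rack.handle("10.0.0.5", "grub.cfg")
    return early, late, calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the three early requests result in a single backend call and
a single answer, while the late retry triggers a second backend call.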
On Mon, Feb 5, 2018 at 2:34 PM, Steve Langasek <steve.langasek at canonical.com> wrote:
> Jason's feedback was that, after making the changes to the storage
> configuration of his environment, deploying the test grubx64.efi doesn't
> have any effect on the MAAS server's response time to tftp requests. So
> at this point it's not at all clear that the grub change, while correct,
> helps with this high-level symptom.
>
> It has also been suggested that each udp retry is generating a separate
> database query from MAAS. That is absolutely a MAAS bug if true, and
> not something that can or should be fixed in GRUB.
>
> ** Changed in: grub2 (Ubuntu)
> Importance: Critical => Medium
>
> --
> You received this bug notification because you are subscribed to MAAS.
> https://bugs.launchpad.net/bugs/1743249
>
> Title:
> Failed Deployment after timeout trying to retrieve grub cfg
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions
>
> Launchpad-Notification-Type: bug
> Launchpad-Bug: product=maas; milestone=2.4.x; status=Incomplete;
> importance=Undecided; assignee=None;
> Launchpad-Bug: distribution=ubuntu; sourcepackage=grub2; component=main;
> status=In Progress; importance=Medium; assignee=mathieu.tl at gmail.com;
> Launchpad-Bug-Tags: cdo-qa cdo-qa-blocker foundations-engine patch
> Launchpad-Bug-Information-Type: Public
> Launchpad-Bug-Private: no
> Launchpad-Bug-Security-Vulnerability: no
> Launchpad-Bug-Commenters: andreserl blake-rouse cgregan jason-hobbs vorlon
> Launchpad-Bug-Reporter: Jason Hobbs (jason-hobbs)
> Launchpad-Bug-Modifier: Steve Langasek (vorlon)
> Launchpad-Message-Rationale: Subscriber (MAAS)
> Launchpad-Message-For: andreserl
>
--
Andres Rodriguez (RoAkSoAx)
Ubuntu Server Developer
MSc. Telecom & Networking
Systems Engineer
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to grub2 in Ubuntu.
https://bugs.launchpad.net/bugs/1743249
Title:
Failed Deployment after timeout trying to retrieve grub cfg
Status in MAAS:
New
Status in grub2 package in Ubuntu:
In Progress
Bug description:
A node failed to deploy after it failed to retrieve a grub.cfg from
MAAS due to a timeout. In the logs, it's clear that the server tried
to retrieve the grub cfg many times, over about 30 seconds:
http://paste.ubuntu.com/26387256/
We see the same thing for other hosts around the same time:
http://paste.ubuntu.com/26387262/
It seems like MAAS is taking way too long to respond to these
requests.
This is very similar to bug 1724677, which was happening
pre-Meltdown/Spectre. The only difference is we don't see "[critical] TFTP
back-end failed" in the logs anymore.
I connected to the console on this system and it had errors about
timing out retrieving the grub.cfg, then it had an error message along
the lines of "error not an ip" and then "double free". After I
connected but before I could get a screenshot, the system rebooted and
was directed by MAAS to power off, which it did successfully after
booting to Linux.
Full logs are available here:
https://10.245.162.101/artifacts/14a34b5a-9321-4d1a-b2fa-ed277a020e7c/cpe_cloud_395/infra-logs.tar
This is with 2.3.0-6434-gd354690-0ubuntu1~16.04.1.
To manage notifications about this bug go to:
https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions