[Bug 1743249] Re: Failed Deployment after timeout trying to retrieve grub cfg

Jason Hobbs jason.hobbs at canonical.com
Mon Feb 5 21:45:27 UTC 2018


On Mon, Feb 5, 2018 at 3:27 PM, Andres Rodriguez
<andreserl at ubuntu-pe.org> wrote:
> @Steve,
>
> MAAS already has a mechanism to collapse retries into the initial request.
> In this case, it is the rack that grabs the requests and makes a request to
> the region. If retries come within the time that the rack is waiting for a
> response from the region, these requests get "ignored" and the rack will
> only answer the first request. This is what the logs show after testing
> with the fixed grub, where grub makes multiple requests and MAAS answers
> seconds after those requests, but only answers once. This is because the
> requests were collapsed on the MAAS side.
>
> If, however, the retries come in after the region has answered the rack,
> then these requests will be served.

This is not true. MAAS is responding to every single request grub
makes for the file; the tcpdump logs show it. And these are not
"read 4 times" requests; they are retries because grub didn't get a
response.
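
For reference, the collapsing behaviour described in the quoted text can
be sketched roughly as below. This is only an illustration of the idea,
not MAAS code: the class name CollapsingRack, the ask_region callable and
the file name used in the demo are all made up for the example, and the
region lookup is simulated with a short sleep.

```python
import asyncio


class CollapsingRack:
    """Toy model: retries that arrive while a lookup is pending are dropped."""

    def __init__(self, ask_region):
        self._ask_region = ask_region  # hypothetical coroutine: key -> config text
        self._pending = set()          # keys with a region lookup in flight
        self.region_calls = 0          # how many lookups reached the region

    async def handle(self, key):
        if key in self._pending:
            # A retry arrived while the first request is still waiting:
            # it is "ignored" and gets no reply of its own.
            return None
        self._pending.add(key)
        self.region_calls += 1
        try:
            return await self._ask_region(key)
        finally:
            self._pending.discard(key)


async def demo():
    async def slow_region(key):
        await asyncio.sleep(0.05)      # stand-in for a slow region response
        return f"config for {key}"

    rack = CollapsingRack(slow_region)
    # Three requests arrive back to back, then one after the answer went out.
    first, retry1, retry2 = await asyncio.gather(
        rack.handle("grub.cfg-example"),
        rack.handle("grub.cfg-example"),
        rack.handle("grub.cfg-example"),
    )
    late = await rack.handle("grub.cfg-example")
    return first, retry1, retry2, late, rack.region_calls
```

In this model only the first request and the late one get an answer, and
the region is asked twice. A capture where every retry gets its own
response does not match that pattern.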

This pcap shows MAAS responding to every request for grub.cfg-<mac>:
https://bugs.launchpad.net/maas/+bug/1743249/+attachment/5046952/+files/spearow-fall-back-to-default-amd64.pcap
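
One way to check a capture like this is to count the TFTP read requests
per file name and compare that against the responses seen. The sketch
below covers only the counting step. It assumes the UDP payloads have
already been extracted from the capture by some other tool (that part is
not shown), and the helper names parse_rrq and count_read_requests are
invented for the example. The packet layout is the read-request format
from RFC 1350: a two-byte opcode of 1, then the file name and the mode,
each terminated by a NUL byte.

```python
from collections import Counter

RRQ_OPCODE = 1  # TFTP read request (RFC 1350)


def parse_rrq(payload: bytes):
    """Return (filename, mode) if payload is a TFTP read request, else None."""
    if len(payload) < 4 or int.from_bytes(payload[:2], "big") != RRQ_OPCODE:
        return None
    # After the opcode: filename NUL mode NUL, optionally followed by options.
    fields = payload[2:].split(b"\x00")
    if len(fields) < 3 or not fields[0] or not fields[1]:
        return None
    filename = fields[0].decode("ascii", "replace")
    mode = fields[1].decode("ascii", "replace").lower()
    return filename, mode


def count_read_requests(payloads):
    """Tally read requests per file name; other packet types are skipped."""
    counts = Counter()
    for payload in payloads:
        parsed = parse_rrq(payload)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts
```

A count above one for the same file name from the same client means
retransmitted requests; whether each of them was answered still has to
be read from the responses in the capture.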

Jason

>
> On Mon, Feb 5, 2018 at 2:34 PM, Steve Langasek
> <steve.langasek at canonical.com> wrote:
>
>> Jason's feedback was that, after making the changes to the storage
>> configuration of his environment, deploying the test grubx64.efi doesn't
>> have any effect on the MAAS server's response time to tftp requests.  So
>> at this point it's not at all clear that the grub change, while correct,
>> helps with this high-level symptom.
>>
>> It has also been suggested that each udp retry is generating a separate
>> database query from MAAS.  That is absolutely a MAAS bug if true, and
>> not something that can or should be fixed in GRUB.
>>
>> ** Changed in: grub2 (Ubuntu)
>>    Importance: Critical => Medium
>>
>> --
>> You received this bug notification because you are subscribed to MAAS.
>> https://bugs.launchpad.net/bugs/1743249
>>
>> Title:
>>   Failed Deployment after timeout trying to retrieve grub cfg
>>
>> To manage notifications about this bug go to:
>> https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions
>>
>> Launchpad-Notification-Type: bug
>> Launchpad-Bug: product=maas; milestone=2.4.x; status=Incomplete;
>> importance=Undecided; assignee=None;
>> Launchpad-Bug: distribution=ubuntu; sourcepackage=grub2; component=main;
>> status=In Progress; importance=Medium; assignee=mathieu.tl at gmail.com;
>> Launchpad-Bug-Tags: cdo-qa cdo-qa-blocker foundations-engine patch
>> Launchpad-Bug-Information-Type: Public
>> Launchpad-Bug-Private: no
>> Launchpad-Bug-Security-Vulnerability: no
>> Launchpad-Bug-Commenters: andreserl blake-rouse cgregan jason-hobbs vorlon
>> Launchpad-Bug-Reporter: Jason Hobbs (jason-hobbs)
>> Launchpad-Bug-Modifier: Steve Langasek (vorlon)
>> Launchpad-Message-Rationale: Subscriber (MAAS)
>> Launchpad-Message-For: andreserl
>>
>
>
> --
> Andres Rodriguez (RoAkSoAx)
> Ubuntu Server Developer
> MSc. Telecom & Networking
> Systems Engineer
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1743249
>
> Title:
>   Failed Deployment after timeout trying to retrieve grub cfg
>
> Status in MAAS:
>   New
> Status in grub2 package in Ubuntu:
>   In Progress
>
> Bug description:
>   A node failed to deploy after it failed to retrieve a grub.cfg from
>   MAAS due to a timeout.  In the logs, it's clear that the server tried
>   to retrieve the grub cfg many times, over about 30 seconds:
>
>   http://paste.ubuntu.com/26387256/
>
>   We see the same thing for other hosts around the same time:
>
>   http://paste.ubuntu.com/26387262/
>
>   It seems like MAAS is taking way too long to respond to these
>   requests.
>
>   This is very similar to bug 1724677, which was happening
>   pre-Meltdown/Spectre. The only difference is we don't see "[critical] TFTP
>   back-end failed" in the logs anymore.
>
>   I connected to the console on this system and it had errors about
>   timing out retrieving the grub cfg, then it had an error message along
>   the lines of "error not an ip" and then "double free".  After I
>   connected but before I could get a screenshot, the system rebooted and
>   was directed by MAAS to power off, which it did successfully after
>   booting to Linux.
>
>   Full logs are available here:
>   https://10.245.162.101/artifacts/14a34b5a-9321-4d1a-b2fa-ed277a020e7c/cpe_cloud_395/infra-logs.tar
>
>   This is with 2.3.0-6434-gd354690-0ubuntu1~16.04.1.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to grub2 in Ubuntu.
https://bugs.launchpad.net/bugs/1743249

Title:
  Failed Deployment after timeout trying to retrieve grub cfg

Status in MAAS:
  New
Status in grub2 package in Ubuntu:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions


