[Bug 1863639] Re: LP1852678 - MAAS is wiping out network config

Andrew Cloke andrew.cloke at canonical.com
Mon Feb 17 17:04:29 UTC 2020


Note that this is a reverse proxy of bug# 1852678. Please go to that bug
to read the comments and post responses.

** Also affects: ubuntu-power-systems
   Importance: Undecided
       Status: New

** Package changed: netcfg (Ubuntu) => ubuntu

** Changed in: ubuntu-power-systems
       Status: New => Incomplete

** Changed in: ubuntu
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to netcfg in Ubuntu.
https://bugs.launchpad.net/bugs/1863639

Title:
  LP1852678 - MAAS is wiping out network config

Status in The Ubuntu-power-systems project:
  Incomplete
Status in Ubuntu:
  Incomplete

Bug description:
  == Comment: #0 - Michael Ranweiler <mranweil at us.ibm.com> - 2020-02-17 08:22:09 ==
  I'm on a MAAS server attempting to deploy a node with 4 network ports. I configured all four of them and started a deployment of 19.10. However, during deployment, MAAS wipes out the configuration I had set, returning three of four network devices back to Unconfigured.

  I've only seen this on this one machine, oddly enough, but MAAS is
  just silently resetting the config and gives no indication in the
  web UI of why it's doing so; there are no errors.

  I do not have access to this box for logs; it is the Server Team's Power MAAS environment.
  Tags: ppc64el reverse-proxy-bugzilla
  Jeff Lane (bladernr) wrote on 2019-11-15: 	#1

      thiel-ip-before-deploy.png (95.4 KiB, image/png)

  Screenshot of the interface configuration I set before deployment (note this is right after I started deployment and it shows the IP addresses MAAS has assigned to each interface)
  Jeff Lane (bladernr) wrote on 2019-11-15: 	#2

      thiel-ip-during-deployment.png (74.1 KiB, image/png)

  Screenshot of the interface configuration well into deployment where MAAS has wiped my config and reset three of four interfaces back to Unconfigured.
  Frank Heimes (fheimes) on 2019-11-15
  tags: 	added: ppc64el
  Frank Heimes (fheimes) on 2019-11-15
  Changed in ubuntu-power-systems:
  assignee: 	nobody → MAAS (maas)
  Lee Trager (ltrager) wrote on 2019-11-15: 	#3

  What version of MAAS are you using? Can you post the machine output from the API (maas $PROFILE machine read $SYSTEM_ID)?
  Changed in maas:
  status: 	New → Incomplete
  Frank Heimes (fheimes) wrote on 2019-11-18: 	#4

  Not exactly what you want, but at least some more version info taken from the UI:
  MAAS name: power8-maas MAAS
  MAAS version: 2.6.0 (7802-g59416a869-0ubuntu1~18.04.1)
  Changed in ubuntu-power-systems:
  status: 	New → Triaged
  Jeff Lane (bladernr) wrote on 2019-11-18: 	#5

      thiel-ip-before-deploy.png (95.4 KiB, image/png)

  Hi Lee,

  this is the machine output before deployment with everything configured.
  Jeff Lane (bladernr) wrote on 2019-11-18: 	#6

      thiel-before-deploy (27.6 KiB, text/plain)

  uhhh... scratch that, wrong file, that's the original screen shot. THIS is the machine output.
  Jeff Lane (bladernr) wrote on 2019-11-18: 	#7

      thiel-after-deploy (18.3 KiB, text/plain)

  And this is the output during deployment after the interfaces are reset to Unconfigured.
  Changed in maas:
  status: 	Incomplete → New
  Jeff Lane (bladernr) wrote on 2019-11-18: 	#8

      other-machine-mixed-subnets (20.4 KiB, text/plain)

  OK so I'm a bit confused. I reconfigured all the ports to use the same
  subnet, and this time it worked.

  But this is just weird, because OTHER machines have deployed fine with mixed subnets, such as the one in the file attached to this comment.
  Lee Trager (ltrager) wrote on 2019-11-25: 	#9

  This looks like it is getting reset in the backend and is not a UI issue. It's very difficult for me to see what's going on, as I can't reproduce this locally and don't have access to logs. Is there any way you can get logs or give me access to the system so I can poke around and see what's happening?
  Jeff Lane (bladernr) wrote on 2019-12-09: 	#10

  This was on the Server Team's Power8 MAAS infra, unfortunately it's not my environment so I'm not able to give you access. Perhaps Josh Powers could get you that access?
  Lee Trager (ltrager) wrote on 2019-12-18: 	#11

      Screenshot_2019-12-18 thiel maas power8-maas MAAS.png (81.5 KiB, image/png)

  I am unable to reproduce this. I configured thiel to auto assign every
  interface on the system and deployed 19.10. IP addresses were assigned
  as expected. I confirmed that the host was configured as expected as
  well.

  ubuntu at thiel:~$ cat /etc/os-release
  NAME="Ubuntu"
  VERSION="19.10 (Eoan Ermine)"
  ID=ubuntu
  ID_LIKE=debian
  PRETTY_NAME="Ubuntu 19.10"
  VERSION_ID="19.10"
  HOME_URL="https://www.ubuntu.com/"
  SUPPORT_URL="https://help.ubuntu.com/"
  BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
  PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
  VERSION_CODENAME=eoan
  UBUNTU_CODENAME=eoan
  ubuntu at thiel:~$ ip addr
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      inet 127.0.0.1/8 scope host lo
         valid_lft forever preferred_lft forever
      inet6 ::1/128 scope host
         valid_lft forever preferred_lft forever
  2: enP2p1s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether 0c:c4:7a:89:f0:64 brd ff:ff:ff:ff:ff:ff
      inet 10.245.71.140/21 brd 10.245.71.255 scope global enP2p1s0f0
         valid_lft forever preferred_lft forever
      inet6 fe80::ec4:7aff:fe89:f064/64 scope link
         valid_lft forever preferred_lft forever
  3: enP2p1s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether 0c:c4:7a:89:f0:65 brd ff:ff:ff:ff:ff:ff
      inet 10.245.71.139/21 brd 10.245.71.255 scope global enP2p1s0f1
         valid_lft forever preferred_lft forever
      inet6 fe80::ec4:7aff:fe89:f065/64 scope link
         valid_lft forever preferred_lft forever
  4: enP2p1s0f2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether 0c:c4:7a:89:f0:66 brd ff:ff:ff:ff:ff:ff
      inet 10.245.71.177/21 brd 10.245.71.255 scope global enP2p1s0f2
         valid_lft forever preferred_lft forever
      inet6 fe80::ec4:7aff:fe89:f066/64 scope link
         valid_lft forever preferred_lft forever
  5: enP2p1s0f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether 0c:c4:7a:89:f0:67 brd ff:ff:ff:ff:ff:ff
      inet 10.245.71.162/21 brd 10.245.71.255 scope global enP2p1s0f3
         valid_lft forever preferred_lft forever
      inet6 fe80::ec4:7aff:fe89:f067/64 scope link
         valid_lft forever preferred_lft forever
  Changed in maas:
  status: 	New → Incomplete
  Jeff Lane (bladernr) wrote on 2019-12-20: 	#12

      thiel-retry-before-fail.png (69.1 KiB, image/png)

  I recreated this immediately. Please review the screenshots and configure Thiel as I have in this new screenshot.
  Jeff Lane (bladernr) wrote on 2019-12-20: 	#13

      thiel-retry-after-failure.png (62.5 KiB, image/png)

  And it only took a couple minutes for it to reset the config per this screenshot.
  Changed in maas:
  status: 	Incomplete → Confirmed
  Andrew Cloke (andrew-cloke) on 2020-01-06
  Changed in ubuntu-power-systems:
  importance: 	Undecided → Medium
  Changed in maas:
  importance: 	Undecided → Medium
  Lee Trager (ltrager) wrote on 2020-01-14: 	#14

  Looking at our two screenshots the only difference seems to be I was
  able to deploy all interfaces on 10.245.71.0/21 while you tried to
  deploy 3 interfaces on 192.168.122.0/24. When I configured the machine
  I believe 10.245.71.0/21 was the default network and only one I could
  choose.

  Which network should these interfaces be on?
  Have you tried recommissioning the machine?
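
  The subnet comparison in this comment can be checked with Python's standard ipaddress module. This is a minimal sketch: the addresses come from the ip addr output in comment #11, and subnet_of is a hypothetical helper written for this illustration, not part of MAAS. Note that 10.245.71.0/21 as written is a host address inside the /21; the enclosing network is 10.245.64.0/21.

```python
import ipaddress

# Addresses reported by `ip addr` in comment #11.
deployed = ["10.245.71.140", "10.245.71.139", "10.245.71.177", "10.245.71.162"]

# strict=False accepts an address with host bits set and returns the
# enclosing network instead of raising ValueError.
default_net = ipaddress.ip_network("10.245.71.0/21", strict=False)
other_net = ipaddress.ip_network("192.168.122.0/24")


def subnet_of(addr, networks):
    """Return the first network in `networks` that contains addr, or None."""
    ip = ipaddress.ip_address(addr)
    for net in networks:
        if ip in net:
            return net
    return None


print(default_net)
for addr in deployed:
    print(addr, "->", subnet_of(addr, [default_net, other_net]))
```

  All four deployed addresses fall inside the same /21, which matches the observation that the successful deployment used a single subnet.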
  Lee Trager (ltrager) wrote on 2020-01-14: 	#15

  I poked around in the logs and found what appears to be happening. I
  think the VLAN isn't being updated until the netplan config is
  generated. When that happens MAAS deletes all assigned IPs. Will dig
  in further to see what the fix is tomorrow.

  2020-01-14 07:49:33 maasserver.region_controller: [info] Reloaded DNS configuration:
           * ip 10.245.71.163 allocated
           * ip 192.168.122.4 allocated
           * ip 192.168.123.3 allocated
           * ip 192.168.122.3 allocated
  2020-01-14 07:49:56 regiond: [info] 10.245.71.3 GET /MAAS/rpc/ HTTP/1.1 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)
  2020-01-14 07:50:26 regiond: [info] 10.245.71.3 GET /MAAS/rpc/ HTTP/1.1 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)
  2020-01-14 07:50:56 regiond: [info] 10.245.71.3 GET /MAAS/rpc/ HTTP/1.1 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)
  2020-01-14 07:51:17 maasserver.regiondservices.active_discovery: [info] Active network discovery: Active scanning is not enabled on any subnet. Skipping periodic scan.
  2020-01-14 07:51:26 regiond: [info] 10.245.71.3 GET /MAAS/rpc/ HTTP/1.1 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)
  2020-01-14 07:51:28 maasserver.models.signals.interfaces: [info] enP2p1s0f0 (physical) on thiel: deleted IP addresses due to VLAN update (5002 -> 0).
  2020-01-14 07:51:28 maasserver.models.signals.interfaces: [info] enP2p1s0f2 (physical) on thiel: deleted IP addresses due to VLAN update (5002 -> 0).
  2020-01-14 07:51:28 maasserver.models.signals.interfaces: [info] enP2p1s0f1 (physical) on thiel: deleted IP addresses due to VLAN update (5002 -> 0).
  2020-01-14 07:51:28 maasserver.models.signals.interfaces: [info] enP2p1s0f2 (physical) on thiel: deleted IP addresses due to VLAN update (5002 -> 0).
  2020-01-14 07:51:28 maasserver.models.signals.interfaces: [info] enP2p1s0f1 (physical) on thiel: deleted IP addresses due to VLAN update (5002 -> 0).
  2020-01-14 07:51:28 maasserver.models.signals.interfaces: [info] enP2p1s0f1 (physical) on thiel: deleted IP addresses due to VLAN update (5002 -> 0).
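
  To spot this pattern in a larger regiond log, the "deleted IP addresses due to VLAN update" lines can be tallied per interface. The sketch below is an illustration only: the regular expression is inferred from the six log lines quoted above and is an assumption about the message format, not something taken from the MAAS source.

```python
import re
from collections import Counter

# The six lines quoted in comment #15.
LOG = """\
2020-01-14 07:51:28 maasserver.models.signals.interfaces: [info] enP2p1s0f0 (physical) on thiel: deleted IP addresses due to VLAN update (5002 -> 0).
2020-01-14 07:51:28 maasserver.models.signals.interfaces: [info] enP2p1s0f2 (physical) on thiel: deleted IP addresses due to VLAN update (5002 -> 0).
2020-01-14 07:51:28 maasserver.models.signals.interfaces: [info] enP2p1s0f1 (physical) on thiel: deleted IP addresses due to VLAN update (5002 -> 0).
2020-01-14 07:51:28 maasserver.models.signals.interfaces: [info] enP2p1s0f2 (physical) on thiel: deleted IP addresses due to VLAN update (5002 -> 0).
2020-01-14 07:51:28 maasserver.models.signals.interfaces: [info] enP2p1s0f1 (physical) on thiel: deleted IP addresses due to VLAN update (5002 -> 0).
2020-01-14 07:51:28 maasserver.models.signals.interfaces: [info] enP2p1s0f1 (physical) on thiel: deleted IP addresses due to VLAN update (5002 -> 0).
"""

# Pattern inferred from the lines above (an assumption, not a MAAS API).
VLAN_RESET = re.compile(
    r"\[info\] (?P<iface>\S+) \((?P<kind>\w+)\) on (?P<host>[^:\s]+): "
    r"deleted IP addresses due to VLAN update "
    r"\((?P<old>\d+) -> (?P<new>\d+)\)\.$"
)


def count_vlan_resets(log_text):
    """Count VLAN-update IP deletions per (host, interface)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = VLAN_RESET.search(line)
        if match:
            counts[(match["host"], match["iface"])] += 1
    return counts


resets = count_vlan_resets(LOG)
for (host, iface), n in sorted(resets.items()):
    print(f"{host} {iface}: {n} reset(s)")
```

  In this excerpt only enP2p1s0f0, enP2p1s0f1 and enP2p1s0f2 appear; enP2p1s0f3 does not, which is consistent with three of the four interfaces being reset.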
  Lee Trager (ltrager) wrote on 2020-01-23: 	#16

      Screenshot_2020-01-22 thiel maas - events power8-maas MAAS.png (214.9 KiB, image/png)

  I see what is happening now. On boot all four interfaces are
  requesting pxelinux.cfg at once using the same IP.

  2020-01-23 02:50:58 provisioningserver.rackdservices.tftp: [info] /ppc64el/pxelinux.cfg/01-0c-c4-7a-89-f0-67 requested by 10.245.71.191
  2020-01-23 02:50:58 provisioningserver.rackdservices.tftp: [info] /ppc64el/pxelinux.cfg/01-0c-c4-7a-89-f0-67 requested by 10.245.71.191
  2020-01-23 02:50:59 provisioningserver.rackdservices.tftp: [info] /ppc64el/pxelinux.cfg/01-0c-c4-7a-89-f0-65 requested by 10.245.71.191
  2020-01-23 02:50:59 provisioningserver.rackdservices.tftp: [info] /ppc64el/pxelinux.cfg/01-0c-c4-7a-89-f0-64 requested by 10.245.71.191
  2020-01-23 02:50:59 provisioningserver.rackdservices.tftp: [info] /ppc64el/pxelinux.cfg/01-0c-c4-7a-89-f0-66 requested by 10.245.71.191
  2020-01-23 02:50:59 provisioningserver.rackdservices.tftp: [info] /ppc64el/pxelinux.cfg/01-0c-c4-7a-89-f0-64 requested by 10.245.71.191
  2020-01-23 02:50:59 provisioningserver.rackdservices.tftp: [info] /ppc64el/pxelinux.cfg/01-0c-c4-7a-89-f0-66 requested by 10.245.71.191
  2020-01-23 02:51:28 provisioningserver.rackdservices.http: [info] /images/ubuntu/ppc64el/ga-19.10/eoan/daily/boot-kernel requested by 10.245.71.191
  2020-01-23 02:51:28 provisioningserver.rackdservices.http: [info] /images/ubuntu/ppc64el/ga-19.10/eoan/daily/boot-initrd requested by 10.245.71.191
  2020-01-23 02:51:56 provisioningserver.rackdservices.http: [info] /images/ubuntu/ppc64el/ga-19.10/eoan/daily/squashfs requested by 10.245.71.191
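
  The observation above can be reproduced from the rackd log by grouping pxelinux.cfg requests by source IP. In the PXELINUX naming scheme the file name is the hardware type (01 for Ethernet) followed by the MAC address with dashes. This is a minimal sketch; macs_by_source_ip is a hypothetical helper and the pattern is inferred from the log lines quoted above.

```python
import re
from collections import defaultdict

# The pxelinux.cfg requests quoted in comment #16.
TFTP_LOG = """\
2020-01-23 02:50:58 provisioningserver.rackdservices.tftp: [info] /ppc64el/pxelinux.cfg/01-0c-c4-7a-89-f0-67 requested by 10.245.71.191
2020-01-23 02:50:58 provisioningserver.rackdservices.tftp: [info] /ppc64el/pxelinux.cfg/01-0c-c4-7a-89-f0-67 requested by 10.245.71.191
2020-01-23 02:50:59 provisioningserver.rackdservices.tftp: [info] /ppc64el/pxelinux.cfg/01-0c-c4-7a-89-f0-65 requested by 10.245.71.191
2020-01-23 02:50:59 provisioningserver.rackdservices.tftp: [info] /ppc64el/pxelinux.cfg/01-0c-c4-7a-89-f0-64 requested by 10.245.71.191
2020-01-23 02:50:59 provisioningserver.rackdservices.tftp: [info] /ppc64el/pxelinux.cfg/01-0c-c4-7a-89-f0-66 requested by 10.245.71.191
2020-01-23 02:50:59 provisioningserver.rackdservices.tftp: [info] /ppc64el/pxelinux.cfg/01-0c-c4-7a-89-f0-64 requested by 10.245.71.191
2020-01-23 02:50:59 provisioningserver.rackdservices.tftp: [info] /ppc64el/pxelinux.cfg/01-0c-c4-7a-89-f0-66 requested by 10.245.71.191
"""

# "01-" is the Ethernet hardware type; the rest is the MAC with dashes.
REQUEST = re.compile(
    r"/pxelinux\.cfg/01-(?P<mac>(?:[0-9a-f]{2}-){5}[0-9a-f]{2}) "
    r"requested by (?P<ip>\S+)$"
)


def macs_by_source_ip(log_text):
    """Map each requesting IP to the set of MACs it asked configs for."""
    seen = defaultdict(set)
    for line in log_text.splitlines():
        match = REQUEST.search(line)
        if match:
            seen[match["ip"]].add(match["mac"].replace("-", ":"))
    return dict(seen)


by_ip = macs_by_source_ip(TFTP_LOG)
for ip, macs in by_ip.items():
    flag = "SUSPICIOUS" if len(macs) > 1 else "ok"
    print(ip, sorted(macs), flag)
```

  A single source IP requesting configs for more than one MAC is the signature described in this comment.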

  MAAS keeps track of the interface being used for booting as well as
  the VLAN booting happens on. Because each device is requesting
  pxelinux.cfg MAAS sets the boot_interface to each device. MAAS sees
  that the request came in on a VLAN other than what that interface is
  configured for and updates it. Updating a VLAN causes all IP
  information to be automatically deleted.

  Machines normally request boot information one device at a time, not
  in parallel, which is what allows MAAS's algorithm to work.
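
  The effect described above can be illustrated with a toy model. This is not MAAS code: Interface and handle_boot_request are hypothetical names, the VLAN ids 5002 and 0 are taken from the log in comment #15, and the mapping of IP addresses to interfaces is an assumption made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Interface:
    name: str
    vlan: int
    ips: list = field(default_factory=list)


def handle_boot_request(iface, request_vlan):
    """Toy stand-in for the described behaviour (not MAAS code).

    If a boot request for this interface arrives on a VLAN other than
    the one it is configured for, move the interface to that VLAN and
    drop its IP configuration. Returns True if the IPs were wiped.
    """
    if iface.vlan == request_vlan:
        return False
    iface.vlan = request_vlan
    iface.ips.clear()
    return True


# Assumed starting state: which address belongs to which interface is
# a guess for illustration; only the VLAN ids come from the logs.
interfaces = [
    Interface("enP2p1s0f0", 5002, ["192.168.122.3"]),
    Interface("enP2p1s0f1", 5002, ["192.168.122.4"]),
    Interface("enP2p1s0f2", 5002, ["192.168.123.3"]),
    Interface("enP2p1s0f3", 0, ["10.245.71.163"]),
]

# Every pxelinux.cfg request arrives from the same source IP, so each
# one is seen on the same VLAN (0 in the log), whatever the MAC.
wiped = [i.name for i in interfaces if handle_boot_request(i, 0)]
print("wiped:", wiped)
print("remaining:", {i.name: i.ips for i in interfaces})
```

  Under these assumptions three of the four interfaces lose their configuration, which matches the symptom in the original report.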

  * Why are all interfaces requesting boot information at once?
  * Why are all requests coming in using the same IP on 10.245.64.0/21 from 0c:c4:7a:89:f0:67?
  * Can you try updating the firmware?
  Changed in maas:
  status: 	Confirmed → Incomplete
  Frank Heimes (fheimes) wrote on 2020-02-03: 	#17

  I checked the firmware and it looks like it's the recommended "prod" level.
  Petitboot System Information:
   System type: 8001-22C
   System id: C829UAF04B10265
   Primary platform versions:
          open-power-IBM-P8DTU-V2.00.GA2.SP1-20180105-prod
          op-build-4059438
          hostboot-7fdfb37
          occ-301b535
          skiboot-5.4.2-2a21b57
          linux-4.4.24-openpower1-48c3582
          petitboot-v1.4.0-ee0f918
          p8dtu-xml-04e8a01
   BMC current side:
          Device ID: 0x20
          Device Rev: 0x1
          Firmware version: 1.27.00000
          IPMI version: 2

  Since I booted manually into Petitboot anyway, I verified what's
  possible there to avoid PXE boot from multiple interfaces. And it
  looks like it can be configured / restricted.

  I changed the Petitboot settings to only allow nw boot from enP2p1s0f0
  and only allow DHCP on that same, single interface, too - looks now
  like this:

   Petitboot (v1.4.0-ee0f918) 8001-22C C829UAF04B10265
   ──────────────────────────────────────────────────────────────────────────────
    [Network: enP2p1s0f0 / 0c:c4:7a:89:f0:64]
      execute
      netboot enP2p1s0f0 (pxelinux.0)
    [Disk: sda2 / bdbebffe-6fa8-4783-a82b-3b470dd78440]
      Ubuntu, with Linux 5.4.0-12-generic (recovery mode)
      Ubuntu, with Linux 5.4.0-12-generic
      Ubuntu

    System information
    System configuration
    System status log
    Language
    Rescan devices
    Retrieve config from URL
   *Exit to shell

  After booting (an already existing) test Ubuntu from disk, I was able
  to verify that all interfaces are (still) there (as expected, just
  double-checked).

  I then commissioned the system again and deployed it using MAAS - it all worked fine.
  And afaics it only did PXE from one interface.

  So it 'seems' like it is now fixed with the Petitboot re-config.

  So would you be able to re-try, Jeff?

  PS: I think we didn't face such an issue before, because we usually have only one port connected to the network, so as not to waste too many switch ports. But for this machine it was recently requested to have all ports connected...
  Changed in maas:
  status: 	Incomplete → New
  Frank Heimes (fheimes) wrote on 2020-02-03: 	#18

  I just released 'thiel' again ...
  Andrew Cloke (andrew-cloke) wrote on 2020-02-10: 	#19

  Looking at Lee's comment #16, if I'm reading that correctly, four
  interfaces with different MAC addresses appear to be using the same IP
  address to request pxelinux.cfg in parallel.

  What seems odd to me is that the four interfaces (with different MAC addresses) are using the same IP address. Is it possible to see from the MAAS server's DHCP log whether the four different MAC addresses have been allocated different IP addresses?
  Lee Trager (ltrager) wrote on 2020-02-10: 	#20

  In #15 you can see MAAS allocates four different IP addresses on the correct subnets. For whatever reason the firmware is using the same IP for all interfaces. I'm marking this as needs more information as from what I can tell this is a firmware issue.
  Changed in maas:
  status: 	New → Incomplete
  Andrew Cloke (andrew-cloke) wrote 2 hours ago: 	#21

  To summarise this issue:
  * when the 4 NICs are all configured to PXEBoot and are connected to the same vlan, even though they have been assigned different IP addresses, during the PXEBoot sequence all NICs appear to request pxelinux.cfg using the same IP address (see comment #16).
  * when PXEBoot is disabled on all but one of the NICs, it works fine.

  The question is whether this is a firmware issue with the NIC, or whether this is expected behaviour?
  Changed in ubuntu-power-systems:
  assignee: 	MAAS (maas) → bugproxy (bugproxy)
  Frank Heimes (fheimes) 2 hours ago
  tags: 	added: reverse-proxy-bugzilla

  == Comment: #2 - Michael Ranweiler <mranweil at us.ibm.com> - 2020-02-17 08:22:37 ==
  Please reverse mirror this bug:
  https://bugs.launchpad.net/ubuntu-power-systems/+bug/1852678

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1863639/+subscriptions


