[Bug 1615021] Re: Unable to network boot Ubuntu 16.04 installer normally on Briggs
Martin Pitt
martin.pitt at ubuntu.com
Thu Aug 25 09:32:51 UTC 2016
I don't actually know what BOOT_DEBUG does -- I've never seen it before,
it does not appear anywhere in my yakkety system, and it's for sure not
something the kernel, initramfs-tools, or systemd look at. My best guess
is that this is a debian-installer specific debug flag.
So from what I can tell, the readlink path issue is merely a red herring
-- it's good to fix it of course, but it's unrelated to the boot
failure.
Since this is a heisenbug, it seems to me that this is a timing issue:
any extra debugging, or time spent changing boot parameters in the boot
loader, will change the behaviour (e.g. let the hardware finish
detecting network devices earlier).
At the moment I'm afraid there isn't enough useful information here to
understand what's going on. Screen output from a boot where the problem
does happen would indeed be helpful, as would dmesg logs and the output
of "udevadm info -e", as Steve says.
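For reference, something along these lines would capture both the dmesg
log and the udev database dump on the affected machine. It is only a
sketch: the output directory and file names are arbitrary choices made
for illustration, and the installer environment may need adjustments.

```shell
# Sketch: save the requested diagnostics to files for attaching to the bug.
# The output directory and file names are arbitrary, for illustration only.
outdir=$(mktemp -d)

# dmesg may need root on some systems; keep going if it fails.
dmesg > "$outdir/dmesg.log" 2>&1 || true

# Dump the whole udev database, if udevadm is available.
if command -v udevadm >/dev/null 2>&1; then
    udevadm info -e > "$outdir/udevadm-info.log" 2>&1 || true
fi

echo "logs saved in $outdir"
```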
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1615021
Title:
Unable to network boot Ubuntu 16.04 installer normally on Briggs
Status in busybox package in Ubuntu:
Fix Committed
Status in debian-installer package in Ubuntu:
Triaged
Status in systemd package in Ubuntu:
Fix Committed
Status in busybox source package in Xenial:
Won't Fix
Status in debian-installer source package in Xenial:
Triaged
Status in systemd source package in Xenial:
In Progress
Status in busybox source package in Yakkety:
Fix Committed
Status in debian-installer source package in Yakkety:
Triaged
Status in systemd source package in Yakkety:
Fix Committed
Bug description:
== Comment: #7 - Guilherme Guaglianoni Piccoli <gpiccoli at br.ibm.com> - 2016-08-19 10:08:07 ==
The normal procedure to perform a Netboot installation of Ubuntu 16.04 is to download the latest vmlinux and initrd.gz files available, and kexec them with no parameters (at least on ppc64el).
We're experiencing a strange issue in which the installer freezes
before the menus are shown. The system hangs at the point specified
below, right after the i40e driver initialization:
[ 11.052832] i40e 0002:01:00.0 enP2p1s0f0: renamed from eth0
[ 11.073976] i40e 0002:01:00.1 enP2p1s0f1: renamed from eth1
[ 11.117799] i40e 0002:01:00.2 enP2p1s0f2: renamed from eth2
[ 11.225745] i40e 0002:01:00.3 enP2p1s0f3: renamed from eth3
***HANG***
The most difficult part of this issue is that it seems to be a timing
issue/race condition, and many debugging attempts end up preventing
the issue from reproducing (heisenbug).
We did manage to get logs, though, by booting the kernel with
"BOOT_DEBUG=2" on the command line and by modifying the initrd to
enable systemd debug; only the files "init" and "start-udev" were
changed in the initrd, and both are attached here.
We've attached here a saved screen session that shows the entire boot
process until it gets flooded with lots of messages like:
"starting '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'
'/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'(err) 'failed to execute '/bin/readlink' '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules': No such file or directory'
seq 3244 queued, 'add' 'pci_bus'
starting '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'
passed 408 byte device to netlink monitor 0x1003cfe8020seq 3236 running'/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'(err) 'failed to execute '/bin/readlink' '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules': No such file or directory'
'/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'(err) 'failed to execute '/bin/readlink' '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules': No such file or directory'
Process '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules' failed with exit code 2.
PROGRAM '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules' /lib/udev/rules.d/73-usb-net-by-mac.rules:6
passed device to netlink monitor 0x1003d01f730
"
It then stays hung at this stage. We re-tested by changing the file
73-usb-net-by-mac.rules in the initrd, replacing
"/etc/udev/rules.d/80-net-setup-link.rules" with
"/lib/udev/rules.d/80-net-setup-link.rules", since the former does not
exist whereas the latter does. The same issue was observed!
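The path replacement described above amounts to something like the
following sketch. The sample rule line is a stand-in written for
illustration, not the shipped contents of 73-usb-net-by-mac.rules, and
the edit is done on a temporary file rather than inside a real initrd.

```shell
# Sketch: swap the readlink target in a copy of the rules file.
# The rule line written here is a hypothetical stand-in, not the real rule.
rules=$(mktemp)
printf '%s\n' 'PROGRAM="/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules"' > "$rules"

# Replace the /etc path (which does not exist in the initrd) with the /lib one.
sed 's#/etc/udev/rules.d/80-net-setup-link.rules#/lib/udev/rules.d/80-net-setup-link.rules#' \
    "$rules" > "$rules.new"
mv "$rules.new" "$rules"

cat "$rules"
```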
Note that if we boot the installer with "net.ifnames=0" or
"net.ifnames=1" on the command line, the problem no longer reproduces.
We would like to ask for Canonical's help in investigating this issue.
Thanks,
Guilherme
SRU INFORMATION for systemd
===========================
Test case:
* Check what happens for uevents on devices which are not USB network interfaces:
udevadm test /sys/devices/virtual/mem/null
udevadm test /sys/class/net/lo
With the current version these will run
PROGRAM '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'
/lib/udev/rules.d/73-usb-net-by-mac.rules:6
which is pointless. With the proposed version these should be gone.
* Ensure that the rule still works as intended by connecting a USB
network device that has a permanent MAC address (Android tethering,
for example, uses a temporary MAC and so does not qualify). You should
get a MAC-based name like "enx12345678" for it. Now disconnect it
again, disable ifnames with
sudo ln -s /dev/null /etc/udev/rules.d/80-net-setup-link.rules
and reconnect the device. You should now get a kernel name like "usb0"
for it.
Regression potential:
Errors in the rule could break persistent naming - or its disabling -
of USB network interfaces. Running the above test carefully is
important to ensure this keeps working. The rule has little to no
actual effect on anything else on the system (aside from a performance
impact and log spam), so overall the regression potential is low.
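The first test-case step above can be checked mechanically along these
lines. This is only a sketch: the helper name runs_readlink is made up
for illustration, and the sample input is the PROGRAM line quoted in the
test case rather than live udevadm output.

```shell
# Hypothetical helper: returns 0 if the text on stdin still shows the
# readlink PROGRAM call quoted in the test case, 1 otherwise.
runs_readlink() {
    grep -q "PROGRAM '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'"
}

# Sample input copied from the test case (behaviour of the current version).
sample="PROGRAM '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules' /lib/udev/rules.d/73-usb-net-by-mac.rules:6"

if printf '%s\n' "$sample" | runs_readlink; then
    result="still runs readlink"
else
    result="readlink call gone"
fi
echo "$result"
```

On a real system the same helper could be fed from
"udevadm test /sys/class/net/lo 2>&1 | runs_readlink"; with the proposed
version it should report that the call is gone.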