[Bug 2039148] [NEW] lxd-installer can race or temp-fail and then block itself
Christian Ehrhardt
2039148 at bugs.launchpad.net
Thu Oct 12 08:18:13 UTC 2023
Public bug reported:
Hey,
while checking some other issue I realized that the pre-installed LXD isn't always working.
It is fully pre-installed on cloud images and server installs to give users quick access to a great feature.
In minimal images it is not installed (which is fine, since they are meant to be minimal), yet it is not fully gone either, and what is left fails without a clear indication to an (unsuspecting) user.
What is left there is `lxd-installer`.
Normal image:
```
$ snap list | grep lxd
lxd 5.0.2-838e1b2 24322 5.0/stable/… canonical** -
$ which lxc
/snap/bin/lxc
$ lxc list
If this is your first time running LXD on this machine, you should also run: lxd init
To start your first container, try: lxc launch ubuntu:22.04
Or for a virtual machine: lxc launch ubuntu:22.04 --vm
+------+-------+------+------+------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+-------+------+------+------+-----------+
```
But OTOH a minimal image has this ...
```
$ which lxc
/usr/sbin/lxc
$ dpkg -S /usr/sbin/lxc
dpkg-query: no path found matching pattern /usr/sbin/lxc
$ cat /usr/sbin/lxc
#!/bin/sh
SNAP_BIN="/snap/bin/$(basename $0)"
if [ ! -f ${SNAP_BIN} ]; then
python3 -c 'import socket; s=socket.socket(socket.AF_UNIX); s.connect("/run/lxd-installer.socket"); s.send(b"x"); s.recv(1)'
fi
exec $SNAP_BIN "$@"
$ snap list
No snaps are installed yet. Try 'snap install hello-world'.
```
AFAICS this wrapper relies on lxd-installer, which is a package, so let me
file this against that package and against images in general.
The wrapper tries to hold that connection until the installer has brought in
lxd, and then hands over to the real binary.
# cat /lib/systemd/system/lxd-installer\@.service
[Unit]
Description=Helper to install lxd snap on demand
[Service]
ExecStart=/bin/sh -eux /usr/share/lxd-installer/lxd-installer-service
StandardInput=socket
StandardOutput=socket
StandardError=journal
Restart=no
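The handshake between the wrapper and the socket-activated unit can be illustrated with a toy stand-in. This is only a sketch: `serve_once`, `request_install`, `demo` and the marker file are hypothetical names made up here, and the assumption that the service answers with a single byte once the install is done is a guess (the wrapper's `s.recv(1)` would equally return when the service just closes the connection). The real `lxd-installer-service` script is not shown in this report.

```python
import os
import socket
import tempfile
import threading


def serve_once(server, marker_path):
    """Toy stand-in for the installer: accept one client, 'install', then reply."""
    conn, _ = server.accept()
    with conn:
        conn.recv(1)                    # the client's wake-up byte
        open(marker_path, "w").close()  # stands in for `snap install lxd`
        conn.send(b"x")                 # assumed reply that releases the client


def request_install(socket_path):
    """Same steps as the wrapper's python3 one-liner: connect, send, block on recv."""
    with socket.socket(socket.AF_UNIX) as s:
        s.connect(socket_path)
        s.send(b"x")
        return s.recv(1)


def demo():
    """Run the toy server and the client against a socket in a temp directory."""
    workdir = tempfile.mkdtemp()
    socket_path = os.path.join(workdir, "installer.socket")
    marker_path = os.path.join(workdir, "lxc")
    with socket.socket(socket.AF_UNIX) as server:
        server.bind(socket_path)
        server.listen(1)
        worker = threading.Thread(target=serve_once, args=(server, marker_path))
        worker.start()
        reply = request_install(socket_path)
        worker.join()
    return reply, os.path.isfile(marker_path)
```

The point of the sketch is that the client has no notion of progress or failure: it blocks in `recv` and afterwards simply assumes the binary is there, which matches the wrapper running `exec $SNAP_BIN` unconditionally.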
The socket is up after boot, as one would expect.
And usually it even works fine:
$ lxc launch ubuntu-minimal-daily:j j-test --ephemeral --vm
Creating j-test
Starting j-test
$ lxc exec j-test bash
root@j-test:~# systemctl status lxd-installer.socket
● lxd-installer.socket - Helper to install lxd snap on demand
Loaded: loaded (/lib/systemd/system/lxd-installer.socket; enabled; vendor preset: enabled)
Active: active (listening) since Thu 2023-10-12 08:00:29 UTC; 4s ago
Listen: /run/lxd-installer.socket (Stream)
Accepted: 0; Connected: 0;
Tasks: 0 (limit: 1171)
Memory: 0B
CPU: 374us
CGroup: /system.slice/lxd-installer.socket
Oct 12 08:00:29 j-test systemd[1]: Starting Helper to install lxd snap on demand...
Oct 12 08:00:29 j-test systemd[1]: Listening on Helper to install lxd snap on demand.
root@j-test:~# lxc list
If this is your first time running LXD on this machine, you should also run: lxd init
To start your first container, try: lxc launch ubuntu:22.04
Or for a virtual machine: lxc launch ubuntu:22.04 --vm
+------+-------+------+------+------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+-------+------+------+------+-----------+
But once this has failed even a single time, it looks like this:
root@m:~# systemctl status lxd-installer.socket
● lxd-installer.socket - Helper to install lxd snap on demand
Loaded: loaded (/lib/systemd/system/lxd-installer.socket; enabled; preset: enabled)
Active: active (listening) since Thu 2023-09-28 09:47:48 UTC; 1 week 6 days ago
Triggers: ● lxd-installer@3-13717-0.service
● lxd-installer@1-13484-0.service
● lxd-installer@2-13655-0.service
● lxd-installer@0-13372-0.service
Listen: /run/lxd-installer.socket (Stream)
Accepted: 4; Connected: 0;
Tasks: 0 (limit: 38254)
Memory: 0B
CPU: 580us
CGroup: /system.slice/lxd-installer.socket
Sep 28 09:47:48 m systemd[1]: Starting lxd-installer.socket - Helper to install lxd snap on demand...
Sep 28 09:47:48 m systemd[1]: Listening on lxd-installer.socket - Helper to install lxd snap on demand.
root@m:~# lxc list
Traceback (most recent call last):
File "<string>", line 1, in <module>
ConnectionResetError: [Errno 104] Connection reset by peer
/usr/sbin/lxc: 6: exec: /snap/bin/lxc: not found
My initial case of hitting this was due to a failure that was unknown to me at first:
Oct 12 07:31:35 m systemd[1]: lxd-installer@0-13372-0.service: Failed with result 'exit-code'.
Oct 12 07:32:55 m systemd[1]: Started lxd-installer@1-13484-0.service - Helper to install lxd snap on demand (PID 13484/UID 0).
Oct 12 07:32:55 m systemd[1]: lxd-installer@1-13484-0.service: Main process exited, code=exited, status=1/FAILURE
That was due to
snap install lxd
error: system does not fully support snapd: The "fuse" filesystem is required on this system but
not available. Please try to install the fuse package.
But you do not have to re-create that particular failure.
Instead, the issue is easy to reproduce using the impatience simulator:
$ lxc launch ubuntu-minimal-daily:j j-test --ephemeral --vm
Creating j-test
Starting j-test
$ lxc exec j-test bash
# abort this first time to simulate any reason it might fail
root@j-test:~# lxc list
^CTraceback (most recent call last):
File "<string>", line 1, in <module>
KeyboardInterrupt
# Now see it never coming back to life
root@j-test:~# lxc list
Traceback (most recent call last):
File "<string>", line 1, in <module>
ConnectionResetError: [Errno 104] Connection reset by peer
/usr/sbin/lxc: 6: exec: /snap/bin/lxc: not found
This is because the service instance from the first attempt counts as started and keeps running in the background.
$ ps axlf | grep installer
0 0 753 240 20 0 4020 2092 ? S+ pts/0 0:00 \_ grep --color=auto installer
4 0 556 1 20 0 2888 948 ? Ss ? 0:00 /bin/sh -eux /usr/share/lxd-installer/lxd-installer-service
If you wait long enough, it will recover.
There are a few scenarios I can think of:
1. a boot race: the socket is not yet up, which triggers the same issue
2. a transient issue occurred and lxd-installer-service is still running in the background
3. there is a permanent "lxd-installer will fail" problem
In cases #1 and #2 the wrapper should detect this and wait for the job that is about to start or is already running.
But it needs to be able to tell these apart from #3, in which case it needs to give up at some point.
Maybe something simple like a retry + timeout logic might provide all of
that?
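To make the retry + timeout idea a bit more concrete, here is a minimal sketch of what such a loop could look like on the client side. It is not how lxd-installer works today; the function name `wait_for_snap_bin` and the default `timeout` and `interval` values are made up for illustration, and only the socket path and the connect/send/recv steps are taken from the wrapper shown above.

```python
import os
import socket
import time


def wait_for_snap_bin(snap_bin, socket_path="/run/lxd-installer.socket",
                      timeout=300.0, interval=2.0):
    """Return True once snap_bin exists, False if the deadline passes first.

    The timeout and interval defaults are arbitrary illustration values.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.isfile(snap_bin):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Scenario 3: nothing showed up in time, give up with a clear result.
            return False
        try:
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
                s.settimeout(remaining)
                s.connect(socket_path)
                s.send(b"x")
                s.recv(1)
        except OSError:
            # Scenarios 1 and 2: socket not there yet, connection reset while
            # another job is still running, or the wait timed out. Retry below.
            pass
        # Never sleep past the deadline.
        time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
```

A wrapper using something like this could print an understandable message and exit non-zero when the function returns False, instead of running into `exec: /snap/bin/lxc: not found`.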
** Affects: lxd-installer (Ubuntu)
Importance: Undecided
Status: New
** Also affects: lxd-installer (Ubuntu)
Importance: Undecided
Status: New
** No longer affects: cloud-images
** Summary changed:
- lxd-installer is not idempotent
+ lxd-installer can race or temp-fail and then block itself
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to lxd-installer in Ubuntu.
https://bugs.launchpad.net/bugs/2039148