[Bug 2100564] Re: lxd-installer shim fails to install with snapstore error

Robie Basak 2100564 at bugs.launchpad.net
Wed Apr 2 13:41:11 UTC 2025


This is a very well written up SRU. Thank you! I have three review
points:

1. Since this involves impact to a specific Internet service, please
document sign-off from the operators of api.snapcraft.io that they are
happy with this change. I guess it may triple service load but
specifically at the times when the service is already struggling under
load? Plucky isn't released yet, so they're likely only going to notice
a significant difference in production (if there will be any) when this
SRU lands :)

2. I think there's an additional regression risk here: users for whom it
already _always_ fails will now take even longer to fail. This might
affect automated air-gapped deployments, for example, where
api.snapcraft.io might currently time out, but now it will have to time
out three times plus six seconds. I think that such an environment
*should* explicitly and immediately reject, but firewalls are often not
configured to do that in practice. Could this tip such an automated
deployment over the edge if it itself has a timeout for completion? How
long is the connection timeout? If short then probably this isn't
significant; if long (eg. minutes) then it could be. It isn't a big deal
for someone affected to fix this by extending their own timeout or
(better) not triggering lxd when it is bound to fail anyway, but it
might be infuriating to deal with that in an area that is already
frustrating to some set of our users and the expectation is that it
won't regress further in an SRU.
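
To put rough numbers on point 2, here is a small sketch of the worst
case described above, assuming three attempts that each run to the full
connection timeout, plus six seconds of fixed delay in total. The
function name and the example timeouts are hypothetical; the real
connection timeout is exactly the open question.

```shell
# Worst-case seconds to fail: three timed-out attempts plus six seconds
# of delay. The per-attempt timeout is an assumed input, not a measured
# value.
worst_case_seconds() {
    timeout=$1
    echo $((3 * timeout + 6))
}

worst_case_seconds 10    # prints 36
worst_case_seconds 120   # prints 366
```

Under these assumptions a ten second timeout grows from 10 to 36
seconds, which is probably harmless, while a two minute timeout grows
from 120 to 366 seconds, which could plausibly exceed a deployment's
own deadline.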

3. Test Plan: could you perhaps simulate this problem with
api.snapcraft.io? For example, redirect it in /etc/hosts and then use
`nc -l -p 443 </dev/null` or similar? It's not exactly the same, but if
it's easy and good enough, then that would be better than a test that
isn't certain to actually exercise the retry path if the real service
happens to be working better at the time of the test.
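
A minimal sketch of what point 3 could look like, assuming a throwaway
test container, root access, and an `nc` variant that accepts `-l -p`.
The function names are hypothetical, and I have not checked how snapd
reacts to a listener that accepts the connection and then sends
nothing, so treat this as a starting point rather than a verified test.

```shell
# Sketch only: run as root inside a throwaway test container.
SNAP_API_HOST=api.snapcraft.io

# Print the /etc/hosts line that points the store API at localhost.
hosts_entry() {
    printf '127.0.0.1 %s\n' "$SNAP_API_HOST"
}

# Redirect the store API to localhost and keep answering on port 443
# with a listener that sends nothing. nc usually exits after a single
# connection, so it is restarted in a loop to cover every retry.
simulate_store_outage() {
    hosts_entry >> /etc/hosts
    while true; do
        nc -l -p 443 </dev/null || sleep 1
    done
}
```

Running `simulate_store_outage` in one shell and then
`lxd init --auto --storage-backend dir` in another should make each
installation attempt fail, so the retry path is exercised regardless of
how the real service is behaving at the time.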

Of these, 2 gives me reason for hesitation since this is the kind of
change that has affected me in the real world before. I'm on the fence
as to whether it needs mitigating or not. Please could you consider this
scenario, maybe try and measure the impact, and report your thoughts?
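
One possible way to take that measurement, sketched with a hypothetical
helper that reports how many whole seconds a command takes to finish.
It assumes the store has already been made unreachable in whatever way
matches the environment being modelled.

```shell
# Print the wall-clock seconds a command takes, ignoring its output and
# exit status.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $((end - start))
}
```

For example, `elapsed_seconds lxd init --auto --storage-backend dir`
run against both the current package and the proposed one would give
the two figures to compare.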

1 and 3 are OK to be resolved before release to -updates rather than
blocking now.

Apart from that, I've reviewed the current uploads in Noble and
Oracular, everything else looks fine, and once the above is resolved I'd
be happy to accept from the queues without re-review. I see bug tasks
open for Focal and Jammy, but no uploads for them, so those are not
reviewed.


** Changed in: lxd-installer (Ubuntu Noble)
       Status: New => Incomplete

** Changed in: lxd-installer (Ubuntu Oracular)
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to lxd-installer in Ubuntu.
https://bugs.launchpad.net/bugs/2100564

Title:
  lxd-installer shim fails to install with snapstore error

Status in lxd-installer package in Ubuntu:
  Fix Released
Status in lxd-installer source package in Focal:
  New
Status in lxd-installer source package in Jammy:
  New
Status in lxd-installer source package in Noble:
  Incomplete
Status in lxd-installer source package in Oracular:
  Incomplete
Status in lxd-installer source package in Plucky:
  Fix Released

Bug description:
  [ Impact ]

  This has been affecting minimal and base pipelines for cloud images
  for both oracular and noble and blocking publication for these images.
  The oracular and noble images fail lxd related tests because lxd
  installation cannot complete with the following error
  `ConnectionResetError: [Errno 104] Connection reset by peer`. This
  error is intermittent.

  The proposed upload allows retries of the installation to mitigate
  availability issues from the snapstore.

  [ Test Plan ]

  To reproduce the bug you can launch a container from either oracular
  or noble like so: `lxc init ubuntu-<SUITE>-daily:<SUITE> test && lxc
  start test && lxc exec test bash` where SUITE=oracular or noble. Since
  this error is intermittent, the reproducer will not trigger it on every
  run.

  Upon connecting to the container you can run `lxd init --auto
  --storage-backend dir` which results in the error
  `ConnectionResetError: [Errno 104] Connection reset by peer`
  intermittently. From journalctl lxd logs, the following appears:
  `error: cannot install "lxd": Post
  "https://api.snapcraft.io/v2/snaps/refresh":`

  The package with the proposed changes will allow the retry loop to
  attempt installing lxd in the oracular/noble containers in cases where
  the connection to api.snapcraft.io fails on the first attempt.
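
  For readers unfamiliar with the pattern, a bounded retry loop in shell
  can be sketched as follows. This is an illustration only, not the code
  from the upload: the `retry` helper, its arguments, and the attempt
  count and delay in the usage note are assumptions.

  ```shell
  # Run a command up to $1 times, sleeping $2 seconds between attempts.
  # Returns 0 as soon as the command succeeds, 1 if every attempt fails.
  retry() {
      attempts=$1
      delay=$2
      shift 2
      i=1
      while [ "$i" -le "$attempts" ]; do
          if "$@"; then
              return 0
          fi
          echo "attempt $i of $attempts failed" >&2
          # Only wait if another attempt will follow.
          if [ "$i" -lt "$attempts" ]; then
              sleep "$delay"
          fi
          i=$((i + 1))
      done
      return 1
  }
  ```

  A call such as `retry 3 2 snap install lxd --channel="$CHANNEL"` would
  then give up after the third failed attempt instead of the first.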

  [ Where problems could occur ]

  The possible effect of this change is posting to snapcraft more
  frequently which could increase the traffic to
  `https://api.snapcraft.io`. The number of retries is however limited
  and the installation failure intermittent so the additional retries
  should not have a significant impact on Snapcraft. The package itself
  does not have any changes as the retry targets only the installer.

  [ Other Info ]

  The same change in lxd-installer has been accepted for plucky and
  has been in build pipelines for our plucky images for a few weeks now;
  these images do not fail on the lxd-installer anymore.

  [Original Description]

  Description:    Ubuntu Plucky Puffin (development branch) (*LXD container)
  Release:        25.04
  Image serial:   20250217
  Source package: https://launchpad.net/ubuntu/plucky/+package/lxd-installer
  Package version:
  ```
  $ apt-cache policy lxd-installer
  lxd-installer:
    Installed: 10
    Candidate: 10
    Version table:
   *** 10 100
          100 /var/lib/dpkg/status
  ```

  Expected behaviour: `lxd init --auto --storage-backend dir` command
  would trigger the `lxd-installer` shim and `lxd` would be installed in
  the container.

  What happened instead:
  ```
  10:19:41 2025-02-17 17:19:41,454 [INFO] test_framework.tests.lxd_start_stop: initializing lxd
  10:19:41 Installing LXD snap, please be patient.
  10:19:41 Traceback (most recent call last):
  10:19:41   File "<string>", line 1, in <module>
  10:19:41     import socket; s=socket.socket(socket.AF_UNIX); s.connect("/run/lxd-installer.socket"); s.send(b"x"); s.recv(1)
  10:21:12                                                                                                           ~~~~~~^^^
  10:21:12 ConnectionResetError: [Errno 104] Connection reset by peer
  ```

  The command that triggers the shim is `lxd init --auto
  --storage-backend dir`

  Reproducer:
  This failure is intermittent, and I have had a hard time getting a reliable reproducer for it. The best that I have got is:
  ```
  lxc init ubuntu-minimal-daily:plucky test #20250217 serial was pulled
  lxc start test
  lxc exec test bash
  <inside the shell>
  lxd init --auto --storage-backend dir # confirm with 'y'
  ```

  The error should appear almost instantly. If it does not, stop &
  delete the instance and repeat the above.

  *LXD team suggested reproducer:
  ```
  $ while lxc launch ubuntu-minimal-daily:plucky test && lxc exec --force-noninteractive test -- lxc --version && lxc delete -f test; do sleep 1; done
  ```

  Logs:
  `journalctl --grep lxd`:
  ```
  root@genuine-satyr:~# journalctl --grep lxd
  Feb 25 23:35:52 genuine-satyr useradd[324]: add 'ubuntu' to group 'lxd'
  Feb 25 23:35:52 genuine-satyr useradd[324]: add 'ubuntu' to shadow group 'lxd'
  Feb 25 23:35:53 genuine-satyr systemd[1]: Starting lxd-installer.socket - Helper to install lxd snap on demand...
  Feb 25 23:35:53 genuine-satyr systemd[1]: Listening on lxd-installer.socket - Helper to install lxd snap on demand.
  Feb 25 23:35:56 genuine-satyr cloud-init[262]: Cloud-init v. 25.1~3geb1965a4-0ubuntu1 finished at Tue, 25 Feb 2025 23:35:56 +0000. Datasource DataSourceLXD.  Up 5.51 seconds
  Feb 25 23:36:00 genuine-satyr systemd[1]: Created slice system-lxd\x2dinstaller.slice - Slice /system/lxd-installer.
  Feb 25 23:36:00 genuine-satyr systemd[1]: Started lxd-installer@0-503-0.service - Helper to install lxd snap on demand (PID 503/UID 0).
  Feb 25 23:36:00 genuine-satyr lxd-installer-service[504]: + [ lxd-installer-service = lxd-installer-service ]
  Feb 25 23:36:00 genuine-satyr lxd-installer-service[511]: + lxd_channel
  Feb 25 23:36:00 genuine-satyr lxd-installer-service[504]: + snap install lxd --channel=5.21/stable/ubuntu-25.04
  Feb 25 23:36:00 genuine-satyr snapd[348]: api_snaps.go:467: Installing snap "lxd" revision unset
  Feb 25 23:36:00 genuine-satyr lxd-installer-service[512]: error: cannot install "lxd": Post "https://api.snapcraft.io/v2/snaps/refresh":
  Feb 25 23:36:00 genuine-satyr systemd[1]: lxd-installer@0-503-0.service: Main process exited, code=exited, status=1/FAILURE
  Feb 25 23:36:00 genuine-satyr systemd[1]: lxd-installer@0-503-0.service: Failed with result 'exit-code'.
  ```

  `journalctl` around failure:
  ```
  Feb 25 23:40:12 test systemd[1]: Created slice system-lxd\x2dinstaller.slice - Slice /system/lxd-installer.
  Feb 25 23:40:12 test systemd[1]: Started lxd-installer@0-508-0.service - Helper to install lxd snap on demand (PID 508/UID 0).
  Feb 25 23:40:12 test lxd-installer-service[509]: + [ lxd-installer-service = lxd-installer-service ]
  Feb 25 23:40:12 test lxd-installer-service[509]: + snap wait system seed.loaded
  Feb 25 23:40:12 test lxd-installer-service[516]: + lxd_channel
  Feb 25 23:40:12 test lxd-installer-service[516]: + track=
  Feb 25 23:40:12 test lxd-installer-service[516]: + [ -r /etc/os-release ]
  Feb 25 23:40:12 test lxd-installer-service[516]: + . /etc/os-release
  Feb 25 23:40:12 test lxd-installer-service[516]: + PRETTY_NAME=Ubuntu Plucky Puffin (development branch)
  Feb 25 23:40:12 test lxd-installer-service[516]: + NAME=Ubuntu
  Feb 25 23:40:12 test lxd-installer-service[516]: + VERSION_ID=25.04
  Feb 25 23:40:12 test lxd-installer-service[516]: + VERSION=25.04 (Plucky Puffin)
  Feb 25 23:40:12 test lxd-installer-service[516]: + VERSION_CODENAME=plucky
  Feb 25 23:40:12 test lxd-installer-service[516]: + ID=ubuntu
  Feb 25 23:40:12 test lxd-installer-service[516]: + ID_LIKE=debian
  Feb 25 23:40:12 test lxd-installer-service[516]: + HOME_URL=https://www.ubuntu.com/
  Feb 25 23:40:12 test lxd-installer-service[516]: + SUPPORT_URL=https://help.ubuntu.com/
  Feb 25 23:40:12 test lxd-installer-service[516]: + BUG_REPORT_URL=https://bugs.launchpad.net/ubuntu/
  Feb 25 23:40:12 test lxd-installer-service[516]: + PRIVACY_POLICY_URL=https://www.ubuntu.com/legal/terms-and-policies/privacy-policy
  Feb 25 23:40:12 test lxd-installer-service[516]: + UBUNTU_CODENAME=plucky
  Feb 25 23:40:12 test lxd-installer-service[516]: + LOGO=ubuntu-logo
  Feb 25 23:40:12 test lxd-installer-service[516]: + track=5.21
  Feb 25 23:40:12 test lxd-installer-service[516]: + [ -n 5.21 ]
  Feb 25 23:40:12 test lxd-installer-service[516]: + [ -n 25.04 ]
  Feb 25 23:40:12 test lxd-installer-service[516]: + echo 5.21/stable/ubuntu-25.04
  Feb 25 23:40:12 test lxd-installer-service[509]: + CHANNEL=5.21/stable/ubuntu-25.04
  Feb 25 23:40:12 test lxd-installer-service[509]: + [ -z 5.21/stable/ubuntu-25.04 ]
  Feb 25 23:40:12 test lxd-installer-service[509]: + snap install lxd --channel=5.21/stable/ubuntu-25.04
  Feb 25 23:40:12 test snapd[345]: api_snaps.go:467: Installing snap "lxd" revision unset
  Feb 25 23:40:12 test snapd[345]: store_download.go:142: no host system xdelta3 available to use deltas
  Feb 25 23:40:12 test lxd-installer-service[517]: error: cannot install "lxd": Post "https://api.snapcraft.io/v2/snaps/refresh":
  Feb 25 23:40:12 test lxd-installer-service[517]:        context canceled
  Feb 25 23:40:12 test systemd[1]: lxd-installer@0-508-0.service: Main process exited, code=exited, status=1/FAILURE
  Feb 25 23:40:12 test systemd[1]: lxd-installer@0-508-0.service: Failed with result 'exit-code'.
  Feb 25 23:40:14 test snapd[345]: overlord.go:518: Released state lock file
  Feb 25 23:40:14 test snapd[345]: daemon stop requested to wait for socket activation
  Feb 25 23:40:14 test systemd[1]: snapd.service: Deactivated successfully.
  Feb 25 23:40:14 test systemd[1]: snapd.service: Consumed 1.014s CPU time, 37.6M memory peak.
  ```

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/lxd-installer/+bug/2100564/+subscriptions



