[Bug 1829829] Re: Ubuntu CI has been flaky for a week
Dan Streetman
dan.streetman at canonical.com
Mon Jul 1 19:51:07 UTC 2019
Looking at the latest 20 i386 tests, I see failures for PRs 12888,
12884, 12912, 12918, and 12919.
PR 12884 looks like it's adding a new test, TEST-35, so I'll assume that
test just needs more work.
PR 12888 failed once, but three later runs all pass, so I assume it's
ok now.
PR 12919 appears to have failed while building upstream systemd, so I
assume that's a problem with the PR.
PR 12912 passed once, but was then run again and failed due to TEST-34 timing out.
PR 12918 also failed with TEST-34 timing out.
However, these aren't failing because they simply ran out of time. They
are failing because they run "systemd-run --wait", and the --wait never
completes even though the service finishes. This is what I was referring
to earlier when I mentioned that dbus (bus-api-system) appears to be
crashing/restarting. Note the following in the failing test's
upstream-stdout:
$ grep -E '(bus-api-system|run-u7)' upstream-stdout
...
run-u7.service: Installed new job run-u7.service/start as 322
run-u7.service: Enqueued job run-u7.service/start as 322
Bus bus-api-system: changing state RUNNING → CLOSING
Bus bus-api-system: changing state CLOSING → CLOSED
Bus bus-api-system: changing state UNSET → OPENING
Bus bus-api-system: changing state OPENING → AUTHENTICATING
run-u7.service: Failed to set 'memory.min' attribute on '/system.slice/run-u7.service' to '0': No such file or directory
run-u7.service: Failed to set 'memory.swap.max' attribute on '/system.slice/run-u7.service' to 'max': No such file or directory
run-u7.service: Failed to set 'memory.oom.group' attribute on '/system.slice/run-u7.service' to '0': No such file or directory
run-u7.service: Passing 0 fds to service
run-u7.service: About to execute: /usr/bin/test -f /var/lib/zzz/test
run-u7.service: Forked /usr/bin/test as 274
run-u7.service: Changed dead -> running
Found pre-existing private StateDirectory= directory /var/lib/private/zzz, migrating to /var/lib/zzz.
run-u7.service: Job 322 run-u7.service/start finished, result=done
run-u7.service: Executing: /usr/bin/test -f /var/lib/zzz/test
systemd-journald.service: Got notification message from PID 210 (WATCHDOG=1)
run-u7.service: Child 274 belongs to run-u7.service.
run-u7.service: Main process exited, code=exited, status=0/SUCCESS
run-u7.service: Succeeded.
run-u7.service: Changed running -> dead
run-u7.service: Consumed 59ms CPU time.
run-u7.service: Collecting.
Bus bus-api-system: changing state AUTHENTICATING → HELLO
Bus bus-api-system: changing state HELLO → RUNNING
It looks like bus-api-system is down when the service completes, so the notification that it finished never reaches the "systemd-run --wait" caller (the testsuite). The testsuite then just hangs, doing nothing, until the test timeout.
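As an aside, you can spot this pattern mechanically. The following is a small sketch (not part of the test suite, and it assumes the log line formats shown above) that tracks the bus-api-system state transitions in a log and flags any service that logs "Succeeded." while the bus is not RUNNING:

```python
import re

# Match "Bus bus-api-system: changing state OLD → NEW" and capture NEW.
BUS_STATE_RE = re.compile(r"Bus bus-api-system: changing state \S+ → (\S+)")
# Match a service completion line, e.g. "run-u7.service: Succeeded."
SUCCEEDED_RE = re.compile(r"^(\S+\.service): Succeeded\.")

def services_finished_while_bus_down(lines):
    """Return services whose 'Succeeded.' line appears while the
    bus-api-system state is anything other than RUNNING."""
    bus_state = "RUNNING"  # assume the bus starts out up
    suspects = []
    for line in lines:
        m = BUS_STATE_RE.search(line)
        if m:
            bus_state = m.group(1)
            continue
        m = SUCCEEDED_RE.match(line)
        if m and bus_state != "RUNNING":
            suspects.append(m.group(1))
    return suspects

# Reduced excerpt of the log above.
log = """\
Bus bus-api-system: changing state RUNNING → CLOSING
Bus bus-api-system: changing state CLOSING → CLOSED
Bus bus-api-system: changing state UNSET → OPENING
Bus bus-api-system: changing state OPENING → AUTHENTICATING
run-u7.service: Succeeded.
Bus bus-api-system: changing state AUTHENTICATING → HELLO
Bus bus-api-system: changing state HELLO → RUNNING
""".splitlines()

print(services_finished_while_bus_down(log))  # → ['run-u7.service']
```

Run against the full upstream-stdout, this would flag run-u7.service, since its "Succeeded." lands while the bus is still reconnecting (AUTHENTICATING).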
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1829829
Title:
Ubuntu CI has been flaky for a week
Status in systemd package in Ubuntu:
Confirmed
Bug description:
It was originally reported in https://github.com/systemd/systemd/pull/12583#issuecomment-492949206 5 days ago. Judging from the logs, VMs can't be rebooted there:
```
Ubuntu 18.04.2 LTS autopkgtest ttyS0
autopkgtest login:
---------------------------------------------------
------- nova show 91e76a78-d05c-412a-b383-55a26010ae69 (adt-bionic-amd64-systemd-upstream-20190516-051604) ------
+--------------------------------------+------------------------------------------------------------------------------------+
| Property | Value |
+--------------------------------------+------------------------------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | euler |
| OS-EXT-SRV-ATTR:hypervisor_hostname | euler.lcy01.scalingstack |
| OS-EXT-SRV-ATTR:instance_name | instance-003d216a |
| OS-EXT-STS:power_state | 1 |
| OS-EXT-STS:task_state | - |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2019-05-16T07:00:42.000000 |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| config_drive | |
| created | 2019-05-16T07:00:33Z |
| flavor | autopkgtest (f878e70e-9991-46e0-ba02-8ea159a71656) |
| hostId | 1722c5f2face86c3fc9f338ae96835924721512372342f664e6941bd |
| id | 91e76a78-d05c-412a-b383-55a26010ae69 |
| image | adt/ubuntu-bionic-amd64-server-20190516.img (d00bf12c-467e-433f-a4f5-15720f13bff1) |
| key_name | testbed-juju-prod-ues-proposed-migration-machine-11 |
| metadata | {} |
| name | adt-bionic-amd64-systemd-upstream-20190516-051604 |
| net_ues_proposed_migration network | 10.42.40.13 |
| os-extended-volumes:volumes_attached | [] |
| progress | 0 |
| security_groups | autopkgtest at lcy01-27.secgroup |
| status | ACTIVE |
| tenant_id | afaef86b96dd4828a1ed5ee395ea1421 |
| updated | 2019-05-16T07:00:42Z |
| user_id | 8524250971084851b3792a68fbc398dd |
+--------------------------------------+------------------------------------------------------------------------------------+
---------------------------------------------------
<VirtSubproc>: failure: Timed out on waiting for ssh connection
autopkgtest [07:07:45]: ERROR: testbed failure: cannot send to testbed: [Errno 32] Broken pipe
```
Though, judging by https://github.com/systemd/systemd/pull/12626, it
appears that the tests sometimes pass.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1829829/+subscriptions