[Bug 2057965] [NEW] google-startup-scripts runs before cloud-init finished network setup

Catherine Redfield 2057965 at bugs.launchpad.net
Thu Mar 14 19:51:37 UTC 2024


Public bug reported:

New GCP dailies are failing startup-script tests because the network is
not fully set up when startup scripts are run.  The failure can be
reproduced as follows:

Using startup_script.sh:
#!/bin/bash
cp /etc/apt/sources.list /tmp/startup-sources.list


$ gcloud compute instances create startup-test --image daily-ubuntu-2204-jammy-v20240314 --image-project ubuntu-os-cloud-devel --metadata-from-file=startup-script=startup_script.sh
[...]
$ ssh [INSTANCE IP]
> diff /tmp/startup-sources.list /etc/apt/sources.list
0a1,8
> ## Note, this file is written by cloud-init on first boot of an instance
> ## modifications made here will not survive a re-bundle.
> ## if you wish to make changes you can:
> ## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg
> ##     or do the same in user-data
> ## b.) add sources in /etc/apt/sources.list.d
> ## c.) make changes to template file /etc/cloud/templates/sources.list.tmpl
> 
3,4c11,12
< deb http://archive.ubuntu.com/ubuntu/ jammy main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy main restricted
---
> deb http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
> # deb-src http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
8,9c16,17
< deb http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
---
[...]


Earlier images (such as ubuntu-2204-jammy-v20240307 in ubuntu-os-cloud) do not show this behaviour.  The regression is due to a change in ubuntu-pro 31 (see https://github.com/canonical/ubuntu-pro-client/blob/dfe1f1ed4678c50240d4e251f41d33bb4034135e/debian/changelog#L40 for details) that removes a systemd ordering on cloud-config.service.  A side effect of this change was the removal of cloud-config.service (and ubuntu-advantage.service) from the systemd critical chain of google-startup-scripts.service.
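
One way to confirm whether the ordering is present on a given image is to look at the unit's After= property.  The following is a minimal sketch, not part of the original report: the helper name ordered_after and the SYSTEMCTL override variable are hypothetical, and it assumes a systemd recent enough to support `systemctl show --value`.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if unit $1 lists unit $2 in its After= property.
# SYSTEMCTL is an assumed override hook for dry runs; it defaults to systemctl.
ordered_after() {
    local unit="$1" dep="$2"
    # "systemctl show -p After --value" prints a space-separated list of units;
    # split it into one unit per line and look for an exact match.
    "${SYSTEMCTL:-systemctl}" show -p After --value "$unit" \
        | tr ' ' '\n' \
        | grep -qxF -- "$dep"
}
```

On an instance this could be called as `ordered_after google-startup-scripts.service cloud-config.service && echo ordered || echo "not ordered"`.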

On v20240307 (startup scripts execute correctly):
catred@startup-test-control:~$ systemd-analyze critical-chain google-startup-scripts.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

google-startup-scripts.service +18.262s
└─multi-user.target @28.480s
  └─ubuntu-advantage.service @28.480s
    └─cloud-config.service @27.372s +1.095s
      └─snapd.seeded.service @20.048s +7.312s
        └─snapd.service @12.469s +7.555s
          └─basic.target @11.558s
            └─sockets.target @11.540s
              └─snap.lxd.daemon.unix.socket @24.376s
                └─sysinit.target @10.825s
                  └─cloud-init.service @8.432s +2.267s
                    └─systemd-networkd-wait-online.service @6.467s +1.935s
                      └─systemd-networkd.service @6.347s +112ms
                        └─network-pre.target @6.328s
                          └─cloud-init-local.service @4.309s +2.006s
                            └─systemd-remount-fs.service @1.829s +68ms
                              └─systemd-fsck-root.service @1.587s +160ms
                                └─systemd-journald.socket @1.292s
                                  └─system.slice @1.068s
                                    └─-.slice @1.068s


On v20240314 (startup scripts fail):
catred@startup-test:~$ systemd-analyze critical-chain google-startup-scripts.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

google-startup-scripts.service +260ms
└─multi-user.target @29.237s
  └─chrony.service @30.240s +56ms
    └─basic.target @13.364s
      └─sockets.target @13.225s
        └─snap.lxd.user-daemon.unix.socket @26.765s
          └─sysinit.target @12.550s
            └─cloud-init.service @7.933s +4.503s
              └─systemd-networkd-wait-online.service @6.741s +1.171s
                └─systemd-networkd.service @6.593s +124ms
                  └─network-pre.target @6.573s
                    └─cloud-init-local.service @4.478s +2.083s
                      └─systemd-remount-fs.service @1.717s +64ms
                        └─systemd-fsck-root.service @1.510s +95ms
                          └─systemd-journald.socket @1.193s
                            └─-.mount @974ms
                              └─-.slice @974ms


This can be fixed by adding an explicit `After=cloud-config.service` to the google-startup-scripts.service file, which enforces the correct ordering between google-startup-scripts and cloud-init.
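
Until the packaged unit file carries that ordering, the same effect could be approximated locally with a systemd drop-in.  The sketch below is an illustration under assumptions, not the packaged fix: the function name write_dropin, its optional root-directory argument, and the file name 10-after-cloud-config.conf are all hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: write a drop-in that orders google-startup-scripts.service
# after cloud-config.service. $1 is an assumed override of the systemd unit
# directory (handy for dry runs); it defaults to /etc/systemd/system.
write_dropin() {
    local root="${1:-/etc/systemd/system}"
    local dir="$root/google-startup-scripts.service.d"
    mkdir -p "$dir" || return 1
    # The drop-in only adds an ordering dependency; it does not pull in
    # cloud-config.service if that unit is not otherwise started.
    printf '[Unit]\nAfter=cloud-config.service\n' \
        > "$dir/10-after-cloud-config.conf"
}
```

Run as root, `write_dropin && systemctl daemon-reload` would install the drop-in for subsequent boots; the critical-chain command above can then be used to check that cloud-config.service is back in the chain.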

** Affects: google-guest-agent (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to google-guest-agent in Ubuntu.
https://bugs.launchpad.net/bugs/2057965
