[Bug 1938299] Re: Unable to SSH Into Instance when deploying Impish 21.10

Chad Smith 1938299 at bugs.launchpad.net
Tue Oct 12 02:37:59 UTC 2021


To clarify the actual root cause and reflect it back to this original
bug:

google-guest-agent defines a `PartOf=` relationship with
systemd-networkd.service[1]. This relationship means that if
systemd-networkd.service is stopped, google-guest-agent.service gets
stopped as well. When systemd-networkd.service is restarted, so is
google-guest-agent.service.

But if systemd-networkd.service is subsequently started after a previous
stop call, google-guest-agent is left in the stopped state. The
`netplan apply` call (issued by cloud-init after writing network config)
in fact calls systemctl stop systemd-networkd.service and follows it with
a 'start' instead of directly invoking systemctl restart
systemd-networkd.service[2]. This leaves google-guest-agent in the
stopped state indefinitely.
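
The difference between the two call sequences can be illustrated with a toy model. This is only a sketch of the propagation rules described above, not systemd code: the `Unit` class and `simulate` helper are hypothetical names, and the model assumes a dependent unit is only restarted when it was running at the time of the restart.

```python
# Toy model of PartOf= propagation as described above. Not systemd code;
# Unit and simulate are hypothetical names used for illustration only.
class Unit:
    def __init__(self, name):
        self.name = name
        self.active = True
        self.restarts = 0
        self.dependents = []  # units that declare PartOf=<this unit>

    def stop(self):
        self.active = False
        for dep in self.dependents:
            dep.active = False  # a stop is propagated to PartOf= dependents

    def start(self):
        # A start is not propagated: stopped dependents stay stopped.
        self.active = True

    def restart(self):
        self.active = True
        for dep in self.dependents:
            # Assumption: only dependents that are running get restarted.
            if dep.active:
                dep.restarts += 1


def simulate(actions):
    """Apply actions to networkd and report whether the agent is running."""
    networkd = Unit("systemd-networkd.service")
    agent = Unit("google-guest-agent.service")
    networkd.dependents.append(agent)
    for action in actions:
        getattr(networkd, action)()
    return agent.active


print(simulate(["stop", "start"]))  # separate stop and start calls
print(simulate(["restart"]))        # a single restart call
```

In this model the separate stop and start leave the agent stopped, while a single restart leaves it running.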

I'm not entirely sure netplan can fix this issue, because of some other
cleanup it does between the networkd stop and start, but I have
reflected this bug to the netplan.io folks and we'll see what the
consensus is about whether this can be resolved by issuing a single
"systemctl restart" instead of separate "systemctl stop" and
"systemctl start" calls.

References:
[1] https://github.com/GoogleCloudPlatform/guest-agent/blob/main/google-guest-agent.service#L13
[2] https://git.launchpad.net/ubuntu/+source/netplan.io/tree/netplan/cli/commands/apply.py?h=applied/ubuntu/devel#n169

** Also affects: netplan.io (Ubuntu)
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to google-guest-agent in Ubuntu.
Matching subscriptions: foundations-bugs
https://bugs.launchpad.net/bugs/1938299

Title:
  Unable to SSH Into Instance when deploying Impish 21.10

Status in cloud-init package in Ubuntu:
  Fix Released
Status in google-guest-agent package in Ubuntu:
  Confirmed
Status in netplan.io package in Ubuntu:
  New
Status in cloud-init source package in Bionic:
  Fix Committed
Status in google-guest-agent source package in Bionic:
  Confirmed
Status in netplan.io source package in Bionic:
  New
Status in cloud-init source package in Focal:
  Fix Committed
Status in google-guest-agent source package in Focal:
  Confirmed
Status in netplan.io source package in Focal:
  New
Status in cloud-init source package in Hirsute:
  Fix Committed
Status in google-guest-agent source package in Hirsute:
  Confirmed
Status in netplan.io source package in Hirsute:
  New
Status in cloud-init source package in Impish:
  Fix Released
Status in google-guest-agent source package in Impish:
  Confirmed
Status in netplan.io source package in Impish:
  New

Bug description:
  === Begin SRU Template ===
  [Impact]
  In PR #919 (81299de), we refactored some of the code used to bring up networks across distros. Previously, the call to bring up network interfaces during 'init' stage unintentionally resulted in a no-op such that network interfaces were NEVER brought up by cloud-init, even if new network interfaces were found after crawling the metadata.

  In #919, the code was altered to bring up these discovered network
  interfaces. On Ubuntu, this results in a 'netplan apply' call during
  'init' stage for any ubuntu-based distro on a datasource that has a
  NETWORK dependency. On GCE, this additional 'netplan apply' conflicts
  with the google-guest-agent service, resulting in an instance that can
  not be connected to.

  To fix this, we added a new 'disable_network_activation' option that
  can be enabled in /etc/cloud/cloud.cfg to disable the activation of
  network interfaces in 'init' stage.

  [Test Case]
  An integration test has been added at `tests/integration_tests/datasources/test_network_dependency.py` to test this functionality. To test manually:

  1. Launch an instance on GCE
  2. Install the cloud-init version with the fix
  3. Add a file, '/etc/cloud/cloud.cfg.d/99-disable-network-activation.cfg' with the contents:
  disable_network_activation: true

  4. Run cloud-init clean --logs
  5. Create a new image based on this instance
  6. Launch a new instance based on the new image
  7. The instance should launch successfully and be reachable over SSH
  8. "['netplan', 'apply']" should not be present anywhere in /var/log/cloud-init.log.
  9. "Bringing up newly configured network interfaces" should not exist anywhere in /var/log/cloud-init.log

  In the failure case, we will fail at step 7.
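
The log checks in steps 8 and 9 could be scripted along these lines. This is a minimal sketch: the two marker strings and the log path come from the steps above, while the function names `find_activation_markers` and `check_log` are hypothetical.

```python
# Sketch of the log checks from steps 8 and 9. The function names are
# hypothetical; the marker strings and log path are taken from the steps.
FORBIDDEN_MARKERS = (
    "['netplan', 'apply']",
    "Bringing up newly configured network interfaces",
)


def find_activation_markers(log_text):
    """Return the markers from steps 8 and 9 that appear in log_text."""
    return [marker for marker in FORBIDDEN_MARKERS if marker in log_text]


def check_log(path="/var/log/cloud-init.log"):
    """Read the cloud-init log and return any forbidden markers found."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_activation_markers(handle.read())
```

An empty list from `check_log()` on the new instance would correspond to steps 8 and 9 passing.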

  [Regression Potential]
  The code in question determines whether to bring up interfaces after applying network config. Accidentally not doing this should not be a problem, as we previously (unintentionally) did not bring these interfaces up. Accidentally bringing up interfaces when we shouldn't also generally shouldn't cause a large problem outside of GCE, because outside of GCE there aren't (that we're aware of) other processes independently setting up networking. If this determination code somehow fails, it happens early enough in boot that it could leave an instance unusable. However, the code is small enough and defensive enough that we don't believe that is a possibility.

  [Other Info]
  Github PR: https://github.com/canonical/cloud-init/pull/1048
  Upstream commit: <TODO>

  === End SRU Template ===
  Original bug report:

  Google Instances deployed with the Ubuntu 21.10 Daily images are
  inaccessible via SSH.

  gcloud compute instances create sf-impish-v20200720 --zone us-west1-a \
    --network "default" --no-restart-on-failure \
    --image-project ubuntu-os-cloud-devel \
    --image daily-ubuntu-2110-impish-v20210720 --machine-type n1-standard-2

  This results in a successful deploy, yet the instance is inaccessible
  via SSH from the end user's configured laptop.

  This appears to affect all daily images after 20210719.

  daily-ubuntu-2110-impish-v20210719                    ubuntu-os-cloud-devel  ubuntu-2110                                   READY
  daily-ubuntu-2110-impish-v20210720                    ubuntu-os-cloud-devel  ubuntu-2110                                   READY
  daily-ubuntu-2110-impish-v20210721                    ubuntu-os-cloud-devel  ubuntu-2110                                   READY
  daily-ubuntu-2110-impish-v20210723                    ubuntu-os-cloud-devel  ubuntu-2110                                   READY
  daily-ubuntu-2110-impish-v20210724                    ubuntu-os-cloud-devel  ubuntu-2110                                   READY
  daily-ubuntu-2110-impish-v20210725                    ubuntu-os-cloud-devel  ubuntu-2110                                   READY
  daily-ubuntu-2110-impish-v20210728                    ubuntu-os-cloud-devel  ubuntu-2110

  This problem also appears to be reproducible via the gcloud UI: create
  a new virtual machine using the daily-ubuntu-2110-impish-v20210720
  image or later and instruct the virtual machine to import an
  ssh_pub_key in the security tab. The instance will start, yet still be
  inaccessible via the user's private SSH key.

  The google-guest-agent.service appears to be responsible for adding
  the google project ssh keys to the instance once it's deployed. Please
  see below its status when queried on the 20210719 image:

   google-guest-agent.service - Google Compute Engine Guest Agent
       Loaded: loaded (/lib/systemd/system/google-guest-agent.service; enabled; vendor preset: enabled)
       Active: active (running) since Tue 2021-07-27 19:47:48 UTC; 18h ago
     Main PID: 711 (google_guest_ag)
        Tasks: 9 (limit: 8924)
       Memory: 19.7M
       CGroup: /system.slice/google-guest-agent.service
               └─711 /usr/bin/google_guest_agent

  Jul 27 19:47:55 sean-imp gpasswd[1469]: user google added by root to group floppy
  Jul 27 19:47:55 sean-imp gpasswd[1475]: user google added by root to group audio
  Jul 27 19:47:55 sean-imp gpasswd[1481]: user google added by root to group dip
  Jul 27 19:47:55 sean-imp gpasswd[1487]: user google added by root to group video
  Jul 27 19:47:55 sean-imp gpasswd[1493]: user google added by root to group plugdev
  Jul 27 19:47:55 sean-imp gpasswd[1499]: user google added by root to group netdev
  Jul 27 19:47:55 sean-imp gpasswd[1505]: user google added by root to group lxd
  Jul 27 19:47:55 sean-imp gpasswd[1511]: user google added by root to group google-sudoers
  Jul 27 19:47:55 sean-imp GCEGuestAgent[711]: 2021-07-27T19:47:55.1699Z GCEGuestAgent Info: Updating keys for user google.
  Jul 27 19:47:55 sean-imp google_guest_agent[711]: 2021/07/27 19:47:55 logging client: rpc error: code = PermissionDenied desc = Clo>
  lines 1-19/19 (END)
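
When comparing images, the state of the agent can be pulled out of status output like the above programmatically. The following is a rough sketch that only parses the `Active:` line of text in the format shown; `parse_active_state` is a hypothetical helper, and how the status text is obtained is left to the caller.

```python
import re


def parse_active_state(status_text):
    """Return (state, sub_state) from the 'Active:' line, or None if absent.

    Hypothetical helper: it only understands text laid out like the
    status output quoted above.
    """
    match = re.search(
        r"^\s*Active:\s+(\S+)(?:\s+\(([^)]*)\))?",
        status_text,
        re.MULTILINE,
    )
    if match is None:
        return None
    return match.group(1), match.group(2)


sample = """\
     Loaded: loaded (/lib/systemd/system/google-guest-agent.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2021-07-27 19:47:48 UTC; 18h ago
"""
print(parse_active_state(sample))
```

A state other than `active` on an affected image would be consistent with the agent having been stopped.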
