[Bug 1987663] Re: cinder-volume: "Failed to re-export volume, setting to ERROR" with "tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected" on service startup

Mauricio Faria de Oliveira 1987663 at bugs.launchpad.net
Wed Oct 18 18:08:42 UTC 2023


Copy/paste of commands to check before/after:

... Reproducer (15-second startup delay in tgt.service)

	FILE=/etc/systemd/system/tgt.service.d/start-delay.conf
	mkdir -p $(dirname $FILE)
	cat <<EOF > $FILE
	[Service]
	ExecStartPre=$(which sleep) 15
	EOF
	systemctl daemon-reload
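The reproducer above can be wrapped in a pair of functions, so the delay is easy to apply with a different value and to remove afterwards. This is a minimal sketch, not part of the bug report. The function names `add_start_delay` and `remove_start_delay` are hypothetical. The drop-in directory is passed as an argument (normally `/etc/systemd/system/tgt.service.d`), and running `systemctl daemon-reload` afterwards is left to the caller.

```shell
# Hypothetical helpers (not from the bug report) around the reproducer drop-in.
# $1: drop-in directory, e.g. /etc/systemd/system/tgt.service.d
# $2: delay in seconds (default: 15)
add_start_delay() (
    dir=$1
    secs=${2:-15}
    mkdir -p "$dir"
    # Same content as the heredoc above: resolve the path to sleep(1)
    # now and write it into ExecStartPre= of the [Service] section.
    printf '[Service]\nExecStartPre=%s %s\n' "$(command -v sleep)" "$secs" \
        > "$dir/start-delay.conf"
)

# Remove the drop-in again (run 'systemctl daemon-reload' afterwards).
remove_start_delay() (
    rm -f "$1/start-delay.conf"
)
```

For example: `add_start_delay /etc/systemd/system/tgt.service.d 30 && systemctl daemon-reload`.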

Before:
---

... Restart both services at the same time:

	# date; systemctl restart cinder-volume.service tgt.service; date
	Mon Oct 16 21:57:12 UTC 2023
	Mon Oct 16 21:57:27 UTC 2023

... Notice that cinder-volume.service is Started _BEFORE_ tgt.service

	# journalctl -b -u cinder-volume.service -u tgt.service | grep Start | tail -3
	Oct 16 21:57:12 cinder-mantic systemd[1]: Started cinder-volume.service - OpenStack Cinder Volume.
	Oct 16 21:57:12 cinder-mantic systemd[1]: Starting tgt.service - (i)SCSI target daemon...
	Oct 16 21:57:27 cinder-mantic systemd[1]: Started tgt.service - (i)SCSI target daemon.
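The ordering check can also be done mechanically. The sketch below (the function name `cinder_started_before_tgt` is hypothetical) reads the `journalctl ... | grep Start` output on standard input and succeeds if `Started cinder-volume.service` appears before `Started tgt.service`, i.e., the buggy order shown above. It only compares line order, so it assumes the input covers a single restart, as with `tail -3`.

```shell
# Hypothetical helper: exit 0 if cinder-volume.service finished starting
# before tgt.service did (the buggy order), exit 1 otherwise.
# Input: journal lines on stdin, e.g. from
#   journalctl -b -u cinder-volume.service -u tgt.service | grep Start | tail -3
cinder_started_before_tgt() {
    awk '
        /Started cinder-volume\.service/ && !c { c = NR }
        /Started tgt\.service/           && !t { t = NR }
        END { exit !(c && (!t || c < t)) }
    '
}
```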

... Log error:

	# grep 'ERROR cinder.volume.manager' /var/log/cinder/cinder-volume.log
	...
	2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager [None req-ce03264a-6765-41de-8016-a6f27d2685e4 - - - - - -] Failed to re-export volume, setting to ERROR.: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
	...
	2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
	2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager Command: tgtadm --lld iscsi --op show --mode target
	2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager Exit code: 107
	2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager Stdout: ''
	2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager Stderr: 'tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected\n'
	2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager
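To check a node for this failure without reading the whole log, one can search for both the cinder error and the tgtadm message. A minimal sketch: the function name `log_shows_tgt_race` is hypothetical, and the log path defaults to `/var/log/cinder/cinder-volume.log` as above. (For reference, 107 is also the value of ENOTCONN on Linux, whose standard message is 'Transport endpoint is not connected'.)

```shell
# Hypothetical helper: exit 0 if the cinder-volume log shows the
# re-export failure together with the tgtadm connection error.
# $1: log file (default: /var/log/cinder/cinder-volume.log)
log_shows_tgt_race() (
    log=${1:-/var/log/cinder/cinder-volume.log}
    # Both messages must be present; -F matches them as fixed strings.
    grep -qF 'Failed to re-export volume, setting to ERROR' "$log" &&
        grep -qF 'tgtadm: failed to send request hdr to tgt daemon' "$log"
)
```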

... Volume error:

	# cinder list
	+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
	| ID                                   | Status | Name        | Size | Consumes Quota | Volume Type | Bootable | Attached to |
	+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
	| 17c4f736-058b-4cfc-9864-a64ab0995957 | error  | test-volume | 1    | True           | __DEFAULT__ | false    |             |
	+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
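When checking many volumes, the Status column can be extracted from the `cinder list` table instead of read by eye. A minimal sketch: the function name `volume_status` is hypothetical, and it assumes the default table layout shown above, with Status in the second column and Name in the third.

```shell
# Hypothetical helper: print the Status of the volume named $1,
# parsing `cinder list` table output from stdin.
volume_status() {
    awk -F'|' -v name="$1" '
        # Data rows look like: | <ID> | <Status> | <Name> | ...
        # With "|" as separator, field 1 is empty, so Status is $3 and Name is $4.
        {
            st = $3; nm = $4
            gsub(/^[ \t]+|[ \t]+$/, "", st)
            gsub(/^[ \t]+|[ \t]+$/, "", nm)
            if (nm == name) print st
        }
    '
}
```

For example: `cinder list | volume_status test-volume`.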

... Undo error state:

	# systemctl restart cinder-volume.service
	# cinder reset-state --state in-use test-volume
	# cinder list

	+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
	| ID                                   | Status | Name        | Size | Consumes Quota | Volume Type | Bootable | Attached to |
	+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
	| 17c4f736-058b-4cfc-9864-a64ab0995957 | in-use | test-volume | 1    | True           | __DEFAULT__ | false    |             |
	+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+


After:
---

... Patched with After= and Wants= for tgt.service

	# systemctl show cinder-volume.service | grep Wants=
	Wants=network-online.target tgt.service
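The bug description says the fix is a systemd drop-in that adds 'After=' and 'Wants=' on tgt.service to cinder-volume.service. A local equivalent can be sketched as follows; the drop-in file name `tgt.conf` and the function name `add_tgt_dropin` are assumptions, not necessarily what the package ships.

```shell
# Hypothetical helper: order cinder-volume.service after tgt.service and
# pull tgt.service in when cinder-volume.service starts.
# $1: drop-in directory (default: /etc/systemd/system/cinder-volume.service.d)
add_tgt_dropin() (
    dir=${1:-/etc/systemd/system/cinder-volume.service.d}
    mkdir -p "$dir"
    # After= only affects ordering and Wants= is a weak dependency,
    # so a missing tgt.service does not block cinder-volume.service.
    cat > "$dir/tgt.conf" <<'EOF'
[Unit]
After=tgt.service
Wants=tgt.service
EOF
)
```

After writing it, run `systemctl daemon-reload`; `systemctl show cinder-volume.service -p After -p Wants` should then list tgt.service in both properties.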

	# cinder list
	+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
	| ID                                   | Status | Name        | Size | Consumes Quota | Volume Type | Bootable | Attached to |
	+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
	| 17c4f736-058b-4cfc-9864-a64ab0995957 | in-use | test-volume | 1    | True           | __DEFAULT__ | false    |             |
	+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+

... Restart both services at the same time:

	# date; systemctl restart cinder-volume.service tgt.service; date
	Wed Oct 18 15:29:01 UTC 2023
	Wed Oct 18 15:29:17 UTC 2023

... Notice that cinder-volume.service is Started _AFTER_ tgt.service

	# journalctl -b -u cinder-volume.service -u tgt.service | grep Start | tail -3
	Oct 18 15:29:01 cinder-mantic systemd[1]: Starting tgt.service - (i)SCSI target daemon...
	Oct 18 15:29:17 cinder-mantic systemd[1]: Started tgt.service - (i)SCSI target daemon.
	Oct 18 15:29:17 cinder-mantic systemd[1]: Started cinder-volume.service - OpenStack Cinder Volume.

... Volume not in error:

	# cinder list
	+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
	| ID                                   | Status | Name        | Size | Consumes Quota | Volume Type | Bootable | Attached to |
	+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
	| 17c4f736-058b-4cfc-9864-a64ab0995957 | in-use | test-volume | 1    | True           | __DEFAULT__ | false    |             |
	+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+

... Without tgt.service installed:

	# apt remove --purge --yes tgt

	# systemctl status tgt.service
	Unit tgt.service could not be found.

	# systemctl show cinder-volume.service | grep Wants=
	Wants=tgt.service network-online.target
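The order of units in 'Wants=' differs between the two `systemctl show` outputs, so a scripted check should look for tgt.service as a whole value rather than compare the entire line. A minimal sketch: the function name `unit_prop_has` is hypothetical, and it reads `systemctl show` output from standard input.

```shell
# Hypothetical helper: exit 0 if property $1 (e.g. Wants) in `systemctl show`
# output on stdin lists unit $2 (e.g. tgt.service) as one of its values.
unit_prop_has() {
    awk -v prop="$1" -v unit="$2" '
        index($0, prop "=") == 1 {
            # Values are space-separated after "PROP=".
            n = split(substr($0, length(prop) + 2), vals, " ")
            for (i = 1; i <= n; i++) if (vals[i] == unit) found = 1
        }
        END { exit !found }
    '
}
```

For example: `systemctl show cinder-volume.service | unit_prop_has Wants tgt.service && echo ok`.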

	# systemctl restart cinder-volume.service
	# echo $?
	0

	# systemctl status cinder-volume.service | grep Active:
	     Active: active (running) since Wed 2023-10-18 16:47:14 UTC; 11s ago

	# journalctl -b -u cinder-volume.service -u tgt.service | grep Start | tail -1
	Oct 18 16:47:14 cinder-mantic systemd[1]: Started cinder-volume.service - OpenStack Cinder Volume.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to cinder in Ubuntu.
https://bugs.launchpad.net/bugs/1987663

Title:
  cinder-volume: "Failed to re-export volume, setting to ERROR" with
  "tgtadm: failed to send request hdr to tgt daemon, Transport endpoint
  is not connected" on service startup

Status in cinder package in Ubuntu:
  In Progress
Status in cinder package in Debian:
  Fix Released

Bug description:
  [Impact]

   * The cinder-volume service might fail to re-export in-use
     volumes on startup if tgt.service has not fully started yet.

   * This affects the 'lvm' driver with 'tgtadm' target helper
     (which runs 'tgtadm' commands that need the service ready).

   * Snippets from /var/log/cinder/cinder-volume.log:

     Failed to re-export volume, setting to ERROR.
     ...
     Command: tgtadm --lld iscsi --op show --mode target
     ...
     Stderr: 'tgtadm: failed to send request hdr to tgt daemon,
     Transport endpoint is not connected\n'

   * This issue is more common on openstack compute nodes
     whose networking (ovs/ovn) takes a long time to start up,
     which might delay the startup of tgt.service until _after_
     cinder-volume.service.

  [Test Steps]

   * Steps to reproduce are detailed in comment #3.
     Summary:

   * Install mysql, rabbitmq-server, keystone, and cinder
     (controller and storage nodes; backup node unneeded).

   * Configure cinder-volume (storage node) for LVM backend
     and tgtadm iSCSI helper (tgt.service).

   * Create a cinder volume, and configure it as 'in-use'.

   * Simulate a start delay on tgt.service with a drop-in.

   * Restart services: cinder-volume.service tgt.service

   * Check sequence of service startup.

   * Check status of the cinder volume:
     'in-use' (expected) or 'error' (bug).

   * Check log file /var/log/cinder/cinder-volume.log for
     'tgtadm: failed to send request hdr to tgt daemon'.

  [Regression Potential]

   * The fix introduces systemd unit 'After=' and 'Wants='
     properties for tgt.service in cinder-volume.service,
     and thus might delay the boot process (multi-user.target).

       $ systemctl show cinder-volume.service | grep WantedBy=
       WantedBy=multi-user.target

   * However, the boot process already waits for tgt.service,
     so the difference (if any) should be small, and the new
     ordering is more correct.

       $ systemctl show tgt.service | grep WantedBy=
       WantedBy=multi-user.target

   * If tgt.service is not present (tgt package not installed),
     _no errors_ occur: 'After=' only affects ordering, and 'Wants='
     is a weak dependency (man 5 systemd.unit).

  [Other Info]

   * The fix uses a systemd service drop-in snippet because
     the service unit is generated by openstack-pkg-tools
     (pkgos-gen-systemd-unit) based on the 'init' service,
     and it only emits 'Wants=' for network-online.target.

   * Changing that in openstack-pkg-tools would change behavior
     in stable releases for many openstack packages that have no
     issues now, and would only take effect at build time.

   * We'll continue to pursue the general improvement in
     Debian, so that it reaches the Ubuntu development release;
     for the Ubuntu stable releases, this drop-in should suffice.

  [Original Bug Description]

  Real-world example in comment #2.

More information about the Ubuntu-openstack-bugs mailing list