[Bug 1987663] Re: cinder-volume: "Failed to re-export volume, setting to ERROR" with "tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected" on service startup
Mauricio Faria de Oliveira
1987663 at bugs.launchpad.net
Wed Oct 18 18:03:16 UTC 2023
Real-world example:
1) Unit `ovs-vswitchd.service` took 2 minutes to start up.
2) That delayed `network.target` (`ovs-vswitchd.service` has `Before=network.target`).
3) That delayed `tgt.service` too (it has `After=network.target`).
4) BUT that did _not_ delay `cinder-volume.service` (it does _not_ have `After=tgt.service`)
5) So it failed to talk to tgtd with tgtadm, and volume re-export failed.
$ cat sos_commands/logs/journalctl_--no-pager_--catalog_--boot \
| grep -E -B1 'Subject: Unit (cinder-volume.service|tgt.service|ovs-vswitchd.service|network.target)'
Aug 10 06:23:15 <HOST> systemd[1]: Started OpenStack Cinder Volume.
-- Subject: Unit cinder-volume.service has finished start-up
--
<<< ERROR in cinder-volume.log >>>
Aug 10 06:23:16 <HOST> systemd[1]: Starting Open vSwitch Forwarding Unit...
-- Subject: Unit ovs-vswitchd.service has begun start-up
--
<<< DELAY of 2 minutes >>>
Aug 10 06:25:17 <HOST> systemd[1]: Started Open vSwitch Forwarding Unit.
-- Subject: Unit ovs-vswitchd.service has finished start-up
--
Aug 10 06:25:17 <HOST> systemd[1]: Reached target Network.
-- Subject: Unit network.target has finished start-up
--
Aug 10 06:25:17 <HOST> systemd[1]: Starting (i)SCSI target daemon...
-- Subject: Unit tgt.service has begun start-up
--
Aug 10 06:25:19 <HOST> systemd[1]: Started (i)SCSI target daemon.
-- Subject: Unit tgt.service has finished start-up
<<< NOW tgtd is running >>>
@ var/log/cinder/cinder-volume.log
9901 2022-08-10 06:23:21.515 3939 INFO cinder.service [-] Starting cinder-volume node (version 12.0.9)
...
9932 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager [req-e67a852a-6cce-4073-a2de-d3487c85d586 - - - - -] Failed to re-export volume, setting to ERROR.: ProcessExecutionError: Unexpected error while running command.
9933 Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgtadm --lld iscsi --op show --mode target
9934 Exit code: 107
9935 Stdout: u''
9936 Stderr: u'tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected\n'
9937 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Traceback (most recent call last):
9938 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 486, in init_host
9939 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager self.driver.ensure_export(ctxt, volume)
9940 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/lvm.py", line 826, in ensure_export
9941 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager self.target_driver.ensure_export(context, volume, volume_path)
9942 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/targets/iscsi.py", line 261, in ensure_export
9943 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager old_name=None, **portals_config)
9944 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/utils.py", line 818, in _wrapper
9945 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager return r.call(f, *args, **kwargs)
9946 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/retrying.py", line 206, in call
9947 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager return attempt.get(self._wrap_exception)
9948 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/retrying.py", line 247, in get
9949 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager six.reraise(self.value[0], self.value[1], self.value[2])
9950 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/retrying.py", line 200, in call
9951 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
9952 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/targets/tgt.py", line 141, in create_iscsi_target
9953 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager run_as_root=True)
9954 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/utils.py", line 126, in execute
9955 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager return processutils.execute(*cmd, **kwargs)
9956 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 424, in execute
9957 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager cmd=sanitized_cmd)
9958 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager ProcessExecutionError: Unexpected error while running command.
9959 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgtadm --lld iscsi --op show --mode target
9960 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Exit code: 107
9961 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Stdout: u''
9962 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Stderr: u'tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected\n'
** Description changed:
[Impact]
- * The cinder-volume service might fail to re-export volumes
- in-use on startup if tgt.service isn't fully started yet.
-
- * This affects the 'lvm' driver with 'tgtadm' target helper
- (which runs 'tgtadm' commands that need the service ready).
-
- * Snippets from /var/log/cinder/cinder-volume.log:
-
- Failed to re-export volume, setting to ERROR.
- ...
- Command: tgtadm --lld iscsi --op show --mode target
- ...
- Stderr: 'tgtadm: failed to send request hdr to tgt daemon,
- Transport endpoint is not connected\n'
+ * The cinder-volume service might fail to re-export volumes
+ in-use on startup if tgt.service isn't fully started yet.
- * This issue is more common in openstack compute nodes
- with networking (ovs/ovn) that takes long to startup,
- which might delay the startup of tgt.service _after_
- cinder-volume.service.
+ * This affects the 'lvm' driver with 'tgtadm' target helper
+ (which runs 'tgtadm' commands that need the service ready).
+
+ * Snippets from /var/log/cinder/cinder-volume.log:
+
+ Failed to re-export volume, setting to ERROR.
+ ...
+ Command: tgtadm --lld iscsi --op show --mode target
+ ...
+ Stderr: 'tgtadm: failed to send request hdr to tgt daemon,
+ Transport endpoint is not connected\n'
+
+ * This issue is more common in openstack compute nodes
+ with networking (ovs/ovn) that takes long to startup,
+ which might delay the startup of tgt.service _after_
+ cinder-volume.service.
[Test Steps]
- * Steps to reproduce are detailed in comment #2.
- Summary:
-
- * Install mysql, rabbitmq-server, keystone, and cinder
- (controller and storage nodes; backup node unneeded).
+ * Steps to reproduce are detailed in comment #3.
+ Summary:
- * Configure cinder-volume (storage node) for LVM backend
- and tgtadm iSCSI helper (tgt.service).
-
- * Create a cinder volume, and configure it as 'in-use'.
+ * Install mysql, rabbitmq-server, keystone, and cinder
+ (controller and storage nodes; backup node unneeded).
- * Simulate a start delay on tgt.service with a drop-in.
+ * Configure cinder-volume (storage node) for LVM backend
+ and tgtadm iSCSI helper (tgt.service).
- * Restart services: cinder-volume.service tgt.service
+ * Create a cinder volume, and configure it as 'in-use'.
- * Check sequence of service startup.
-
- * Check status of the cinder volume:
- 'in-use' (expected) or 'error' (bug).
-
- * Check log file /var/log/cinder/cinder-volume.log for
- 'tgtadm: failed to send request hdr to tgt daemon'.
+ * Simulate a start delay on tgt.service with a drop-in.
+
+ * Restart services: cinder-volume.service tgt.service
+
+ * Check sequence of service startup.
+
+ * Check status of the cinder volume:
+ 'in-use' (expected) or 'error' (bug).
+
+ * Check log file /var/log/cinder/cinder-volume.log for
+ 'tgtadm: failed to send request hdr to tgt daemon'.
[Regression Potential]
- * The fix introduces systemd unit 'After=' and 'Wants='
- properties for tgt.service in cinder-volume.service,
- thus might delay the boot process (multi-user.target).
+ * The fix introduces systemd unit 'After=' and 'Wants='
+ properties for tgt.service in cinder-volume.service,
+ thus might delay the boot process (multi-user.target).
- $ systemctl show cinder-volume.service | grep WantedBy=
- WantedBy=multi-user.target
-
- * However, the boot process already waits on tgt.service
- anyway, thus the difference (if any) should not be big,
- and would provide more correct behavior.
-
- $ systemctl show tgt.service | grep WantedBy=
- WantedBy=multi-user.target
+ $ systemctl show cinder-volume.service | grep WantedBy=
+ WantedBy=multi-user.target
- * If tgt.service is not present (tgt package not installed)
- _no errors_ occur, as both 'After=' and 'Wants=' are weak
- ordering/dependency properties (man 5 systemd.unit).
-
+ * However, the boot process already waits on tgt.service
+ anyway, thus the difference (if any) should not be big,
+ and would provide more correct behavior.
+
+ $ systemctl show tgt.service | grep WantedBy=
+ WantedBy=multi-user.target
+
+ * If tgt.service is not present (tgt package not installed)
+ _no errors_ occur, as both 'After=' and 'Wants=' are weak
+ ordering/dependency properties (man 5 systemd.unit).
+
[Other Info]
- * The fix uses a systemd service drop-in snippet because
- the service unit is generated by openstack-pkg-tools
- (pkgos-gen-systemd-unit) based on the 'init' service,
- and it only emits 'Wants=' for network-online.target.
-
- * Changing that in openstack-pkg-tools changes behavior
- in stable releases, and only manifest at build time,
- for many openstack packages that have no issues now.
-
- * We'll continue to pursue the general improvement in
- Debian, so it comes into Ubuntu development release,
- but for the Ubuntu stable releases, this should do.
+ * The fix uses a systemd service drop-in snippet because
+ the service unit is generated by openstack-pkg-tools
+ (pkgos-gen-systemd-unit) based on the 'init' service,
+ and it only emits 'Wants=' for network-online.target.
+
+ * Changing that in openstack-pkg-tools changes behavior
+ in stable releases, and only manifest at build time,
+ for many openstack packages that have no issues now.
+
+ * We'll continue to pursue the general improvement in
+ Debian, so it comes into Ubuntu development release,
+ but for the Ubuntu stable releases, this should do.
[Original Bug Description]
- Real-world example:
-
- 1) Unit `ovs-vswitchd.service` took 2 minutes to start up.
- 2) That delayed `network.target` (`ovs-vswitchd.service` has `Before=network.target`).
- 3) That delayed `tgt.service` too (it has `After=network.target`).
-
- 4) BUT that did _not_ delay `cinder-volume.service` (it does _not_ have `After=tgt.service`)
- 5) So it failed to talk to tgtd with tgtadm, and volume re-export failed.
-
- $ cat sos_commands/logs/journalctl_--no-pager_--catalog_--boot \
- | grep -E -B1 'Subject: Unit (cinder-volume.service|tgt.service|ovs-vswitchd.service|network.target)'
-
- Aug 10 06:23:15 <HOST> systemd[1]: Started OpenStack Cinder Volume.
- -- Subject: Unit cinder-volume.service has finished start-up
- --
- <<< ERROR in cinder-volume.log >>>
-
- Aug 10 06:23:16 <HOST> systemd[1]: Starting Open vSwitch Forwarding Unit...
- -- Subject: Unit ovs-vswitchd.service has begun start-up
- --
- <<< DELAY of 2 minutes >>>
-
- Aug 10 06:25:17 <HOST> systemd[1]: Started Open vSwitch Forwarding Unit.
- -- Subject: Unit ovs-vswitchd.service has finished start-up
- --
- Aug 10 06:25:17 <HOST> systemd[1]: Reached target Network.
- -- Subject: Unit network.target has finished start-up
- --
- Aug 10 06:25:17 <HOST> systemd[1]: Starting (i)SCSI target daemon...
- -- Subject: Unit tgt.service has begun start-up
- --
- Aug 10 06:25:19 <HOST> systemd[1]: Started (i)SCSI target daemon.
- -- Subject: Unit tgt.service has finished start-up
-
- <<< NOW tgtd is running >>>
-
- @ var/log/cinder/cinder-volume.log
-
- 9901 2022-08-10 06:23:21.515 3939 INFO cinder.service [-] Starting cinder-volume node (version 12.0.9)
- ...
- 9932 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager [req-e67a852a-6cce-4073-a2de-d3487c85d586 - - - - -] Failed to re-export volume, setting to ERROR.: ProcessExecutionError: Unexpected error while running command.
- 9933 Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgtadm --lld iscsi --op show --mode target
- 9934 Exit code: 107
- 9935 Stdout: u''
- 9936 Stderr: u'tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected\n'
- 9937 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Traceback (most recent call last):
- 9938 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 486, in init_host
- 9939 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager self.driver.ensure_export(ctxt, volume)
- 9940 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/lvm.py", line 826, in ensure_export
- 9941 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager self.target_driver.ensure_export(context, volume, volume_path)
- 9942 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/targets/iscsi.py", line 261, in ensure_export
- 9943 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager old_name=None, **portals_config)
- 9944 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/utils.py", line 818, in _wrapper
- 9945 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager return r.call(f, *args, **kwargs)
- 9946 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/retrying.py", line 206, in call
- 9947 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager return attempt.get(self._wrap_exception)
- 9948 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/retrying.py", line 247, in get
- 9949 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager six.reraise(self.value[0], self.value[1], self.value[2])
- 9950 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/retrying.py", line 200, in call
- 9951 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
- 9952 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/targets/tgt.py", line 141, in create_iscsi_target
- 9953 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager run_as_root=True)
- 9954 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/utils.py", line 126, in execute
- 9955 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager return processutils.execute(*cmd, **kwargs)
- 9956 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 424, in execute
- 9957 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager cmd=sanitized_cmd)
- 9958 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager ProcessExecutionError: Unexpected error while running command.
- 9959 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgtadm --lld iscsi --op show --mode target
- 9960 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Exit code: 107
- 9961 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Stdout: u''
- 9962 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Stderr: u'tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected\n'
+ Real-world example in comment #2.
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to cinder in Ubuntu.
https://bugs.launchpad.net/bugs/1987663
Title:
cinder-volume: "Failed to re-export volume, setting to ERROR" with
"tgtadm: failed to send request hdr to tgt daemon, Transport endpoint
is not connected" on service startup
Status in cinder package in Ubuntu:
In Progress
Status in cinder package in Debian:
Fix Released
Bug description:
[Impact]
* The cinder-volume service might fail to re-export volumes
in-use on startup if tgt.service isn't fully started yet.
* This affects the 'lvm' driver with 'tgtadm' target helper
(which runs 'tgtadm' commands that need the service ready).
* Snippets from /var/log/cinder/cinder-volume.log:
Failed to re-export volume, setting to ERROR.
...
Command: tgtadm --lld iscsi --op show --mode target
...
Stderr: 'tgtadm: failed to send request hdr to tgt daemon,
Transport endpoint is not connected\n'
* This issue is more common in openstack compute nodes
with networking (ovs/ovn) that takes long to startup,
which might delay the startup of tgt.service _after_
cinder-volume.service.
[Test Steps]
* Steps to reproduce are detailed in comment #3.
Summary:
* Install mysql, rabbitmq-server, keystone, and cinder
(controller and storage nodes; backup node unneeded).
* Configure cinder-volume (storage node) for LVM backend
and tgtadm iSCSI helper (tgt.service).
* Create a cinder volume, and configure it as 'in-use'.
* Simulate a start delay on tgt.service with a drop-in.
* Restart services: cinder-volume.service tgt.service
* Check sequence of service startup.
* Check status of the cinder volume:
'in-use' (expected) or 'error' (bug).
* Check log file /var/log/cinder/cinder-volume.log for
'tgtadm: failed to send request hdr to tgt daemon'.
[Regression Potential]
* The fix introduces systemd unit 'After=' and 'Wants='
properties for tgt.service in cinder-volume.service,
thus might delay the boot process (multi-user.target).
$ systemctl show cinder-volume.service | grep WantedBy=
WantedBy=multi-user.target
* However, the boot process already waits on tgt.service
anyway, thus the difference (if any) should not be big,
and would provide more correct behavior.
$ systemctl show tgt.service | grep WantedBy=
WantedBy=multi-user.target
* If tgt.service is not present (tgt package not installed)
_no errors_ occur, as both 'After=' and 'Wants=' are weak
ordering/dependency properties (man 5 systemd.unit).
[Other Info]
* The fix uses a systemd service drop-in snippet because
the service unit is generated by openstack-pkg-tools
(pkgos-gen-systemd-unit) based on the 'init' service,
and it only emits 'Wants=' for network-online.target.
* Changing that in openstack-pkg-tools changes behavior
in stable releases, and only manifest at build time,
for many openstack packages that have no issues now.
* We'll continue to pursue the general improvement in
Debian, so it comes into Ubuntu development release,
but for the Ubuntu stable releases, this should do.
[Original Bug Description]
Real-world example in comment #2.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/cinder/+bug/1987663/+subscriptions
More information about the Ubuntu-openstack-bugs
mailing list