[Bug 1987663] Re: cinder-volume: "Failed to re-export volume, setting to ERROR" with "tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected" on service startup

Mauricio Faria de Oliveira 1987663 at bugs.launchpad.net
Wed Oct 18 18:03:16 UTC 2023


Real-world example:

1) Unit `ovs-vswitchd.service` took 2 minutes to start up.
2) That delayed `network.target` (`ovs-vswitchd.service` has `Before=network.target`).
3) That delayed `tgt.service` too (it has `After=network.target`).

4) BUT that did _not_ delay `cinder-volume.service` (it does _not_ have `After=tgt.service`)
5) So it failed to talk to tgtd with tgtadm, and volume re-export failed.

 $ cat sos_commands/logs/journalctl_--no-pager_--catalog_--boot \
   | grep -E -B1 'Subject: Unit (cinder-volume.service|tgt.service|ovs-vswitchd.service|network.target)'

 Aug 10 06:23:15 <HOST> systemd[1]: Started OpenStack Cinder Volume.
 -- Subject: Unit cinder-volume.service has finished start-up
 --
 <<< ERROR in cinder-volume.log >>>

 Aug 10 06:23:16 <HOST> systemd[1]: Starting Open vSwitch Forwarding Unit...
 -- Subject: Unit ovs-vswitchd.service has begun start-up
 --
 <<< DELAY of 2 minutes >>>

 Aug 10 06:25:17 <HOST> systemd[1]: Started Open vSwitch Forwarding Unit.
 -- Subject: Unit ovs-vswitchd.service has finished start-up
 --
 Aug 10 06:25:17 <HOST> systemd[1]: Reached target Network.
 -- Subject: Unit network.target has finished start-up
 --
 Aug 10 06:25:17 <HOST> systemd[1]: Starting (i)SCSI target daemon...
 -- Subject: Unit tgt.service has begun start-up
 --
 Aug 10 06:25:19 <HOST> systemd[1]: Started (i)SCSI target daemon.
 -- Subject: Unit tgt.service has finished start-up

 <<< NOW tgtd is running >>>

@ var/log/cinder/cinder-volume.log

  9901 2022-08-10 06:23:21.515 3939 INFO cinder.service [-] Starting cinder-volume node (version 12.0.9)
 ...
  9932 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager [req-e67a852a-6cce-4073-a2de-d3487c85d586 - - - - -] Failed to re-export volume, setting to ERROR.: ProcessExecutionError: Unexpected error while running command.
  9933 Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgtadm --lld iscsi --op show --mode target
  9934 Exit code: 107
  9935 Stdout: u''
  9936 Stderr: u'tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected\n'
  9937 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Traceback (most recent call last):
  9938 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 486, in init_host
  9939 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager self.driver.ensure_export(ctxt, volume)
  9940 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/lvm.py", line 826, in ensure_export
  9941 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager self.target_driver.ensure_export(context, volume, volume_path)
  9942 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/targets/iscsi.py", line 261, in ensure_export
  9943 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager old_name=None, **portals_config)
  9944 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/utils.py", line 818, in _wrapper
  9945 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager return r.call(f, *args, **kwargs)
  9946 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/retrying.py", line 206, in call
  9947 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager return attempt.get(self._wrap_exception)
  9948 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/retrying.py", line 247, in get
  9949 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager six.reraise(self.value[0], self.value[1], self.value[2])
  9950 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/retrying.py", line 200, in call
  9951 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  9952 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/targets/tgt.py", line 141, in create_iscsi_target
  9953 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager run_as_root=True)
  9954 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/utils.py", line 126, in execute
  9955 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager return processutils.execute(*cmd, **kwargs)
  9956 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 424, in execute
  9957 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager cmd=sanitized_cmd)
  9958 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager ProcessExecutionError: Unexpected error while running command.
  9959 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgtadm --lld iscsi --op show --mode target
  9960 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Exit code: 107
  9961 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Stdout: u''
  9962 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Stderr: u'tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected\n'

** Description changed:

  [Impact]
  
-  * The cinder-volume service might fail to re-export volumes
-    in-use on startup if tgt.service isn't fully started yet.
-    
-  * This affects the 'lvm' driver with 'tgtadm' target helper
-    (which runs 'tgtadm' commands that need the service ready).
-    
-  * Snippets from /var/log/cinder/cinder-volume.log:
-  
-    Failed to re-export volume, setting to ERROR.
-    ...
-    Command: tgtadm --lld iscsi --op show --mode target
-    ...
-    Stderr: 'tgtadm: failed to send request hdr to tgt daemon,
-    Transport endpoint is not connected\n'
+  * The cinder-volume service might fail to re-export volumes
+    in-use on startup if tgt.service isn't fully started yet.
  
-  * This issue is more common in openstack compute nodes
-    with networking (ovs/ovn) that takes long to startup,
-    which might delay the startup of tgt.service _after_
-    cinder-volume.service.
+  * This affects the 'lvm' driver with 'tgtadm' target helper
+    (which runs 'tgtadm' commands that need the service ready).
+ 
+  * Snippets from /var/log/cinder/cinder-volume.log:
+ 
+    Failed to re-export volume, setting to ERROR.
+    ...
+    Command: tgtadm --lld iscsi --op show --mode target
+    ...
+    Stderr: 'tgtadm: failed to send request hdr to tgt daemon,
+    Transport endpoint is not connected\n'
+ 
+  * This issue is more common in openstack compute nodes
+    with networking (ovs/ovn) that takes long to startup,
+    which might delay the startup of tgt.service _after_
+    cinder-volume.service.
  
  [Test Steps]
  
-  * Steps to reproduce are detailed in comment #2.
-    Summary:
-  
-  * Install mysql, rabbitmq-server, keystone, and cinder
-    (controller and storage nodes; backup node unneeded).
+  * Steps to reproduce are detailed in comment #3.
+    Summary:
  
-  * Configure cinder-volume (storage node) for LVM backend
-    and tgtadm iSCSI helper (tgt.service).
-  
-  * Create a cinder volume, and configure it as 'in-use'.
+  * Install mysql, rabbitmq-server, keystone, and cinder
+    (controller and storage nodes; backup node unneeded).
  
-  * Simulate a start delay on tgt.service with a drop-in.
+  * Configure cinder-volume (storage node) for LVM backend
+    and tgtadm iSCSI helper (tgt.service).
  
-  * Restart services: cinder-volume.service tgt.service
+  * Create a cinder volume, and configure it as 'in-use'.
  
-  * Check sequence of service startup.
-  
-  * Check status of the cinder volume: 
-    'in-use' (expected) or 'error' (bug).
-  
-  * Check log file /var/log/cinder/cinder-volume.log for
-    'tgtadm: failed to send request hdr to tgt daemon'.
+  * Simulate a start delay on tgt.service with a drop-in.
+ 
+  * Restart services: cinder-volume.service tgt.service
+ 
+  * Check sequence of service startup.
+ 
+  * Check status of the cinder volume:
+    'in-use' (expected) or 'error' (bug).
+ 
+  * Check log file /var/log/cinder/cinder-volume.log for
+    'tgtadm: failed to send request hdr to tgt daemon'.
  
  [Regression Potential]
  
-  * The fix introduces systemd unit 'After=' and 'Wants='
-    properties for tgt.service in cinder-volume.service,
-    thus might delay the boot process (multi-user.target).
+  * The fix introduces systemd unit 'After=' and 'Wants='
+    properties for tgt.service in cinder-volume.service,
+    thus might delay the boot process (multi-user.target).
  
-      $ systemctl show cinder-volume.service | grep WantedBy=
-      WantedBy=multi-user.target
-    
-  * However, the boot process already waits on tgt.service
-    anyway, thus the difference (if any) should not be big,
-    and would provide more correct behavior.
-    
-      $ systemctl show tgt.service | grep WantedBy=
-      WantedBy=multi-user.target
+      $ systemctl show cinder-volume.service | grep WantedBy=
+      WantedBy=multi-user.target
  
-  * If tgt.service is not present (tgt package not installed)
-    _no errors_ occur, as both 'After=' and 'Wants=' are weak
-    ordering/dependency properties (man 5 systemd.unit).
-    
+  * However, the boot process already waits on tgt.service
+    anyway, thus the difference (if any) should not be big,
+    and would provide more correct behavior.
+ 
+      $ systemctl show tgt.service | grep WantedBy=
+      WantedBy=multi-user.target
+ 
+  * If tgt.service is not present (tgt package not installed)
+    _no errors_ occur, as both 'After=' and 'Wants=' are weak
+    ordering/dependency properties (man 5 systemd.unit).
+ 
  [Other Info]
  
-  * The fix uses a systemd service drop-in snippet because
-    the service unit is generated by openstack-pkg-tools
-    (pkgos-gen-systemd-unit) based on the 'init' service,
-    and it only emits 'Wants=' for network-online.target.
-    
-  * Changing that in openstack-pkg-tools changes behavior
-    in stable releases, and only manifest at build time,
-    for many openstack packages that have no issues now.
-    
-  * We'll continue to pursue the general improvement in
-    Debian, so it comes into Ubuntu development release,
-    but for the Ubuntu stable releases, this should do.
+  * The fix uses a systemd service drop-in snippet because
+    the service unit is generated by openstack-pkg-tools
+    (pkgos-gen-systemd-unit) based on the 'init' service,
+    and it only emits 'Wants=' for network-online.target.
+ 
+  * Changing that in openstack-pkg-tools changes behavior
+    in stable releases, and only manifest at build time,
+    for many openstack packages that have no issues now.
+ 
+  * We'll continue to pursue the general improvement in
+    Debian, so it comes into Ubuntu development release,
+    but for the Ubuntu stable releases, this should do.
  
  [Original Bug Description]
  
- Real-world example:
- 
- 1) Unit `ovs-vswitchd.service` took 2 minutes to start up.
- 2) That delayed `network.target` (`ovs-vswitchd.service` has `Before=network.target`).
- 3) That delayed `tgt.service` too (it has `After=network.target`).
- 
- 4) BUT that did _not_ delay `cinder-volume.service` (it does _not_ have `After=tgt.service`)
- 5) So it failed to talk to tgtd with tgtadm, and volume re-export failed.
- 
-  $ cat sos_commands/logs/journalctl_--no-pager_--catalog_--boot \
-    | grep -E -B1 'Subject: Unit (cinder-volume.service|tgt.service|ovs-vswitchd.service|network.target)'
- 
-  Aug 10 06:23:15 <HOST> systemd[1]: Started OpenStack Cinder Volume.
-  -- Subject: Unit cinder-volume.service has finished start-up
-  --
-  <<< ERROR in cinder-volume.log >>>
- 
-  Aug 10 06:23:16 <HOST> systemd[1]: Starting Open vSwitch Forwarding Unit...
-  -- Subject: Unit ovs-vswitchd.service has begun start-up
-  --
-  <<< DELAY of 2 minutes >>>
- 
-  Aug 10 06:25:17 <HOST> systemd[1]: Started Open vSwitch Forwarding Unit.
-  -- Subject: Unit ovs-vswitchd.service has finished start-up
-  --
-  Aug 10 06:25:17 <HOST> systemd[1]: Reached target Network.
-  -- Subject: Unit network.target has finished start-up
-  --
-  Aug 10 06:25:17 <HOST> systemd[1]: Starting (i)SCSI target daemon...
-  -- Subject: Unit tgt.service has begun start-up
-  --
-  Aug 10 06:25:19 <HOST> systemd[1]: Started (i)SCSI target daemon.
-  -- Subject: Unit tgt.service has finished start-up
- 
-  <<< NOW tgtd is running >>>
- 
- @ var/log/cinder/cinder-volume.log
- 
-   9901 2022-08-10 06:23:21.515 3939 INFO cinder.service [-] Starting cinder-volume node (version 12.0.9)
-  ...
-   9932 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager [req-e67a852a-6cce-4073-a2de-d3487c85d586 - - - - -] Failed to re-export volume, setting to ERROR.: ProcessExecutionError: Unexpected error while running command.
-   9933 Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgtadm --lld iscsi --op show --mode target
-   9934 Exit code: 107
-   9935 Stdout: u''
-   9936 Stderr: u'tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected\n'
-   9937 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Traceback (most recent call last):
-   9938 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager   File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 486, in init_host
-   9939 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager     self.driver.ensure_export(ctxt, volume)
-   9940 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager   File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/lvm.py", line 826, in ensure_export
-   9941 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager     self.target_driver.ensure_export(context, volume, volume_path)
-   9942 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager   File "/usr/lib/python2.7/dist-packages/cinder/volume/targets/iscsi.py", line 261, in ensure_export
-   9943 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager     old_name=None, **portals_config)
-   9944 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager   File "/usr/lib/python2.7/dist-packages/cinder/utils.py", line 818, in _wrapper
-   9945 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager     return r.call(f, *args, **kwargs)
-   9946 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager   File "/usr/lib/python2.7/dist-packages/retrying.py", line 206, in call
-   9947 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager     return attempt.get(self._wrap_exception)
-   9948 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager   File "/usr/lib/python2.7/dist-packages/retrying.py", line 247, in get
-   9949 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager     six.reraise(self.value[0], self.value[1], self.value[2])
-   9950 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager   File "/usr/lib/python2.7/dist-packages/retrying.py", line 200, in call
-   9951 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager     attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
-   9952 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager   File "/usr/lib/python2.7/dist-packages/cinder/volume/targets/tgt.py", line 141, in create_iscsi_target
-   9953 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager     run_as_root=True)
-   9954 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager   File "/usr/lib/python2.7/dist-packages/cinder/utils.py", line 126, in execute
-   9955 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager     return processutils.execute(*cmd, **kwargs)
-   9956 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager   File "/usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 424, in execute
-   9957 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager     cmd=sanitized_cmd)
-   9958 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager ProcessExecutionError: Unexpected error while running command.
-   9959 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgtadm --lld iscsi --op show --mode target
-   9960 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Exit code: 107
-   9961 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Stdout: u''
-   9962 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Stderr: u'tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected\n'
+ Real-world example in comment #2.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to cinder in Ubuntu.
https://bugs.launchpad.net/bugs/1987663

Title:
  cinder-volume: "Failed to re-export volume, setting to ERROR" with
  "tgtadm: failed to send request hdr to tgt daemon, Transport endpoint
  is not connected" on service startup

Status in cinder package in Ubuntu:
  In Progress
Status in cinder package in Debian:
  Fix Released

Bug description:
  [Impact]

   * The cinder-volume service might fail to re-export volumes
     in-use on startup if tgt.service isn't fully started yet.

   * This affects the 'lvm' driver with 'tgtadm' target helper
     (which runs 'tgtadm' commands that need the service ready).

   * Snippets from /var/log/cinder/cinder-volume.log:

     Failed to re-export volume, setting to ERROR.
     ...
     Command: tgtadm --lld iscsi --op show --mode target
     ...
     Stderr: 'tgtadm: failed to send request hdr to tgt daemon,
     Transport endpoint is not connected\n'

   * This issue is more common in openstack compute nodes
     with networking (ovs/ovn) that takes long to startup,
     which might delay the startup of tgt.service _after_
     cinder-volume.service.

  [Test Steps]

   * Steps to reproduce are detailed in comment #3.
     Summary:

   * Install mysql, rabbitmq-server, keystone, and cinder
     (controller and storage nodes; backup node unneeded).

   * Configure cinder-volume (storage node) for LVM backend
     and tgtadm iSCSI helper (tgt.service).

   * Create a cinder volume, and configure it as 'in-use'.

   * Simulate a start delay on tgt.service with a drop-in.

   * Restart services: cinder-volume.service tgt.service

   * Check sequence of service startup.

   * Check status of the cinder volume:
     'in-use' (expected) or 'error' (bug).

   * Check log file /var/log/cinder/cinder-volume.log for
     'tgtadm: failed to send request hdr to tgt daemon'.

  [Regression Potential]

   * The fix introduces systemd unit 'After=' and 'Wants='
     properties for tgt.service in cinder-volume.service,
     thus might delay the boot process (multi-user.target).

       $ systemctl show cinder-volume.service | grep WantedBy=
       WantedBy=multi-user.target

   * However, the boot process already waits on tgt.service
     anyway, thus the difference (if any) should not be big,
     and would provide more correct behavior.

       $ systemctl show tgt.service | grep WantedBy=
       WantedBy=multi-user.target

   * If tgt.service is not present (tgt package not installed)
     _no errors_ occur, as both 'After=' and 'Wants=' are weak
     ordering/dependency properties (man 5 systemd.unit).

  [Other Info]

   * The fix uses a systemd service drop-in snippet because
     the service unit is generated by openstack-pkg-tools
     (pkgos-gen-systemd-unit) based on the 'init' service,
     and it only emits 'Wants=' for network-online.target.

   * Changing that in openstack-pkg-tools changes behavior
     in stable releases, and only manifest at build time,
     for many openstack packages that have no issues now.

   * We'll continue to pursue the general improvement in
     Debian, so it comes into Ubuntu development release,
     but for the Ubuntu stable releases, this should do.

  [Original Bug Description]

  Real-world example in comment #2.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/cinder/+bug/1987663/+subscriptions




More information about the Ubuntu-openstack-bugs mailing list