[Bug 1987663] Re: cinder-volume: "Failed to re-export volume, setting to ERROR" with "tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected" on service startup
Mauricio Faria de Oliveira
1987663 at bugs.launchpad.net
Wed Oct 18 18:04:20 UTC 2023
Steps to Reproduce (Mantic)
# LXD VM
$ lxc launch --vm --config limits.cpu=2 --config limits.memory=3GiB ubuntu:mantic cinder-mantic
$ lxc shell cinder-mantic
# Apt
apt remove -y unattended-upgrades
apt update
# Network
echo '127.0.0.1 controller' >>/etc/hosts
# MySQL
apt install -y mysql-server python3-pymysql
# RabbitMQ
apt install -y rabbitmq-server
rabbitmqctl add_user openstack RABBIT_PASS
rabbitmqctl set_permissions openstack ".*" ".*" ".*"
# Keystone
cat <<EOF | mysql
CREATE DATABASE keystone;
CREATE USER 'keystone'@'localhost' IDENTIFIED BY 'KEYSTONE_DBPASS';
GRANT ALL PRIVILEGES ON keystone.* TO 'keystone'@'localhost';
EOF
apt install -y keystone
@ /etc/keystone/keystone.conf
[database]
connection = mysql+pymysql://keystone:KEYSTONE_DBPASS@controller/keystone
[token]
provider = fernet
...
# On mantic, these commands report some ignored exceptions but complete successfully:
su -s /bin/sh -c "keystone-manage db_sync" keystone # exit code 0
keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone
keystone-manage credential_setup --keystone-user keystone --keystone-group keystone
keystone-manage bootstrap --bootstrap-password ADMIN_PASS \
--bootstrap-admin-url http://controller:5000/v3/ \
--bootstrap-internal-url http://controller:5000/v3/ \
--bootstrap-public-url http://controller:5000/v3/ \
--bootstrap-region-id RegionOne
echo 'ServerName controller' >>/etc/apache2/apache2.conf
systemctl restart apache2.service
# OpenStack Client
cat <<EOF >openstack.rc
export OS_USERNAME=admin
export OS_PASSWORD=ADMIN_PASS
export OS_PROJECT_NAME=admin
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_DOMAIN_NAME=Default
export OS_AUTH_URL=http://controller:5000/v3
export OS_IDENTITY_API_VERSION=3
EOF
source openstack.rc
apt install -y python3-openstackclient
openstack project create --domain default \
--description "Service Project" service
+-------------+----------------------------------+
| Field | Value |
+-------------+----------------------------------+
| description | Service Project |
| domain_id | default |
| enabled | True |
| id | 229ef5671d0b4136b32f1a60584ab725 |
| is_domain | False |
| name | service |
| options | {} |
| parent_id | default |
| tags | [] |
+-------------+----------------------------------+
# openstack project list
+----------------------------------+---------+
| ID | Name |
+----------------------------------+---------+
| 0e204837047a4323b8f57ed23bffa4f8 | admin |
| 229ef5671d0b4136b32f1a60584ab725 | service |
+----------------------------------+---------+
# Cinder
cat <<EOF | mysql
CREATE DATABASE cinder;
CREATE USER 'cinder'@'localhost' IDENTIFIED BY 'CINDER_DBPASS';
GRANT ALL PRIVILEGES ON cinder.* TO 'cinder'@'localhost';
EOF
openstack user create --domain default --password CINDER_USERPASS cinder
openstack role add --project service --user cinder admin
openstack service create --name cinderv2 \
--description "OpenStack Block Storage" volumev2
openstack service create --name cinderv3 \
--description "OpenStack Block Storage" volumev3
openstack endpoint create --region RegionOne \
volumev2 public http://controller:8776/v2/%\(project_id\)s
openstack endpoint create --region RegionOne \
volumev2 internal http://controller:8776/v2/%\(project_id\)s
openstack endpoint create --region RegionOne \
volumev2 admin http://controller:8776/v2/%\(project_id\)s
openstack endpoint create --region RegionOne \
volumev3 public http://controller:8776/v3/%\(project_id\)s
openstack endpoint create --region RegionOne \
volumev3 internal http://controller:8776/v3/%\(project_id\)s
openstack endpoint create --region RegionOne \
volumev3 admin http://controller:8776/v3/%\(project_id\)s
apt install -y cinder-api cinder-scheduler
@ /etc/cinder/cinder.conf
[DEFAULT]
my_ip = 127.0.0.1
transport_url = rabbit://openstack:RABBIT_PASS@controller
auth_strategy = keystone
[database]
connection = mysql+pymysql://cinder:CINDER_DBPASS@controller/cinder
[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
auth_type = password
project_domain_name = default
user_domain_name = default
project_name = service
username = cinder
password = CINDER_USERPASS
[oslo_concurrency]
lock_path = /var/lib/cinder/tmp
...
su -s /bin/sh -c "cinder-manage db sync" cinder
systemctl restart cinder-scheduler.service apache2.service
# Cinder Volume
truncate -s 5GiB /test.img
mknod --mode 0660 /dev/loop-cinder b 7 42
losetup /dev/loop-cinder /test.img
pvcreate /dev/loop-cinder
vgcreate cinder-volumes /dev/loop-cinder
apt install -y cinder-volume tgt lvm2 thin-provisioning-tools
@ /etc/cinder/cinder.conf
[DEFAULT]
enabled_backends = lvm
[lvm]
lvm_type = default
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_group = cinder-volumes
target_protocol = iscsi
target_helper = tgtadm
...
echo 'include /var/lib/cinder/volumes/*' >/etc/tgt/conf.d/cinder.conf
systemctl restart tgt.service cinder-volume.service
# Cinder Client
apt install -y python3-cinderclient
# cinder list
+----+--------+------+------+----------------+-------------+----------+-------------+
| ID | Status | Name | Size | Consumes Quota | Volume Type | Bootable | Attached to |
+----+--------+------+------+----------------+-------------+----------+-------------+
+----+--------+------+------+----------------+-------------+----------+-------------+
cinder create --name test-volume 1
# cinder list
+--------------------------------------+-----------+-------------+------+----------------+-------------+----------+-------------+
| ID | Status | Name | Size | Consumes Quota | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+-------------+------+----------------+-------------+----------+-------------+
| 17c4f736-058b-4cfc-9864-a64ab0995957 | available | test-volume | 1 | True | __DEFAULT__ | false | |
+--------------------------------------+-----------+-------------+------+----------------+-------------+----------+-------------+
# mantic requires an attachment before the volume can be set to in-use
cinder attachment-create test-volume
# cinder attachment-list
+--------------------------------------+--------------------------------------+----------+-----------+
| ID | Volume ID | Status | Server ID |
+--------------------------------------+--------------------------------------+----------+-----------+
| 90704b06-cf65-473c-adef-7e8c8b5c8b2c | 17c4f736-058b-4cfc-9864-a64ab0995957 | reserved | - |
+--------------------------------------+--------------------------------------+----------+-----------+
cinder reset-state --state in-use test-volume
# cinder list
+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
| ID | Status | Name | Size | Consumes Quota | Volume Type | Bootable | Attached to |
+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
| 17c4f736-058b-4cfc-9864-a64ab0995957 | in-use | test-volume | 1 | True | __DEFAULT__ | false | |
+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
# Reproducer (15-second startup delay in tgt.service)
FILE=/etc/systemd/system/tgt.service.d/start-delay.conf
mkdir -p $(dirname $FILE)
cat <<EOF > $FILE
[Service]
ExecStartPre=$(which sleep) 15
EOF
systemctl daemon-reload
# date; systemctl restart cinder-volume.service tgt.service; date
Mon Oct 16 21:57:12 UTC 2023
Mon Oct 16 21:57:27 UTC 2023
... Notice that cinder-volume.service is Started _before_ tgt.service
# journalctl -b -u cinder-volume.service -u tgt.service | grep Start | tail -3
Oct 16 21:57:12 cinder-mantic systemd[1]: Started cinder-volume.service - OpenStack Cinder Volume.
Oct 16 21:57:12 cinder-mantic systemd[1]: Starting tgt.service - (i)SCSI target daemon...
Oct 16 21:57:27 cinder-mantic systemd[1]: Started tgt.service - (i)SCSI target daemon.
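The ordering check above can also be scripted. This is a minimal sketch, assuming journal lines in the default short format shown above; the helper name first_started is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: print the first unit that systemd reported as "Started".
# Reads journalctl-style lines (default short output format) on stdin.
first_started() {
    # Field 6 is the systemd verb ("Started"/"Starting"), field 7 is the unit name.
    awk '$6 == "Started" { print $7; exit }'
}

# Sample input: the three journal lines from the unpatched run above.
first_started <<'EOF'
Oct 16 21:57:12 cinder-mantic systemd[1]: Started cinder-volume.service - OpenStack Cinder Volume.
Oct 16 21:57:12 cinder-mantic systemd[1]: Starting tgt.service - (i)SCSI target daemon...
Oct 16 21:57:27 cinder-mantic systemd[1]: Started tgt.service - (i)SCSI target daemon.
EOF
```

On a live system the same helper could be fed from the journalctl pipeline used above. If it prints cinder-volume.service, the volume service came up before the target daemon was ready.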
# grep 'ERROR cinder.volume.manager' /var/log/cinder/cinder-volume.log
...
2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager [None req-ce03264a-6765-41de-8016-a6f27d2685e4 - - - - - -] Failed to re-export volume, setting to ERROR.: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
...
2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager Command: tgtadm --lld iscsi --op show --mode target
2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager Exit code: 107
2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager Stdout: ''
2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager Stderr: 'tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected\n'
2023-10-16 21:57:18.301 1658 ERROR cinder.volume.manager
# cinder list
+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
| ID | Status | Name | Size | Consumes Quota | Volume Type | Bootable | Attached to |
+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
| 17c4f736-058b-4cfc-9864-a64ab0995957 | error | test-volume | 1 | True | __DEFAULT__ | false | |
+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
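To check the volume state without reading the table by eye, the Status column can be pulled out of the 'cinder list' output. This is a sketch that assumes the table layout shown above (ID, Status, Name as the first three columns); the helper name volume_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the Status of the volume named $1,
# given `cinder list` table output on stdin.
volume_status() {
    awk -F'|' -v name="$1" '
        NF > 3 {
            # With "|" as separator: $2 = ID, $3 = Status, $4 = Name.
            n = $4; s = $3
            gsub(/^ +| +$/, "", n)   # trim column padding
            gsub(/^ +| +$/, "", s)
            if (n == name) print s
        }'
}

# Sample input: the table from the failing run above (borders shortened).
volume_status test-volume <<'EOF'
+--------------------------------------+--------+-------------+
| ID | Status | Name | Size | Consumes Quota | Volume Type | Bootable | Attached to |
+--------------------------------------+--------+-------------+
| 17c4f736-058b-4cfc-9864-a64ab0995957 | error | test-volume | 1 | True | __DEFAULT__ | false | |
+--------------------------------------+--------+-------------+
EOF
```

On a live system this could be used as 'cinder list | volume_status test-volume'. In this reproducer, 'error' indicates the bug and 'in-use' is the expected state.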
# Undo error state
# systemctl restart cinder-volume.service
# cinder reset-state --state in-use test-volume
# cinder list
+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
| ID | Status | Name | Size | Consumes Quota | Volume Type | Bootable | Attached to |
+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
| 17c4f736-058b-4cfc-9864-a64ab0995957 | in-use | test-volume | 1 | True | __DEFAULT__ | false | |
+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
Patched:
# systemctl show cinder-volume.service | grep Wants=
Wants=network-online.target tgt.service
# cinder list
+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
| ID | Status | Name | Size | Consumes Quota | Volume Type | Bootable | Attached to |
+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
| 17c4f736-058b-4cfc-9864-a64ab0995957 | in-use | test-volume | 1 | True | __DEFAULT__ | false | |
+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
# date; systemctl restart cinder-volume.service tgt.service; date
Wed Oct 18 15:29:01 UTC 2023
Wed Oct 18 15:29:17 UTC 2023
... Notice that cinder-volume.service is Started _after_ tgt.service
# journalctl -b -u cinder-volume.service -u tgt.service | grep Start | tail -3
Oct 18 15:29:01 cinder-mantic systemd[1]: Starting tgt.service - (i)SCSI target daemon...
Oct 18 15:29:17 cinder-mantic systemd[1]: Started tgt.service - (i)SCSI target daemon.
Oct 18 15:29:17 cinder-mantic systemd[1]: Started cinder-volume.service - OpenStack Cinder Volume.
# cinder list
+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
| ID | Status | Name | Size | Consumes Quota | Volume Type | Bootable | Attached to |
+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
| 17c4f736-058b-4cfc-9864-a64ab0995957 | in-use | test-volume | 1 | True | __DEFAULT__ | false | |
+--------------------------------------+--------+-------------+------+----------------+-------------+----------+-------------+
... Without tgt.service installed:
# apt remove --purge --yes tgt
# systemctl status tgt.service
Unit tgt.service could not be found.
# systemctl show cinder-volume.service | grep Wants=
Wants=tgt.service network-online.target
# systemctl restart cinder-volume.service
# echo $?
0
# systemctl status cinder-volume.service | grep Active:
Active: active (running) since Wed 2023-10-18 16:47:14 UTC; 11s ago
# journalctl -b -u cinder-volume.service -u tgt.service | grep Start | tail -1
Oct 18 16:47:14 cinder-mantic systemd[1]: Started cinder-volume.service - OpenStack Cinder Volume.
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to cinder in Ubuntu.
https://bugs.launchpad.net/bugs/1987663
Title:
cinder-volume: "Failed to re-export volume, setting to ERROR" with
"tgtadm: failed to send request hdr to tgt daemon, Transport endpoint
is not connected" on service startup
Status in cinder package in Ubuntu:
In Progress
Status in cinder package in Debian:
Fix Released
Bug description:
[Impact]
* The cinder-volume service might fail to re-export in-use
volumes on startup if tgt.service has not fully started yet.
* This affects the 'lvm' driver with 'tgtadm' target helper
(which runs 'tgtadm' commands that need the service ready).
* Snippets from /var/log/cinder/cinder-volume.log:
Failed to re-export volume, setting to ERROR.
...
Command: tgtadm --lld iscsi --op show --mode target
...
Stderr: 'tgtadm: failed to send request hdr to tgt daemon,
Transport endpoint is not connected\n'
* This issue is more common on OpenStack compute nodes
whose networking (ovs/ovn) takes long to start up,
which might delay the startup of tgt.service until
_after_ cinder-volume.service.
[Test Steps]
* Steps to reproduce are detailed in comment #3.
Summary:
* Install mysql, rabbitmq-server, keystone, and cinder
(controller and storage nodes; backup node unneeded).
* Configure cinder-volume (storage node) for LVM backend
and tgtadm iSCSI helper (tgt.service).
* Create a cinder volume, and configure it as 'in-use'.
* Simulate a start delay on tgt.service with a drop-in.
* Restart services: cinder-volume.service tgt.service
* Check sequence of service startup.
* Check status of the cinder volume:
'in-use' (expected) or 'error' (bug).
* Check log file /var/log/cinder/cinder-volume.log for
'tgtadm: failed to send request hdr to tgt daemon'.
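The last check in the summary can be sketched as a small script. This is only an illustration: the helper name check_reexport and its 'bug'/'ok' labels are hypothetical, and the sample log line is abbreviated from the snippet in [Impact].

```shell
#!/bin/sh
# Hypothetical helper: report whether a cinder-volume log file contains
# the tgtadm connection failure. Prints "bug" or "ok".
check_reexport() {
    if [ ! -r "$1" ]; then
        echo "cannot read log file: $1" >&2
        return 2
    fi
    if grep -q 'tgtadm: failed to send request hdr to tgt daemon' "$1"; then
        echo bug
    else
        echo ok
    fi
}

# Demonstration on a scratch file holding an abbreviated sample line;
# on a real node the argument would be /var/log/cinder/cinder-volume.log.
log=$(mktemp)
echo "Stderr: 'tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected'" >"$log"
check_reexport "$log"
rm -f "$log"
```

Note that the log file accumulates across restarts, so a match may come from an earlier run; comparing timestamps against the restart time is still needed.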
[Regression Potential]
* The fix introduces systemd unit 'After=' and 'Wants='
properties for tgt.service in cinder-volume.service,
and thus might delay the boot process (multi-user.target).
$ systemctl show cinder-volume.service | grep WantedBy=
WantedBy=multi-user.target
* However, the boot process already waits on tgt.service
anyway, thus the difference (if any) should not be big,
and would provide more correct behavior.
$ systemctl show tgt.service | grep WantedBy=
WantedBy=multi-user.target
* If tgt.service is not present (tgt package not installed)
_no errors_ occur, as both 'After=' and 'Wants=' are weak
ordering/dependency properties (man 5 systemd.unit).
[Other Info]
* The fix uses a systemd service drop-in snippet because
the service unit is generated by openstack-pkg-tools
(pkgos-gen-systemd-unit) based on the 'init' service,
and it only emits 'Wants=' for network-online.target.
* Changing that in openstack-pkg-tools would change behavior
in stable releases, and would only manifest at build time,
for many OpenStack packages that have no issues now.
* We'll continue to pursue the general improvement in
Debian, so it comes into Ubuntu development release,
but for the Ubuntu stable releases, this should do.
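For illustration, a drop-in of the kind described above might look like the following. This is a sketch and not the packaged fix: the file name tgt.conf and the scratch staging directory are assumptions; only the 'After=' and 'Wants=' settings for tgt.service come from the description above.

```shell
#!/bin/sh
# Stage a drop-in for cinder-volume.service in a scratch directory.
# Assumption: on a real system the file would live in a
# cinder-volume.service.d/ directory that systemd searches,
# e.g. under /etc/systemd/system/ for a local override.
DROPIN_DIR=${DROPIN_DIR:-$(mktemp -d)}/cinder-volume.service.d
mkdir -p "$DROPIN_DIR"

# File name "tgt.conf" is hypothetical; any *.conf name is read by systemd.
cat <<EOF >"$DROPIN_DIR/tgt.conf"
[Unit]
After=tgt.service
Wants=tgt.service
EOF

cat "$DROPIN_DIR/tgt.conf"
```

After placing such a file in a directory that systemd reads, 'systemctl daemon-reload' followed by 'systemctl show cinder-volume.service | grep Wants=' should list tgt.service, as in the patched output in comment #3.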
[Original Bug Description]
Real-world example in comment #2.