[Bug 1969087] [NEW] failing to mount iSCSI path with first volume
DUFOUR Olivier
1969087@bugs.launchpad.net
Thu Apr 14 09:14:02 UTC 2022
Public bug reported:
On a customer deployment on focal-ussuri, with iSCSI backends and multipath enabled, we face an issue where iscsiadm fails to mount one of the paths of an iSCSI volume with the following error:
"iscsiadm: Could not make /etc/iscsi/nodes: File exists\niscsiadm: Error while adding record: encountered iSCSI database failure"
(see nova-compute.log for more details)
In terms of impact, every server mounting an iSCSI volume for the first
time will "silently" fail to mount the first path of the iSCSI target,
and the end user won't be aware of it.
I noticed the iSCSI database error happens solely on the first iSCSI volume to be mounted on each involved server in the deployment, such as units running the cinder-volume and nova-compute services.
After some investigation, this appears to be a race condition between os-brick and the iscsid daemon.
After a deployment or a reboot, iscsid isn't started, and os-brick tries too quickly to mount the first path of the iSCSI volume before iscsid has finished initialising, leading to the error we see in the cinder-volume or nova-compute logs.
If iscsid is manually started on the server beforehand, the error simply disappears and all the target paths are mounted properly on the first volume.
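A simple guard against this race can be sketched as follows (a hedged sketch, not part of os-brick; the function name and timeout are my own):

```shell
#!/bin/sh
# wait_for_proc NAME TIMEOUT: poll once per second until a process whose
# exact name is NAME exists, giving up after TIMEOUT seconds.
# Returns 0 when the process is found, 1 on timeout.
wait_for_proc() {
    name=$1
    timeout=$2
    t=0
    while [ "$t" -lt "$timeout" ]; do
        # pgrep -x exits 0 when at least one exact-name match exists
        if pgrep -x "$name" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        t=$((t + 1))
    done
    return 1
}

# Usage before the first volume attach (30 s is an arbitrary budget):
# wait_for_proc iscsid 30 || echo "iscsid still not up" >&2
```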
Here is an example of the processes running on a nova-compute unit:
# before an instance creation with the first iSCSI volume
ubuntu@nova-compute:~$ ps aux | grep iscsi
ubuntu 3705821 0.0 0.0 6304 2624 pts/1 S+ 14:18 0:00 grep --color=auto iscsi
# after the instance creation
ubuntu@nova-compute:~$ ps aux | grep iscsi
root 3707866 0.0 0.0 5108 248 ? Ss 14:21 0:00 /sbin/iscsid
root 3707867 0.0 0.0 5964 5816 ? S<Ls 14:21 0:00 /sbin/iscsid
root 3707869 0.0 0.0 0 0 ? I< 14:21 0:00 [iscsi_eh]
root 3707878 0.0 0.0 0 0 ? I< 14:21 0:00 [iscsi_q_1]
ubuntu 3708321 0.0 0.0 6436 2524 pts/1 S+ 14:21 0:00 grep --color=auto iscsi
To avoid iscsiadm encountering the database error, the workaround I
have found so far is simply to start and enable iscsid on every
cinder-volume and nova-compute unit before mounting any iSCSI volume.
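On systemd-based units (as on focal), the workaround above boils down to the following sketch, assuming the systemd unit is named iscsid:

```shell
# Start iscsid immediately and enable it at boot, on every
# cinder-volume and nova-compute unit, before any volume attach:
sudo systemctl enable --now iscsid

# Sanity check before creating the first instance with a volume;
# this should report "active" once the daemon is up:
systemctl is-active iscsid
```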
Looking more in depth, this issue is also mentioned in os-brick ticket
#1944474, where a retry mechanism was implemented to retry mounting
the path if iscsiadm returns the database failure error code (6).
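Outside os-brick, that retry idea can be approximated with a small wrapper (a sketch; the function name and the iscsiadm arguments in the trailing comment are illustrative, and exit code 6 is the database-failure code mentioned above):

```shell
#!/bin/sh
# retry_on_db_failure ATTEMPTS CMD...: run CMD, retrying up to ATTEMPTS
# times while it exits with status 6 (iscsiadm's "iSCSI database
# failure"), sleeping between attempts so iscsid can finish starting.
retry_on_db_failure() {
    attempts=$1
    shift
    rc=6
    i=0
    while [ "$i" -lt "$attempts" ]; do
        "$@"
        rc=$?
        if [ "$rc" -ne 6 ]; then
            return "$rc"    # success, or an unrelated error: stop retrying
        fi
        sleep 1
        i=$((i + 1))
    done
    return "$rc"
}

# Example (target and portal values are placeholders):
# retry_on_db_failure 3 iscsiadm -m node -T iqn.2004-04.example:tgt \
#     -p 192.0.2.10:3260 --login
```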
Would it be possible either to backport the fix from #1944474 to the
package, and/or to see whether it's feasible to start iscsid
beforehand through a charm configuration?
** Affects: python-os-brick (Ubuntu)
Importance: Undecided
Status: New
** Attachment added: "nova-compute.log"
https://bugs.launchpad.net/bugs/1969087/+attachment/5580635/+files/nova-compute.log
** Summary changed:
- os-brick failing to mount iSCSI path
+ failing to mount iSCSI path with first volume