[Bug 2143377] Re: Dangling RADOS export index entry in remove_export() crashes NFS-Ganesha
Seyeong Kim
2143377 at bugs.launchpad.net
Mon Mar 16 23:50:11 UTC 2026
** Patch added: "lp2143377_uca-caracal.debdiff"
https://bugs.launchpad.net/manila/+bug/2143377/+attachment/5953358/+files/lp2143377_uca-caracal.debdiff
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to manila in Ubuntu.
https://bugs.launchpad.net/bugs/2143377
Title:
Dangling RADOS export index entry in remove_export() crashes NFS-Ganesha
Status in Ubuntu Cloud Archive:
In Progress
Status in Ubuntu Cloud Archive caracal series:
In Progress
Status in Ubuntu Cloud Archive dalmatian series:
In Progress
Status in Ubuntu Cloud Archive epoxy series:
In Progress
Status in Ubuntu Cloud Archive flamingo series:
In Progress
Status in Ubuntu Cloud Archive gazpacho series:
New
Status in Ubuntu Cloud Archive yoga series:
In Progress
Status in OpenStack Shared File Systems Service (Manila):
Fix Released
Status in manila package in Ubuntu:
New
Status in manila source package in Jammy:
In Progress
Status in manila source package in Noble:
In Progress
Status in manila source package in Questing:
In Progress
Status in manila source package in Resolute:
New
Bug description:
[Impact]
When manila deletes a CephFS NFS share, remove_export() deletes the
RADOS export object before removing its URL from the export index.
If manila-share is interrupted between these two operations, the index
retains a reference to a non-existent object. On the next NFS-Ganesha
restart, ganesha.nfsd hits ENOENT on the dangling reference and exits
FATAL. This takes down the entire NFS gateway — all connected NFS
clients get "server not responding" and all I/O hangs until manual
intervention.
[Test Case]
I tested this in a Juju/MAAS environment.
Prerequisites:
- Juju model with ceph-mon, ceph-osd, ceph-fs, mysql-innodb-cluster,
rabbitmq-server, keystone, manila, manila-ganesha
- OpenStack 2024.1/stable, Ceph Quincy
- NFS client node with nfs-common installed
Test 1: Dangling index crashes NFS-Ganesha (reproduce bug)
1. Create a fake RADOS export object in the manila-ganesha pool:
rados --id manila-ganesha -p manila-ganesha put \
ganesha-export-test-dangling /tmp/export_obj.conf
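2. Add a %url entry pointing to it in ganesha-export-index
(working on a local copy of the index named "index"):
rados --id manila-ganesha -p manila-ganesha get ganesha-export-index index
echo '%url "rados://manila-ganesha/ganesha-export-test-dangling"' >> index
rados --id manila-ganesha -p manila-ganesha put ganesha-export-index index
3. Delete the object (simulating crash after _delete_rados_object):
rados --id manila-ganesha -p manila-ganesha rm ganesha-export-test-dangling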
4. Restart NFS-Ganesha:
systemctl restart nfs-ganesha
5. Observe:
service exits FATAL with "Unknown error -2"
6. Clean up:
remove the dangling entry from index, restart ganesha.
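For the clean-up step, a small helper can list which index entries are dangling before the index is edited by hand. This is a minimal sketch, not part of manila or NFS-Ganesha: the function name find_dangling_entries is hypothetical, and it assumes the index consists of lines in the %url "rados://<pool>/<object>" form used in step 2. The list of existing objects is passed in by the caller (for example, collected from the output of "rados ls" on the pool).

```python
import re

# Assumed line format, taken from the test case above:
#   %url "rados://<pool>/<object>"
URL_LINE = re.compile(
    r'^\s*%url\s+"?rados://(?P<pool>[^/"\s]+)/(?P<obj>[^"\s]+)"?\s*$'
)


def find_dangling_entries(index_text, existing_objects):
    """Return object names referenced by the index but missing from the pool.

    index_text: contents of the export index object, as a string.
    existing_objects: iterable of object names currently in the pool.
    """
    existing = set(existing_objects)
    dangling = []
    for line in index_text.splitlines():
        match = URL_LINE.match(line)
        # Lines that are not %url entries are skipped, not treated as errors.
        if match and match.group("obj") not in existing:
            dangling.append(match.group("obj"))
    return dangling


if __name__ == "__main__":
    index = (
        '%url "rados://manila-ganesha/ganesha-export-1"\n'
        '%url "rados://manila-ganesha/ganesha-export-test-dangling"\n'
    )
    pool_objects = ["ganesha-export-1", "ganesha-export-index"]
    print(find_dangling_entries(index, pool_objects))
    # prints ['ganesha-export-test-dangling']
```

The helper only reports names; removing the matching lines from the index and uploading it again remains a manual step, as in the test case.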
Test 2: Orphan object is harmless (verify fix)
1. Create a RADOS export object but do NOT add it to the index
(simulating crash after _remove_rados_object_url_from_index
but before _delete_rados_object).
2. Restart NFS-Ganesha.
3. Observe:
service starts normally, orphan object is ignored.
Test 3: NFS client impact (dangling reference)
1. Create an NFS share via manila CLI:
manila type-create cephfsnfstype false \
--extra-specs share_backend_name=cephfsnfs1
manila create --share-type cephfsnfstype --name test-nfs NFS 1
2. Allow access and mount on a client node:
manila access-allow test-nfs ip <client_ip>
mount -t nfs <ganesha_ip>:<export_path> /mnt/test-nfs
3. Verify I/O: touch /mnt/test-nfs/testfile
4. Inject a dangling entry into the export index (same as Test 1).
5. Restart NFS-Ganesha — service crashes.
6. On client: observe "nfs: server <ip> not responding, timed out"
in kern.log; ls on mount point hangs.
7. Restore original index, restart ganesha — NFS I/O resumes.
Test 4: NFS client unaffected after fix
1. Apply the fix (swap the two lines in the finally block of remove_export()).
2. Restart manila-share.
3. Create an NFS share, mount on client, verify I/O.
4. Drop orphan objects into the RADOS pool (no index entries).
5. Restart NFS-Ganesha.
6. Observe: ganesha starts normally, NFS I/O works, no
"not responding" in dmesg.
[Regression Potential]
Low. The change only reorders two independent cleanup operations in
the finally block of remove_export(). If
_remove_rados_object_url_from_index() fails, the object deletion still
proceeds as before.
The only new failure mode is an orphan RADOS object, which is harmless
(ganesha ignores objects not referenced in the index).
[Other Info]
Buggy order in manila/share/drivers/ganesha/manager.py remove_export():
self._delete_rados_object(...)
self._remove_rados_object_url_from_index(name)
Fixed order:
self._remove_rados_object_url_from_index(name)
self._delete_rados_object(...)
Reproduced on OpenStack 2024.1 (Caracal), Ceph Quincy 17.2.9,
manila-ganesha charm, NFS4 hard mount clients.
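The difference between the two orderings can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: FakeStore, Interrupted, run_cleanup and state_after_crash are hypothetical names, the store is a pair of Python sets and not a real RADOS pool, and the interruption is simulated by raising an exception after the first step. It is not the manila code itself.

```python
class Interrupted(Exception):
    """Stands in for manila-share being killed between the two steps."""


class FakeStore:
    """In-memory stand-in for the pool: export objects plus the export index."""

    def __init__(self, names):
        self.objects = set(names)
        self.index = set(names)

    def delete_object(self, name):
        self.objects.discard(name)

    def remove_from_index(self, name):
        self.index.discard(name)

    def dangling(self):
        # Index entries that point at objects that no longer exist.
        return self.index - self.objects

    def orphans(self):
        # Objects that exist but are no longer referenced by the index.
        return self.objects - self.index


def run_cleanup(store, name, steps, interrupt_after):
    """Run the steps in order, raising once `interrupt_after` steps have run."""
    for done, step in enumerate(steps):
        if done == interrupt_after:
            raise Interrupted()
        step(store, name)


# The two orderings from the bug description.
BUGGY = [FakeStore.delete_object, FakeStore.remove_from_index]
FIXED = [FakeStore.remove_from_index, FakeStore.delete_object]


def state_after_crash(order):
    """Return (dangling, orphans) when cleanup is cut off after one step."""
    store = FakeStore(["export-1"])
    try:
        run_cleanup(store, "export-1", order, interrupt_after=1)
    except Interrupted:
        pass
    return store.dangling(), store.orphans()


if __name__ == "__main__":
    print("buggy:", state_after_crash(BUGGY))
    print("fixed:", state_after_crash(FIXED))
```

In this model the buggy order leaves "export-1" as a dangling index entry, while the fixed order leaves it as an orphan object with no index entry, which matches the two failure modes described in the report.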
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2143377/+subscriptions