[Bug 2143377] Fix merged to manila (stable/2024.2)

OpenStack Infra 2143377 at bugs.launchpad.net
Mon Mar 16 19:45:27 UTC 2026


Reviewed:  https://review.opendev.org/c/openstack/manila/+/980250
Committed: https://opendev.org/openstack/manila/commit/ab3511c546188bb77f2f94636156fe03e72df08c
Submitter: "Zuul (22348)"
Branch:    stable/2024.2

commit ab3511c546188bb77f2f94636156fe03e72df08c
Author: Seyeong Kim <seyeong.kim at canonical.com>
Date:   Fri Mar 6 03:28:00 2026 +0000

    Fix RADOS export index dangling reference in remove_export()
    
    In remove_export(), the finally block deletes the RADOS object before
    removing its URL from ganesha-export-index. If the process crashes
    between these two steps, the index retains a reference to a
    non-existent object. On the next nfs-ganesha restart, the daemon
    reads the index, fails to fetch the missing object (ENOENT), and
    exits with FATAL -- taking down all NFS client connections.
    
    Swap the two operations so the index entry is removed first. If a
    crash now occurs between the two steps, only an orphan RADOS object
    remains, which nfs-ganesha safely ignores on startup.
    
    Closes-Bug: #2143377
    Change-Id: I8f3a2c1e5d9b4a7f6e0c3d2b1a9f8e7d6c5b4a3f
    Signed-off-by: Seyeong Kim <seyeong.kim at canonical.com>
    (cherry picked from commit d60b2a72a2bc1326c55c4bbdcf71408d4c3641d2)
    (cherry picked from commit 575d7b17be10736ab033c741b72d49d655a7f105)
    (cherry picked from commit b381d44623e2135978576c7df5979fcb397e05de)

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to manila in Ubuntu.
https://bugs.launchpad.net/bugs/2143377

Title:
  Dangling RADOS export index entry in remove_export() crashes
  NFS-Ganesha

Status in Ubuntu Cloud Archive:
  In Progress
Status in Ubuntu Cloud Archive caracal series:
  In Progress
Status in Ubuntu Cloud Archive dalmatian series:
  In Progress
Status in Ubuntu Cloud Archive epoxy series:
  In Progress
Status in Ubuntu Cloud Archive flamingo series:
  In Progress
Status in Ubuntu Cloud Archive gazpacho series:
  New
Status in Ubuntu Cloud Archive yoga series:
  In Progress
Status in OpenStack Shared File Systems Service (Manila):
  Fix Released
Status in manila package in Ubuntu:
  New
Status in manila source package in Jammy:
  In Progress
Status in manila source package in Noble:
  In Progress
Status in manila source package in Questing:
  In Progress
Status in manila source package in Resolute:
  New

Bug description:
  [Impact]

  When manila deletes a CephFS NFS share, remove_export() deletes the
  RADOS export object before removing its URL from the export index.
  If manila-share is interrupted between these two operations, the index
  retains a reference to a non-existent object. On the next NFS-Ganesha
  restart, ganesha.nfsd hits ENOENT on the dangling reference and exits
  FATAL. This takes down the entire NFS gateway — all connected NFS
  clients get "server not responding" and all I/O hangs until manual
  intervention.

  [Test Case]

  I tested this in a Juju/MAAS environment.

  Prerequisites:
    - Juju model with ceph-mon, ceph-osd, ceph-fs, mysql-innodb-cluster,
      rabbitmq-server, keystone, manila, manila-ganesha
    - OpenStack 2024.1/stable, Ceph Quincy
    - NFS client node with nfs-common installed

  Test 1: Dangling index crashes NFS-Ganesha (reproduce bug)
    1. Create a fake RADOS export object in the manila-ganesha pool:
         rados --id manila-ganesha -p manila-ganesha put \
           ganesha-export-test-dangling /tmp/export_obj.conf
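    2. Add a %url entry pointing to it in ganesha-export-index (fetch
       the current index first so the existing entries are preserved):
         rados --id manila-ganesha -p manila-ganesha get \
           ganesha-export-index index
         echo '%url "rados://manila-ganesha/ganesha-export-test-dangling"' >> index
         rados --id manila-ganesha -p manila-ganesha put \
           ganesha-export-index index
    3. Delete the object (simulating a crash after _delete_rados_object):
         rados --id manila-ganesha -p manila-ganesha rm \
           ganesha-export-test-dangling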
    4. Restart NFS-Ganesha:
         systemctl restart nfs-ganesha
    5. Observe: the service exits FATAL with "Unknown error -2".
    6. Clean up: remove the dangling entry from the index, then restart
       ganesha.
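
  The cleanup in step 6 can be sketched in Python as a filter over the
  index text. This is only an illustration, not part of manila or
  nfs-ganesha: split_index() and URL_LINE are hypothetical names, the
  %url line format is taken from step 2 above, and the sketch assumes
  URLs of the form rados://<pool>/<object> with no RADOS namespace
  component. The set of existing object names is assumed to be gathered
  separately, for example from a listing of the pool.

```python
import re

# Line format taken from step 2 of Test 1: %url "rados://<pool>/<object>"
# Assumption: no namespace component between pool and object name.
URL_LINE = re.compile(r'^\s*%url\s+"rados://(?P<pool>[^/"]+)/(?P<obj>[^"]+)"\s*$')


def split_index(index_text, existing_objects):
    """Return (cleaned_index_text, dangling_names).

    Lines that are not %url entries are kept untouched; %url entries
    whose object is not in existing_objects are dropped and reported.
    """
    kept, dangling = [], []
    for line in index_text.splitlines():
        match = URL_LINE.match(line)
        if match and match.group("obj") not in existing_objects:
            dangling.append(match.group("obj"))
            continue
        kept.append(line)
    cleaned = "\n".join(kept)
    if kept:
        cleaned += "\n"
    return cleaned, dangling
```

  The cleaned text would then be written back to ganesha-export-index
  before restarting ganesha, as in step 6.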

  Test 2: Orphan object is harmless (verify fix)
    1. Create a RADOS export object but do NOT add it to the index
       (simulating crash after _remove_rados_object_url_from_index
       but before _delete_rados_object).
    2. Restart NFS-Ganesha.
    3. Observe: the service starts normally and the orphan object is
       ignored.

  Test 3: NFS client impact (dangling reference)
    1. Create an NFS share via manila CLI:
         manila type-create cephfsnfstype false \
           --extra-specs share_backend_name=cephfsnfs1
         manila create --share-type cephfsnfstype --name test-nfs NFS 1
    2. Allow access and mount on a client node:
         manila access-allow test-nfs ip <client_ip>
         mount -t nfs <ganesha_ip>:<export_path> /mnt/test-nfs
    3. Verify I/O: touch /mnt/test-nfs/testfile
    4. Inject a dangling entry into the export index (same as Test 1).
    5. Restart NFS-Ganesha — service crashes.
    6. On client: observe "nfs: server <ip> not responding, timed out"
       in kern.log; ls on mount point hangs.
    7. Restore original index, restart ganesha — NFS I/O resumes.

  Test 4: NFS client unaffected after fix
    1. Apply the fix (swap two lines in remove_export() finally block).
    2. Restart manila-share.
    3. Create an NFS share, mount on client, verify I/O.
    4. Drop orphan objects into the RADOS pool (no index entries).
    5. Restart NFS-Ganesha.
    6. Observe: ganesha starts normally, NFS I/O works, no
       "not responding" in dmesg.

  [Regression Potential]

  Low. The change only reorders two independent cleanup operations in
  the finally block of remove_export(). If
  _remove_rados_object_url_from_index() fails, the object deletion still
  proceeds as before. The only new failure mode is an orphan RADOS
  object, which is harmless (ganesha ignores objects not referenced in
  the index).

  [Other Info]

  Buggy order in manila/share/drivers/ganesha/manager.py remove_export():
    self._delete_rados_object(...)
    self._remove_rados_object_url_from_index(name)

  Fixed order:
    self._remove_rados_object_url_from_index(name)
    self._delete_rados_object(...)

  Reproduced on OpenStack 2024.1 (Caracal), Ceph Quincy 17.2.9,
  manila-ganesha charm, NFS4 hard mount clients.
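
  The effect of the two orderings above can be illustrated with a small
  in-memory model. This is a sketch only and does not use manila or
  librados: FakePool, Crash, remove_export() and state_after_crash() are
  hypothetical names, and the "crash" is simulated by raising an
  exception between the two cleanup steps.

```python
class Crash(Exception):
    """Stands in for manila-share being interrupted mid-cleanup."""


class FakePool:
    """In-memory stand-in for the RADOS pool: export objects plus the index."""

    def __init__(self, names):
        self.objects = set(names)  # export objects that exist
        self.index = set(names)    # objects referenced by %url entries

    def dangling(self):
        # Index entries whose object is gone (fatal for ganesha per the report).
        return self.index - self.objects

    def orphans(self):
        # Objects with no index entry (ignored by ganesha per the report).
        return self.objects - self.index


def remove_export(pool, name, index_first, crash_between=False):
    """Run the two cleanup steps in the chosen order, optionally crashing."""
    steps = [
        lambda: pool.index.discard(name),    # remove URL from the index
        lambda: pool.objects.discard(name),  # delete the export object
    ]
    if not index_first:
        steps.reverse()  # buggy order: delete the object first
    steps[0]()
    if crash_between:
        raise Crash(name)
    steps[1]()


def state_after_crash(index_first):
    """Return (dangling, orphans) left behind by a crash between the steps."""
    pool = FakePool({"export-a", "export-b"})
    try:
        remove_export(pool, "export-a", index_first, crash_between=True)
    except Crash:
        pass
    return pool.dangling(), pool.orphans()


if __name__ == "__main__":
    print("buggy order:", state_after_crash(index_first=False))
    print("fixed order:", state_after_crash(index_first=True))
```

  In this model the buggy order leaves a dangling index entry and no
  orphan, while the fixed order leaves an orphan object and no dangling
  entry, matching the behaviour described in [Impact] and
  [Regression Potential].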

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2143377/+subscriptions



