[Bug 2039955] Re: Opening NFS tab in the dashboard leads to ceph mgr crash - orchestrator._interface.NoOrchestrator: No orchestrator configured
Samuel Walladge
2039955 at bugs.launchpad.net
Mon Nov 20 05:38:52 UTC 2023
Definitely an upstream issue, not related to the ceph-dashboard charm.
Exploring the ceph repository:
`src/pybind/mgr/dashboard/controllers/nfs.py`
```
@Endpoint()
@ReadPermission
def status(self):
    status = {'available': True, 'message': None}
    try:
        # this is where the call happens that causes the crash - the crash
        # is coming from ceph though, not the fault of this code.
        # NOTE: running `sudo ceph nfs cluster ls` prints:
        #   Error ENOENT: No orchestrator configured (try `ceph orch set backend`)
        # but does not show a traceback.
        # This may be limited to the python api?
        mgr.remote('nfs', 'cluster_ls')
    except (ImportError, RuntimeError) as error:
        logger.exception(error)
        status['available'] = False
        status['message'] = str(error)  # type: ignore
    return status
```
When the orchestrator is not present, we see this traceback:
```
{
    "archived": "2023-11-20 04:58:57.151697",
    "backtrace": [
        " File \"/usr/share/ceph/mgr/nfs/module.py\", line 169, in cluster_ls\n return available_clusters(self)",
        " File \"/usr/share/ceph/mgr/nfs/utils.py\", line 38, in available_clusters\n completion = mgr.describe_service(service_type='nfs')",
        " File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 1488, in inner\n completion = self._oremote(method_name, args, kwargs)",
        " File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 1555, in _oremote\n raise NoOrchestrator()",
        "orchestrator._interface.NoOrchestrator: No orchestrator configured (try `ceph orch set backend`)"
    ],
    "ceph_version": "17.2.6",
    "crash_id": "2023-11-20T04:47:16.737623Z_8a944527-1cc1-4ed5-b58b-86bf97bcf3b1",
    "entity_name": "mgr.juju-108031-1-lxd-1",
    "mgr_module": "nfs",
    "mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
    "mgr_python_exception": "NoOrchestrator",
    "os_id": "22.04",
    "os_name": "Ubuntu 22.04.3 LTS",
    "os_version": "22.04.3 LTS (Jammy Jellyfish)",
    "os_version_id": "22.04",
    "process_name": "ceph-mgr",
    "stack_sig": "b01db59d356dd52f69bfb0b128a216e7606f54a60674c3c82711c23cf64832ce",
    "timestamp": "2023-11-20T04:47:16.737623Z",
    "utsname_hostname": "juju-108031-1-lxd-1",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.0-88-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023"
}
```
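Interestingly, the dashboard's `except (ImportError, RuntimeError)` clause does appear to fire (the dashboard shows "Remote method threw exception: ..."), yet a crash entry is still recorded. That would be consistent with the remote dispatch layer logging any exception escaping the callee module before re-raising it to the caller. A minimal self-contained sketch of that mechanism, using stand-in names (`dispatch_remote` and `crash_log` here are toy models, not the real mgr internals):

```python
import traceback

crash_log = []  # stand-in for what `ceph crash ls` reports

def dispatch_remote(fn):
    """Toy model of remote dispatch: record any exception escaping the
    callee as a 'crash', then re-raise it to the caller as a RuntimeError."""
    try:
        return fn()
    except Exception as e:
        crash_log.append(traceback.format_exc())  # crash recorded here
        raise RuntimeError(f"Remote method threw exception: {e}") from e

def cluster_ls():
    # Simulate the unconfigured-orchestrator failure.
    raise Exception("No orchestrator configured (try `ceph orch set backend`)")

def status():
    # Mirrors the dashboard's status() endpoint above.
    result = {'available': True, 'message': None}
    try:
        dispatch_remote(cluster_ls)
    except (ImportError, RuntimeError) as error:
        result['available'] = False
        result['message'] = str(error)
    return result
```

So the caller can degrade gracefully, but a crash entry is logged on the callee side anyway.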
I guess this is the part that maps directly to the `cluster_ls` method:
```
"mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
```
This is `cluster_ls`, in `src/pybind/mgr/nfs/module.py`.
```
# this raises an error, causing a module crash, if the orchestrator is not available
def cluster_ls(self) -> List[str]:
    return available_clusters(self)
```
^ This is the root of the traceback we're seeing.
I guess the reason we're seeing a crash is that this method doesn't catch any errors raised by `available_clusters`.
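One defensive variant would be to catch the error inside `cluster_ls` and report "no clusters" instead of letting the exception escape to the remote dispatcher. This is just a sketch of the pattern, not upstream code, and the `NoOrchestrator`/`available_clusters` definitions below are stand-ins so the snippet runs on its own:

```python
from typing import List

class NoOrchestrator(Exception):
    """Stand-in for orchestrator._interface.NoOrchestrator."""

def available_clusters() -> List[str]:
    # Simulate the orchestrator being unconfigured.
    raise NoOrchestrator("No orchestrator configured")

def cluster_ls() -> List[str]:
    # Swallow the orchestrator error so the remote caller sees an
    # empty cluster list rather than a recorded module crash.
    try:
        return available_clusters()
    except NoOrchestrator:
        return []
```

The downside is that the caller can no longer distinguish "no clusters" from "orchestrator not configured".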
For reference, other methods I've checked here handle the error.
For example (in `src/pybind/mgr/nfs/cluster.py`, called from the
`ceph nfs cluster ls` handler `_cmd_nfs_cluster_ls()` in `src/pybind/mgr/nfs/module.py`):
```
def list_nfs_cluster(self) -> List[str]:
    try:
        return available_clusters(self.mgr)
    except Exception as e:
        log.exception("Failed to list NFS Cluster")
        raise ErrorResponse.wrap(e)
```
I tried the same pattern of catching the error and raising `ErrorResponse` within `cluster_ls`,
but that still resulted in a crash:
```
{
    "backtrace": [
        " File \"/usr/share/ceph/mgr/nfs/module.py\", line 173, in cluster_ls\n return available_clusters(self)",
        " File \"/usr/share/ceph/mgr/nfs/utils.py\", line 38, in available_clusters\n completion = mgr.describe_service(service_type='nfs')",
        " File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 1488, in inner\n completion = self._oremote(method_name, args, kwargs)",
        " File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 1555, in _oremote\n raise NoOrchestrator()",
        "orchestrator._interface.NoOrchestrator: No orchestrator configured (try `ceph orch set backend`)",
        "\nThe above exception was the direct cause of the following exception:\n",
        "Traceback (most recent call last):",
        " File \"/usr/share/ceph/mgr/nfs/module.py\", line 175, in cluster_ls\n raise ErrorResponse.wrap(e)",
        "object_format.ErrorResponse: No orchestrator configured (try `ceph orch set backend`)"
    ],
    "ceph_version": "17.2.6",
    "crash_id": "2023-11-20T04:59:04.018086Z_2a16b6a4-85e5-49ee-93f0-c1b552f1df06",
    "entity_name": "mgr.juju-108031-1-lxd-1",
    "mgr_module": "nfs",
    "mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
    "mgr_python_exception": "ErrorResponse",
    "os_id": "22.04",
    "os_name": "Ubuntu 22.04.3 LTS",
    "os_version": "22.04.3 LTS (Jammy Jellyfish)",
    "os_version_id": "22.04",
    "process_name": "ceph-mgr",
    "stack_sig": "6a64a2a392fc0ad969c705c51ccec3206fab079f3c53ef566d1ed1d6f5088851",
    "timestamp": "2023-11-20T04:59:04.018086Z",
    "utsname_hostname": "juju-108031-1-lxd-1",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.0-88-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023"
}
```
I'm not sure what pattern is required for this kind of remote module method call, where it's not a CLI command.
We still need to convey an error response to the remote caller (e.g. ceph-dashboard in this case),
but without "crashing".
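One pattern that might work here is for the remote method to return an explicit (value, error) pair instead of raising across the remote boundary. This is purely a sketch (the tuple protocol and all names below are my own invention, not a ceph API; `NoOrchestrator` and `available_clusters` are stand-ins so the snippet is self-contained):

```python
from typing import List, Optional, Tuple

class NoOrchestrator(Exception):
    """Stand-in for orchestrator._interface.NoOrchestrator."""

def available_clusters() -> List[str]:
    # Simulate the unconfigured orchestrator.
    raise NoOrchestrator("No orchestrator configured")

def cluster_ls_safe() -> Tuple[Optional[List[str]], Optional[str]]:
    # Never raise across the remote boundary: return the cluster list
    # on success, or (None, message) on failure, letting the caller decide.
    try:
        return available_clusters(), None
    except NoOrchestrator as e:
        return None, str(e)
```

The caller (e.g. the dashboard's `status()` endpoint) would then check the error slot and set `available`/`message` accordingly, with no exception ever escaping the nfs module.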
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/2039955
Title:
Opening NFS tab in the dashboard leads to ceph mgr crash -
orchestrator._interface.NoOrchestrator: No orchestrator configured
Status in Ceph Dashboard Charm:
New
Status in ceph package in Ubuntu:
New
Bug description:
Whenever the NFS tab in the Ceph dashboard is opened, a NoOrchestrator
exception is raised and recorded as a ceph mgr module crash
(although it's not an actual process crash).
Other tabs that require the orchestrator handle the situation well:
they print the following message and no exception is raised.
====
Orchestrator is not available
Orchestrator is unavailable: No orchestrator configured (try `ceph orch set backend`)
Please consult the documentation on how to configure and enable the management functionality.
====
With the NFS tab, however, an exception is raised.
https://dashboard.example.com:8443/#/nfs
====
NFS-Ganesha is not configured
Remote method threw exception: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/nfs/module.py", line 169, in cluster_ls
    return available_clusters(self)
  File "/usr/share/ceph/mgr/nfs/utils.py", line 38, in available_clusters
    completion = mgr.describe_service(service_type='nfs')
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 1488, in inner
    completion = self._oremote(method_name, args, kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 1555, in _oremote
    raise NoOrchestrator()
orchestrator._interface.NoOrchestrator: No orchestrator configured (try `ceph orch set backend`)
Please consult the documentation on how to configure and enable the management functionality.
====
# ceph health
HEALTH_WARN 2 mgr modules have recently crashed
# ceph crash ls
ID                                                                ENTITY                   NEW
2023-10-20T00:40:40.362363Z_2f461bb5-343c-4cb4-8134-99ae29ddc60c  mgr.juju-ffeb43-0-lxd-0  *
2023-10-20T02:24:37.980204Z_9bf106e2-0dd2-4a88-b0f4-647dfa82697f  mgr.juju-ffeb43-0-lxd-0  *
# ceph crash info 2023-10-20T00:40:40.362363Z_2f461bb5-343c-4cb4-8134-99ae29ddc60c
{
    "backtrace": [
        " File \"/usr/share/ceph/mgr/nfs/module.py\", line 169, in cluster_ls\n return available_clusters(self)",
        " File \"/usr/share/ceph/mgr/nfs/utils.py\", line 38, in available_clusters\n completion = mgr.describe_service(service_type='nfs')",
        " File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 1488, in inner\n completion = self._oremote(method_name, args, kwargs)",
        " File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 1555, in _oremote\n raise NoOrchestrator()",
        "orchestrator._interface.NoOrchestrator: No orchestrator configured (try `ceph orch set backend`)"
    ],
    "ceph_version": "17.2.6",
    "crash_id": "2023-10-20T00:40:40.362363Z_2f461bb5-343c-4cb4-8134-99ae29ddc60c",
    "entity_name": "mgr.juju-ffeb43-0-lxd-0",
    "mgr_module": "nfs",
    "mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
    "mgr_python_exception": "NoOrchestrator",
    "os_id": "22.04",
    "os_name": "Ubuntu 22.04.3 LTS",
    "os_version": "22.04.3 LTS (Jammy Jellyfish)",
    "os_version_id": "22.04",
    "process_name": "ceph-mgr",
    "stack_sig": "b01db59d356dd52f69bfb0b128a216e7606f54a60674c3c82711c23cf64832ce",
    "timestamp": "2023-10-20T00:40:40.362363Z",
    "utsname_hostname": "juju-ffeb43-0-lxd-0",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.0-87-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#97-Ubuntu SMP Mon Oct 2 21:09:21 UTC 2023"
}
ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: ceph-mgr-dashboard 17.2.6-0ubuntu0.22.04.1
ProcVersionSignature: Ubuntu 5.15.0-87.97-generic 5.15.122
Uname: Linux 5.15.0-87-generic x86_64
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
CasperMD5CheckResult: unknown
CloudArchitecture: x86_64
CloudID: lxd
CloudName: lxd
CloudPlatform: lxd
CloudSubPlatform: LXD socket API v. 1.0 (/dev/lxd/sock)
Date: Fri Oct 20 09:49:25 2023
PackageArchitecture: all
ProcEnviron:
TERM=screen-256color
PATH=(custom, no user)
LANG=C.UTF-8
SHELL=/bin/bash
SourcePackage: ceph
UpgradeStatus: No upgrade log present (probably fresh install)
To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ceph-dashboard/+bug/2039955/+subscriptions