[Bug 2152460] [NEW] Ceph 20.2.0 - cephadm cluster bootstrap fails due to hardcoded uid and gid

Wed May 13 09:56:05 UTC 2026

Public bug reported:

After installing cephadm on Resolute:

ii cephadm 20.2.0-0ubuntu2 all

Trying to bootstrap a cluster:

$ sudo cephadm bootstrap --mon-ip 10.3.1.190
Creating directory /etc/ceph for ceph.conf
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chrony.service is enabled and running
Repeating the final host check...
docker (/usr/bin/docker) is present
systemctl is present
lvcreate is present
Unit chrony.service is enabled and running
Host looks OK
Cluster fsid: e4638df2-4eaf-11f1-8000-fa163ebbf652
Verifying IP 10.3.1.190 port 3300 ...
Verifying IP 10.3.1.190 port 6789 ...
Mon IP `10.3.1.190` is in CIDR network `10.3.0.0/21`
Mon IP `10.3.1.190` is in CIDR network `10.3.0.0/21`
Mon IP `10.3.1.190` is in CIDR network `10.3.0.1/32`
Mon IP `10.3.1.190` is in CIDR network `10.3.0.1/32`
Mon IP `10.3.1.190` is in CIDR network `10.3.1.100/32`
Mon IP `10.3.1.190` is in CIDR network `10.3.1.100/32`
Internal network (--cluster-network) has not been provided, OSD replication will default to the public_network
Pulling container image quay.io/ceph/ceph:v20...
Ceph version: ceph version 20.2.1 (6a49aff47758778a5f5951e731d437c317f72fb2) tentacle (stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Non-zero exit code 1 from install -d -m0770 -o 167 -g 167 /var/run/ceph/e4638df2-4eaf-11f1-8000-fa163ebbf652
install: stderr install: invalid user: '167'
RuntimeError: Failed command: install -d -m0770 -o 167 -g 167 /var/run/ceph/e4638df2-4eaf-11f1-8000-fa163ebbf652: install: invalid user: '167'

        ***************
        Cephadm hit an issue during cluster installation. Current cluster files will be deleted automatically.
        To disable this behaviour you can pass the --no-cleanup-on-failure flag. In case of any previous
        broken installation, users must use the following command to completely delete the broken cluster:

        > cephadm rm-cluster --force --zap-osds --fsid <fsid>

        for more information please refer to https://docs.ceph.com/en/latest/cephadm/operations/#purging-a-cluster
        ***************

Deleting cluster with fsid: e4638df2-4eaf-11f1-8000-fa163ebbf652
Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 5288, in <module>
    main()
    ~~~~^^
  File "/usr/sbin/cephadm", line 5276, in main
    r = ctx.func(ctx)
  File "/usr/sbin/cephadm", line 2524, in _rollback
    return func(ctx)
  File "/usr/sbin/cephadm", line 434, in _default_image
    return func(ctx)
  File "/usr/sbin/cephadm", line 2684, in command_bootstrap
    make_var_run(ctx, fsid, uid, gid)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
  File "/usr/sbin/cephadm", line 508, in make_var_run
    call_throws(ctx, ['install', '-d', '-m0770', '-o', str(uid), '-g', str(gid),
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                      '/var/run/ceph/%s' % fsid])
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cephadmlib/call_wrappers.py", line 307, in call_throws
    raise RuntimeError(
        f'Failed command: {" ".join(command)}: {s}'
    )

RuntimeError: Failed command: install -d -m0770 -o 167 -g 167 /var/run/ceph/e4638df2-4eaf-11f1-8000-fa163ebbf652: install: invalid user: '167'
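
A quick way to confirm what the error suggests is to check whether the
numeric IDs from the log resolve to a passwd/group entry on the host. This
is only a diagnostic sketch: id_resolves is a hypothetical helper, not part
of cephadm, and 167 is simply the value taken from the failing command.

```python
import grp
import pwd


def id_resolves(uid: int, gid: int) -> tuple[bool, bool]:
    """Return (user_known, group_known) for numeric IDs on this host."""
    try:
        pwd.getpwuid(uid)
        user_known = True
    except KeyError:
        # No passwd entry maps to this uid.
        user_known = False
    try:
        grp.getgrgid(gid)
        group_known = True
    except KeyError:
        # No group entry maps to this gid.
        group_known = False
    return user_known, group_known


if __name__ == "__main__":
    # 167/167 are the IDs cephadm passed to `install` in the log above.
    print(id_resolves(167, 167))
```

If both values come back False, the host has no account for the IDs that
were extracted from the container image.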

On RHEL, uid/gid 167 is reserved for the ceph user. Ubuntu has no such
reservation, so the lookup fails. This needs to be fixed in the next SRU.
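
One possible direction, sketched here only as an illustration and not as
the fix that will land in the package, is to create the directory with
numeric IDs directly instead of going through `install -o/-g`. The function
name make_dir_with_owner is hypothetical; os.makedirs, os.chown and os.chmod
take numeric IDs and do not need a matching passwd or group entry.

```python
import os
import stat
import tempfile


def make_dir_with_owner(path: str, uid: int, gid: int, mode: int = 0o770) -> None:
    """Create `path` and set numeric owner, group and mode without a name lookup."""
    os.makedirs(path, exist_ok=True)
    os.chown(path, uid, gid)  # numeric IDs; no passwd/group entry required
    os.chmod(path, mode)      # explicit chmod so the umask does not apply


if __name__ == "__main__":
    # Demo with the caller's own IDs so it runs unprivileged.
    # Assumption: cephadm, running as root, would pass 167/167 instead.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "ceph", "example-fsid")
        make_dir_with_owner(target, os.getuid(), os.getgid())
        print(oct(stat.S_IMODE(os.stat(target).st_mode)))
```

Changing ownership to a different uid still requires root, as the original
`install` call did.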

Thanks,
Alan

** Affects: ceph (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/2152460

Title:
  Ceph 20.2.0 - cephadm cluster bootstrap fails due to hardcoded uid and
  gid

Status in ceph package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2152460/+subscriptions