[Bug 1886814] Comment bridged from LTC Bugzilla
bugproxy
1886814 at bugs.launchpad.net
Thu Jul 23 15:09:40 UTC 2020
------- Comment From STLI at de.ibm.com 2020-07-23 11:06 EDT-------
Hi,
I was able to reproduce the "make: echo: Operation not permitted" on my Ubuntu 20.04 s390x machine.
I've built and installed the mentioned make-dfsg_4.3-4ubuntu1 package without the "--disable-posix-spawn" configure flag.
I've built flatpak-builder_1.0.11-1, which executes the test that triggers the "Operation not permitted".
Then I've adjusted the tests so that I can also run them without building the package itself.
This test runs flatpak-builder which prepares some stuff (e.g. a root-directory with all needed files / binaries / libraries).
flatpak-builder then creates a container with bwrap and calls a configure script, which generates a Makefile.
In a second invocation, make is invoked.
I've adjusted the configure script so that it now executes a small program of my own.
This program first waits for some time, which I use to determine its PID. Then I can attach either strace or gdb.
After the timeout, the program just execve's to make. Thus in the end I have a process-chain like:
flatpak-builder--bwrap---bwrap---configure---make
The strace output shows that the clone syscall is failing with EPERM:
17:08:47.914142 stat("/usr/bin/echo", {st_mode=S_IFREG|0755, st_size=39136, ...}) = 0 <0.000003>
17:08:47.914155 geteuid() = 1001 <0.000001>
17:08:47.914167 getegid() = 1001 <0.000002>
17:08:47.914175 getuid() = 1001 <0.000001>
17:08:47.914182 getgid() = 1001 <0.000001>
17:08:47.914189 access("/usr/bin/echo", X_OK) = 0 <0.000005>
17:08:47.914203 mmap(NULL, 36864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x3ff9c86b000 <0.000002>
17:08:47.914214 rt_sigprocmask(SIG_BLOCK, ~[], [HUP INT QUIT TERM CHLD XCPU XFSZ], 8) = 0 <0.000001>
17:08:47.914224 clone(child_stack=0x3ff9c874000, flags=CLONE_VM|CLONE_VFORK|SIGCHLD) = -1 EPERM (Operation not permitted) <0.000001>
17:08:47.914235 munmap(0x3ff9c86b000, 36864) = 0 <0.000004>
17:08:47.914245 rt_sigprocmask(SIG_SETMASK, [HUP INT QUIT TERM CHLD XCPU XFSZ], NULL, 8) = 0 <0.000001>
A gdb session showed that make calls posix_spawn like this (Info: make uses vfork() if configured with "--disable-posix-spawn"):
jobs.c:child_execute_job (struct childbase *child, int good_stdin, char **argv)
posix_spawnattr_t attr;
posix_spawn_file_actions_t fa;
short flags = 0;
posix_spawnattr_init (&attr)
posix_spawn_file_actions_init (&fa)
flags |= POSIX_SPAWN_SETSIGMASK; => 0x08
flags |= POSIX_SPAWN_USEVFORK; => 0x40
fdin=0, fdout=1, fderr=2
flags |= POSIX_SPAWN_RESETIDS; => 0x01
=> flags = 0x49
posix_spawnattr_setflags (&attr, flags)
/* Start the program. */
while ((r = posix_spawn (&pid, cmd, &fa, &attr, argv,
child->environment)) == EINTR)
;
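
The call sequence above can be sketched as a small standalone program. This is only a minimal sketch, not make's code: the helper name spawn_like_make, the empty signal mask and the argument-less argv are assumptions made for illustration. POSIX_SPAWN_USEVFORK is a glibc extension, hence the _GNU_SOURCE define and the #ifdef guard.

```c
#define _GNU_SOURCE /* for POSIX_SPAWN_USEVFORK, a glibc extension */
#include <errno.h>
#include <signal.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper (not from make): spawn `path` without arguments,
   using the flag combination seen in the gdb session, and wait for it.
   Returns the child's exit status, or -1 if spawning or waiting failed. */
int
spawn_like_make (const char *path)
{
  posix_spawnattr_t attr;
  posix_spawn_file_actions_t fa;
  sigset_t mask;
  short flags = 0;
  pid_t pid;
  int r, status;
  char *argv[] = { (char *) path, NULL };

  if (posix_spawnattr_init (&attr) != 0)
    return -1;
  if (posix_spawn_file_actions_init (&fa) != 0)
    {
      posix_spawnattr_destroy (&attr);
      return -1;
    }

  /* Assumption: the child starts with an empty signal mask. */
  sigemptyset (&mask);
  posix_spawnattr_setsigmask (&attr, &mask);
  flags |= POSIX_SPAWN_SETSIGMASK;
#ifdef POSIX_SPAWN_USEVFORK
  flags |= POSIX_SPAWN_USEVFORK;
#endif
  flags |= POSIX_SPAWN_RESETIDS;
  posix_spawnattr_setflags (&attr, flags);

  /* posix_spawn returns an error number directly instead of setting errno. */
  while ((r = posix_spawn (&pid, path, &fa, &attr, argv, environ)) == EINTR)
    ;

  posix_spawn_file_actions_destroy (&fa);
  posix_spawnattr_destroy (&attr);
  if (r != 0)
    return -1;

  /* Wait for the child, retrying if a signal interrupts the wait. */
  while (waitpid (pid, &status, 0) < 0)
    if (errno != EINTR)
      return -1;

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Calling spawn_like_make ("/bin/true") should return 0 on an unrestricted system. Inside the container described here, posix_spawn would presumably return EPERM, matching the failing clone in the strace output.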
In glibc, posix_spawn does this:
posix_spawn(...) -> __spawni(..., 0) -> __spawnix(..., __execve)
void *stack = __mmap (NULL, stack_size, prot, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
/* Disable asynchronous cancellation. */
__libc_signal_block_all (&args.oldmask);
# define CLONE(__fn, __stack, __stacksize, __flags, __args) \
__clone (__fn, __stack, __flags, __args)
new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size,
CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
=> __clone (__spawni_child, __stack, CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
<glibc-src>/sysdeps/unix/sysv/linux/s390/s390-64/clone.S
(gdb) i r r2 r3 r4 r5 r6
r2 0x3ffb22f53c0 4396740989888
r3 0x3ffb24f4000 4396743081984
r4 0x4111 16657
r5 0x3ffce97c9e0 4397217597920
r6 0xffffffffffffffff 18446744073709551615
>0x3ffb2306760 <clone> stg %r6,48(%r15)
 0x3ffb2306766 <clone+6> lgr %r0,%r5
 0x3ffb230676a <clone+10> ltgr %r1,%r2
 0x3ffb230676e <clone+14> je 0x3ffb23067a6 <clone+70>
 0x3ffb2306772 <clone+18> ltgr %r2,%r3
 0x3ffb2306776 <clone+22> je 0x3ffb23067a6 <clone+70>
 0x3ffb230677a <clone+26> lgr %r3,%r4
 0x3ffb230677e <clone+30> lgr %r4,%r6
 0x3ffb2306782 <clone+34> lg %r5,168(%r15)
 0x3ffb2306788 <clone+40> lg %r6,160(%r15)
(gdb) i r r1 r2 r3 r4 r5 r6
r1 0x3ffb22f53c0 4396740989888
r2 0x3ffb24f4000 4396743081984
r3 0x4111 16657
# define CLONE_VM 0x00000100 /* Set if VM shared between processes. */
# define CLONE_VFORK 0x00004000 /* Set if the parent wants the child to wake it up on mm_release. */
<glibc-src>/sysdeps/unix/sysv/linux/bits/signum.h:41:#define SIGCHLD 17 => 0x11
r4 0xffffffffffffffff 18446744073709551615
r5 0x3ffce97c960 4397217597792
r6 0x0 0
/* sys_clone (void *child_stack, unsigned long flags, pid_t *parent_tid, pid_t *child_tid, void *tls); */
 0x3ffb230678e <clone+46> svc 120
=> sys_clone is returning EPERM instead of succeeding and jumping to __spawni_child().
At this point, the make process has these open files:
find /proc/273963/fdinfo -type f -printf "\ncat %p\n" -exec cat {} \;
cat /proc/273963/fdinfo/0
pos: 0
flags: 0100000
mnt_id: 25
cat /proc/273963/fdinfo/1
pos: 0
flags: 02001
mnt_id: 14
cat /proc/273963/fdinfo/2
pos: 0
flags: 02001
mnt_id: 14
ls -la /proc/273963/fd/*
lr-x------ 1 stli stli 64 Jul 23 10:31 /proc/273963/fd/0 -> /dev/null
l-wx------ 1 stli stli 64 Jul 23 10:35 /proc/273963/fd/1 -> 'pipe:[661251]'
l-wx------ 1 stli stli 64 Jul 23 10:35 /proc/273963/fd/2 -> 'pipe:[661251]'
A workmate of mine gave me a hint that he had a similar issue with podman containers where a seccomp filter was applied.
Thus I've used https://github.com/david942j/seccomp-tools with a private patch from my workmate which enables s390x support.
And indeed, there is a seccomp filter applied to the second bwrap process and its children:
line CODE JT JF K
=================================
0000: 0x20 0x00 0x00 0x00000004 A = arch
0001: 0x15 0x00 0x1f 0x80000016 if (A != ARCH_S390X) goto 0033
0002: 0x20 0x00 0x00 0x00000000 A = sys_number
0003: 0x15 0x1c 0x00 0x00000015 if (A == mount) goto 0032
0004: 0x15 0x1b 0x00 0x00000033 if (A == acct) goto 0032
0005: 0x15 0x1a 0x00 0x00000056 if (A == uselib) goto 0032
0006: 0x15 0x19 0x00 0x00000067 if (A == syslog) goto 0032
0007: 0x15 0x18 0x00 0x00000083 if (A == quotactl) goto 0032
0008: 0x15 0x17 0x00 0x000000d9 if (A == pivot_root) goto 0032
0009: 0x15 0x16 0x00 0x0000010c if (A == mbind) goto 0032
0010: 0x15 0x15 0x00 0x0000010d if (A == get_mempolicy) goto 0032
0011: 0x15 0x14 0x00 0x0000010e if (A == set_mempolicy) goto 0032
0012: 0x15 0x13 0x00 0x00000116 if (A == add_key) goto 0032
0013: 0x15 0x12 0x00 0x00000117 if (A == request_key) goto 0032
0014: 0x15 0x11 0x00 0x00000118 if (A == keyctl) goto 0032
0015: 0x15 0x10 0x00 0x0000011f if (A == migrate_pages) goto 0032
0016: 0x15 0x0f 0x00 0x0000012f if (A == unshare) goto 0032
0017: 0x15 0x0e 0x00 0x00000136 if (A == move_pages) goto 0032
0018: 0x15 0x00 0x05 0x00000036 if (A != ioctl) goto 0024
# => for clone, we goto 0024
0019: 0x20 0x00 0x00 0x00000018 A = cmd # ioctl(fd, cmd, arg)
0020: 0x54 0x00 0x00 0x00000000 A &= 0x0
0021: 0x15 0x00 0x09 0x00000000 if (A != 0) goto 0031
0022: 0x20 0x00 0x00 0x0000001c A = cmd >> 32 # ioctl(fd, cmd, arg)
0023: 0x15 0x08 0x07 0x00005412 if (A == 0x5412) goto 0032 else goto 0031
0024: 0x15 0x00 0x06 0x00000078 if (A != clone) goto 0031
# => all other syscalls are allowed, but clone is handled here
0025: 0x20 0x00 0x00 0x00000010 A = clone_flags # clone(clone_flags, newsp, parent_tidptr, child_tidptr, tls)
0026: 0x54 0x00 0x00 0x00000000 A &= 0x0
0027: 0x15 0x00 0x03 0x00000000 if (A != 0) goto 0031
# => the previous check seems to be a nop
0028: 0x20 0x00 0x00 0x00000014 A = clone_flags >> 32 # clone(clone_flags, newsp, parent_tidptr, child_tidptr, tls)
# => The flags of clone are checked:
0029: 0x54 0x00 0x00 0x10000000 A &= 0x10000000
# define CLONE_NEWUSER 0x10000000 = 268435456 /* New user namespace. */
0030: 0x15 0x01 0x00 0x10000000 if (A == 268435456) goto 0032
# => ERRNO(1) which is EPERM
0031: 0x06 0x00 0x00 0x7fff0000 return ALLOW
0032: 0x06 0x00 0x00 0x00050001 return ERRNO(1)
0033: 0x06 0x00 0x00 0x00000000 return KILL
Unfortunately, the order of the arguments of the clone syscall on s390x differs from x86_64!
=> The filter is checking the first argument, which on s390x is the stack pointer instead of the flags.
Note:
The order and the names of the arguments are hardcoded in the seccomp-tools disassembler.
The seccomp filter itself uses the argument index.
The ">> 32" is also hardcoded seccomp-tools output, which depends on whether the index is even or odd.
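
To make the effect concrete, the check in filter lines 0028-0030 can be replayed on the values seen above. This is only a sketch under assumptions: the function names are made up, and it assumes that the load at offset 0x14 yields the low 32 bits of the first syscall argument on big-endian s390x (struct seccomp_data has nr at offset 0, arch at 4, instruction_pointer at 8 and the 64-bit args starting at 16).

```c
#include <stdint.h>

/* Values as quoted in the glibc excerpts and the filter dump. */
#define SKETCH_CLONE_VM       0x00000100u
#define SKETCH_CLONE_VFORK    0x00004000u
#define SKETCH_SIGCHLD        17u          /* 0x11 */
#define SKETCH_CLONE_NEWUSER  0x10000000u

/* Flags glibc passes to clone for posix_spawn: 0x4111. */
uint64_t
spawn_clone_flags (void)
{
  return SKETCH_CLONE_VM | SKETCH_CLONE_VFORK | SKETCH_SIGCHLD;
}

/* Mimics filter lines 0028-0030: take the low 32 bits of the first
   syscall argument and test the CLONE_NEWUSER bit.
   Returns 1 if the filter would return ERRNO(1), otherwise 0. */
int
filter_denies_clone (uint64_t first_arg)
{
  uint32_t low_word = (uint32_t) (first_arg & 0xffffffffu);
  return (low_word & SKETCH_CLONE_NEWUSER) == SKETCH_CLONE_NEWUSER;
}
```

With the real flags (0x4111) the check passes. With the stack pointers from the strace output (0x3ff9c874000) and from the gdb session (0x3ffb24f4000), bit 28 happens to be set in the low word, so the filter returns EPERM. A made-up stack address such as 0x3ffa24f4000 would pass, which might explain why the failure depends on seemingly unrelated changes, though that is a guess.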
I saw that bwrap can apply a seccomp filter:
bubblewrap.c:do_init():
if (seccomp_prog != NULL &&
prctl (PR_SET_SECCOMP, SECCOMP_MODE_FILTER, seccomp_prog) != 0)
die_with_error ("prctl(PR_SET_SECCOMP)");
This is executed if you call bwrap with "--seccomp FD" (Load and use seccomp rules from FD)
I've also dumped the /proc/PID/cmdline for the processes:
flatpak-builder\0-v\0--repo=/path/to/workdir\0--force-clean\0appdir\0test.json\0
bwrap\0--args\012\0./configure\0--prefix=/app\0--some-arg\0
bwrap\0--args\012\0./configure\0--prefix=/app\0--some-arg\0
/bin/sh\0./configure\0--prefix=/app\0--some-arg\0
Thus I suppose bwrap is not adding this seccomp filter.
I had a look at /proc/<PID of configure>/cgroup:
12:cpuset:/
11:perf_event:/
10:devices:/user.slice
9:rdma:/
8:pids:/user.slice/user-1001.slice/user@1001.service
7:memory:/user.slice/user-1001.slice/user@1001.service
6:hugetlb:/
5:net_cls,net_prio:/
4:blkio:/user.slice
3:cpu,cpuacct:/user.slice
2:freezer:/
1:name=systemd:/user.slice/user-1001.slice/user@1001.service/flatpak-org.test.Hello2-14224.scope
0::/user.slice/user-1001.slice/user@1001.service/flatpak-org.test.Hello2-14224.scope
It could be that systemd is applying the seccomp-filter, but I don't know how.
Can anybody help?
For a test, the seccomp filter could be adjusted to check the second argument of the clone syscall.
Of course, for a real patch, the index has to be determined depending on the current architecture.
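
A minimal sketch of such an architecture-dependent selection, assuming only the two cases discussed here (s390x with the stack pointer first, everything else treated like x86_64). The function names are hypothetical, other architectures may need their own cases, and the offsets assume the struct seccomp_data layout with the 64-bit args starting at byte 16.

```c
#include <stddef.h>

/* Index of the flags argument of the raw clone syscall. */
int
clone_flags_arg_index (void)
{
#if defined(__s390__) || defined(__s390x__)
  return 1;   /* clone(newsp, flags, ...): stack pointer comes first */
#else
  return 0;   /* assumption: flags come first, as on x86_64 */
#endif
}

/* Byte offset inside struct seccomp_data of the low 32 bits of
   args[index], i.e. the word a filter has to load to see the flags. */
size_t
seccomp_arg_low_word_offset (int index)
{
  size_t offset = 16 + 8 * (size_t) index;
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  offset += 4;   /* on big-endian the low word is the second one */
#endif
  return offset;
}
```

On s390x, seccomp_arg_low_word_offset (clone_flags_arg_index ()) would give 28 (0x1c) instead of the 0x14 loaded by filter line 0028.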
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to glibc in Ubuntu.
https://bugs.launchpad.net/bugs/1886814
Title:
posix_spawn usage in gnu make causes failures on s390x
Status in Ubuntu on IBM z Systems:
Triaged
Status in glibc package in Ubuntu:
New
Status in linux package in Ubuntu:
Incomplete
Status in make-dfsg package in Ubuntu:
New
Bug description:
posix_spawn usage in gnu make causes failures on s390x
Recently in gnu-make v4.3 https://paste.ubuntu.com/p/tYhbJFKN76/ it
started to use posix_spawn, instead of fork()/exec().
This has caused failure of an unrelated package flatpak-builder
autopkgtests on s390x only, like so
echo Building
make: echo: Operation not permitted
make: *** [Makefile:2: all] Error 127
Julian Klaude investigated this in-depth. His earlier research also
indicated that this is a heisenbug, if one tries to print to stderr
before printing to stdout, no issue occurs.
We are configuring GNU make to be built with --disable-posix-spawn on
s390x only. We passed these details to Debian too:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=964541
But I do wonder, if there is something different or incorrect about
posix_spawn() implementation in either glibc, or linux kernel, on
s390x. Or gnu-make's usage of posix_spawn().
As otherwise, using posix_spawn() in gnu-make works on other
architectures, and flatpak-builder autopkgtests pass too.
It seems very weird that stdout does not appear to be functional,
unless stderr was opened/written to, from gnu-make execution compiled
with posix-spawn feature.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1886814/+subscriptions