Part 2! Request for help / ideas to debug issue

Michael Hudson-Doyle michael.hudson at canonical.com
Mon Mar 13 21:05:05 UTC 2017


On 14 March 2017 at 01:59, John Lenton <john.lenton at canonical.com> wrote:

> This one is slightly more interesting.
>
> You need 1.8 (or patched <1.8 as per the previous thread) for this one
> to make sense; without it you're just going to get drowned in warning
> messages and not see the real issue.
>
> This one is the real issue :-)
>

Ah _hah_!


> In Go, when calling syscall.Exec on a setuid root binary, sometimes
> (about 4% of the time on my machine, but it's hardware- and
> load-dependent) the exec'ed process will find itself running with
> a nonzero effective uid. That is, a setuid root binary will
> find itself running as non-root. As the process that sets up
> confinement is setuid root (in distros where setuid is favoured over
> capabilities), this means the snap app falls on its face.
>
>     TODO: check if something similar happens when using caps
>
> This is *probably* a bug in Go, but it only seems to arise when using
> syscall.Exec, which as far as I can tell is unsupported (the whole
> syscall package is unsupported -- not covered by the go1 compatibility
> promise -- and its replacement, golang.org/x/sys/unix, is ominously
> missing Exec).
>
> Having said that, it might be a bug in the kernel ;-)
> And I say this because if you pin the process to a single cpu, the
> issue doesn't arise.
>
> Anyway, code to repro this is at
> https://gist.github.com/chipaca/806c90d96c437444f27f45a83d00a813
>
> On my machine,
>
> $ for i in `seq -w 9999`; do ./a_c; done | wc -l
> 0
> $ for i in `seq -w 9999`; do ./a_go; done | wc -l
> 394
>
> And,
>
> $ for i in `seq -w 9999`; do taskset 2 ./a_go; done | wc -l
> 0
>
> Gnarly!
>

That's pretty exciting.  I bet this is going to have the same underlying
cause as the other bug: some other thread in the Go process is doing
something that causes the kernel to ignore the setuid bit. If I add a
time.Sleep(1*time.Millisecond) to a_go.go before the exec, the setuid bit
is respected every time. It doesn't help debugging that the kernel ignores
setuid when a process is being traced, or that strace likes to hang when
you trace a_go.

I spent a while staring at the kernel source but I don't really have any
idea how this might be happening. It might be this code:
https://github.com/torvalds/linux/blob/master/security/commoncap.c#L549-L561
but I don't know how to be sure (well, without building kernels to do
debugging-via-printk or whatever).

Cheers,
mwh


