Request for help / ideas to debug issue

Gustavo Niemeyer gustavo.niemeyer at canonical.com
Mon Mar 13 13:53:09 UTC 2017


Thanks for your help on this, Michael.

Yes, the fix seems pretty simple and porting to older Go releases will
ensure people are not getting those super awkward messages randomly.

I have created a Trello card with a checklist so we can track all the
distributions that we need to fix, and included you, John, and Zygmunt.



On Sun, Mar 12, 2017 at 5:38 AM, Michael Hudson-Doyle <
michael.hudson at canonical.com> wrote:

> PS: I guess I should back port that Go fix to all supported Go releases?
>
> On 12 March 2017 at 21:37, Michael Hudson-Doyle <
> michael.hudson at canonical.com> wrote:
>
> > Before we get into this, what is the actual problem here? Just the ugly
> > messages?
> >
> > On 11 March 2017 at 02:58, Alfonso Sanchez-Beato <alfonso.sanchez-beato@
> > canonical.com> wrote:
> >
> >> On Fri, Mar 10, 2017 at 10:22 AM, John Lenton <
> john.lenton at canonical.com>
> >> wrote:
> >>
> >> > Hello!
> >> >
> >> > We're seeing a weird issue with either go, pthreads, or the kernel. If
> >> > you're knowledgeable about one or more of those things, could you take
> >> > a look? Thank you.
> >> >
> >> > The issue manifests as nasty warnings from the "snap run" command,
> >> > which is also the first step into a snapped app or service. It looks
> >> > like
> >> >
> >> > runtime/cgo: pthread_create failed: Resource temporarily unavailable
> >> >
> >> > a very stripped-down reproducer is http://pastebin.ubuntu.com/
> 24150663/
> >> >
> >> > build that, run it in a loop, and you'll see a bunch of those messages
> >> > (and some weirder ones, but let's take it one step at a time)
> >>
> >
> > Turns out this was fixed in Go 1.8: https://go-review.
> > googlesource.com/c/33894/
> >
> >
> >> > if you comment out the 'import "C"' line the message will change but
> >> > still happen, which makes me think that at least in part this is a Go
> >> > issue (or that we're holding it wrong).
> >>
> >
> > ... but only in the non-cgo case, you can (occasionally) still get
> > messages like:
> >
> > runtime: failed to create new OS thread (have 5 already; errno=11)
> > runtime: may need to increase max user processes (ulimit -u)
> > fatal error: newosproc
> >
> > if you comment out the import "C". I guess we should report that
> upstream.
> >
> >
> >> > Note that the exec does work; the warning seems to come from a
> >> > different thread than the one doing the Exec (the other clue that
> >> > points in this direction is that sometimes the message is truncated).
> >> > You can verify the fact that it does run by changing the /bin/true to
> >> > /bin/echo os.Args[1], but because this issue is obviously a race
> >> > somewhere, this change makes it less likely to happen (from ~10% down
> >> > to ~.5% of runs, in my machines).
> >> >
> >> > One thing that makes this harder to debug is that strace'ing the
> >> > process hangs (hard, kill -9 of strace to get out) before reproducing
> >> > the issue. This probably means we need to trace it at a lower level,
> >> > and I don't know enough about tracing a process group from inside the
> >> > kernel to be able to do that; what I can find about kernel-level
> >> > tracing is around syscalls or devices.
> >> >
> >> > Ideas?
> >> >
> >>
> >> I found this related thread:
> >>
> >> https://groups.google.com/forum/#!msg/golang-nuts/8gszDBRZh_
> >> 4/lhROTfN9TxIJ
> >>
> >> <<
> >> I believe this can happen on GNU/Linux if your program uses cgo and if
> >> thread A is in the Go runtime starting up a new thread B while thread
> >> C is execing a program.  The underlying cause is that while one thread
> >> is calling exec the Linux kernel will fail attempts by other threads
> >> to call clone by returning EAGAIN.  (Look for uses of the in_exec
> >> field in the kernel sources.)
> >> >>
> >>
> >
> > Yeah, this seems to be very accurate. It's also why it seems this is a
> > cosmetic problem only, some thread not calling exec fails, but well, the
> > thread is about to die anyway.
> >
> >
> >> Something like adding a little sleep removes the traces, for instance:
> >>
> >> http://paste.ubuntu.com/24151637/
> >>
> >> where the program run sleep for 1ms before calling Exec. For smaller
> units
> >> (say, 20 us) the issue still happens.
> >>
> >> It looks to me that right before running main(), go creates some
> threads,
> >> calling clone() and probably getting the race described in the thread.
> As
> >> anyway you are running Exec I guess the traces are harmless, you do not
> >> need the go threads. Nonetheless, I think that the go run time should
> >> retry
> >> instead of printing that trace.
> >
> >
> > Cheers,
> > mwh
> >
> --
> Snapcraft mailing list
> Snapcraft at lists.snapcraft.io
> Modify settings or unsubscribe at: https://lists.ubuntu.com/
> mailman/listinfo/snapcraft
>



-- 
gustavo @ http://niemeyer.net



More information about the Snapcraft mailing list