"fork/exec ... unable to allocate memory"

John Meinel john at arbash-meinel.com
Wed Jun 3 04:47:16 UTC 2015


So we're running into this failure mode again at one of our sites.

Specifically, the system is running with a reasonable number of nodes
(~100) and has been running for a while. It appears that it wanted to
restart itself (I don't think it restarted jujud, but I do think it at
least restarted a lot of the workers).
Anyway, we have a fair number of things that we "exec" during startup
(kvm-ok, restarting rsyslog, etc.).
But when we get into this situation (whatever it actually is), we can't
exec anything and we start getting failures.

Now, this *might* be a golang bug.

When I was trying to debug it in the past, I created a small program that
just allocated big slices of memory (10MB strings, IIRC) and then tried to
run "echo hello" until it started failing.
IIRC the failure point was when I wasn't using swap and the allocated
memory was 50% of total available memory. (I have 8GB of RAM, it would
start failing once we had allocated 4GB of strings).
When I tried digging into the golang code, it looked like they use clone(2)
as the "create a new process for exec" function, and it seemed it wasn't
playing nicely with copy-on-write. At least, it appeared that instead of
doing a simple copy-on-write clone without allocating any new memory and
then exec'ing into a new process, it actually required enough RAM to be
available for the new process.
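
A minimal sketch of the kind of test program described above, for anyone who
wants to try reproducing this. It is not the original program: the function
name fillAndExec, the use of byte slices rather than strings, and the chunk
count in main are all assumptions made for illustration. It holds on to
~10MB buffers and runs "echo hello" after each allocation, reporting how
much memory was held when exec first failed.

```go
package main

import (
	"fmt"
	"os/exec"
)

// chunkSize mirrors the ~10MB allocations described in the post.
const chunkSize = 10 << 20

// fillAndExec (hypothetical helper) allocates up to maxChunks buffers of
// size bytes each and runs "echo hello" after every allocation. It returns
// the number of buffers held when exec first failed, or maxChunks and a nil
// error if exec never failed.
func fillAndExec(size, maxChunks int) (int, error) {
	held := make([][]byte, 0, maxChunks)
	for i := 0; i < maxChunks; i++ {
		buf := make([]byte, size)
		// Touch one byte per 4KB so the memory is resident, not just reserved.
		for j := 0; j < len(buf); j += 4096 {
			buf[j] = 1
		}
		held = append(held, buf)
		if err := exec.Command("echo", "hello").Run(); err != nil {
			return len(held), err
		}
	}
	return len(held), nil
}

func main() {
	// 8 chunks (~80MB) keeps the default run small. Raise the count toward
	// half of physical RAM (with swap off) to look for the failure.
	n, err := fillAndExec(chunkSize, 8)
	fmt.Printf("held %d MB, exec error: %v\n", (n*chunkSize)>>20, err)
}
```

Whether and where this fails will likely depend on the Go version and on the
kernel's memory overcommit settings, so treat any threshold it reports as
specific to the machine it ran on.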

On the customer site, though, jujud has a RES size of only 1GB, and they
have 4GB of available RAM and swap is enabled (2GB of 4GB swap currently in
use).

The only workaround I can think of is for us to create a "forker" process
right away at startup, and then just send it RPC requests to run a command
for us and return the results. ATM I don't think we fork and run anything
interactively, so we shouldn't need the child's stdin/stdout file handles
inside our process.

I'd rather just have golang fork() work even when the current process is
using a large amount of RAM.

Any of the golang folks know what is going on?

John
=:->

More information about the Juju-dev mailing list