"fork/exec ... unable to allocate memory"
John Meinel
john at arbash-meinel.com
Wed Jun 3 04:47:16 UTC 2015
So we're running into this failure mode again at one of our sites.
Specifically, the system is running with a reasonable number of nodes
(~100) and has been running for a while. It appears that it wanted to
restart itself (I don't think it restarted jujud, but I do think it at
least restarted a lot of the workers.)
Anyway, we have a fair number of things that we "exec" during startup
(kvm-ok, restart rsyslog, etc).
But when we get into this situation (whatever it actually is) then we can't
exec anything and we start getting failures.
Now, this *might* be a golang bug.
When I was trying to debug it in the past, I created a small program that
just allocated big slices of memory (10MB strings, IIRC) and then tried to
run "echo hello" until it started failing.
IIRC the failure point was when I wasn't using swap and the allocated
memory was 50% of total available memory. (I have 8GB of RAM, it would
start failing once we had allocated 4GB of strings).
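For anyone who wants to try this, here is a rough reconstruction of that kind of test program. It is a sketch rather than the original code: it holds 10MB byte slices instead of strings, touches every page so the memory is actually resident, and the names (allocChunk, tryExec, fillUntilExecFails) and the 800-chunk cap are made up for illustration. Whether and where it fails will depend on RAM, swap and the kernel's overcommit settings.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// chunkSize mirrors the ~10MB allocations described above.
const chunkSize = 10 << 20

// allocChunk returns a buffer with one byte written per page, so the
// memory is really backed by RAM instead of staying lazily mapped.
func allocChunk(size int) []byte {
	b := make([]byte, size)
	for i := 0; i < len(b); i += os.Getpagesize() {
		b[i] = 1
	}
	return b
}

// tryExec runs "echo hello" and returns any fork/exec error.
func tryExec() error {
	return exec.Command("echo", "hello").Run()
}

// fillUntilExecFails keeps allocating chunks and exec'ing after each one.
// It returns the number of chunks held when exec first failed, or
// maxChunks and a nil error if exec never failed.
func fillUntilExecFails(maxChunks int) (int, error) {
	var held [][]byte
	for len(held) < maxChunks {
		held = append(held, allocChunk(chunkSize))
		if err := tryExec(); err != nil {
			return len(held), err
		}
	}
	return len(held), nil
}

func main() {
	// 800 chunks (~8GB) is an arbitrary cap; adjust it to the machine.
	n, err := fillUntilExecFails(800)
	if err != nil {
		fmt.Printf("exec failed with ~%d MB held: %v\n", n*10, err)
		return
	}
	fmt.Printf("exec still worked with ~%d MB held\n", n*10)
}
```

Run it on a box where you don't mind the memory pressure, ideally with swap off to match the conditions above.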
When I tried digging into the golang code, it looked like they use clone(2)
as the "create a new process for exec" function. And it seemed it wasn't
playing nicely with copy-on-write. At least, it appeared that instead of
doing a simple copy-on-write clone without allocating any new memory and
then exec into a new process, it actually required enough RAM to be
available for a full copy of the parent process.
On the customer site, though, jujud has a RES size of only 1GB, and they
have 4GB of available RAM and swap is enabled (2GB of 4GB swap currently in
use).
The only workaround I can think of is for us to create a "forker" process
right away at startup, to which we just send RPC requests to run a command
for us and return the results. ATM I don't think we run anything
interactively, so we shouldn't need the child's stdin/stdout file handles
inside our own process.
I'd rather just have golang fork() work even when the current process is
using a large amount of RAM.
Any of the golang folks know what is going on?
John
=:->