"fork/exec ... unable to allocate memory"

John Meinel john at arbash-meinel.com
Wed Jun 3 14:07:50 UTC 2015


Yeah, I'm pretty sure this machine is on "0" and we've just overcommitted
enough that Linux is refusing to overcommit more. I'm pretty sure juju was
at 2GB of committed pages at least, with 1GB in RAM and 1GB in swap. And if
we've already committed 9.7GB against a 6.2GB limit, Linux probably decided
that another 2GB was an "obvious overcommit" that it should refuse.
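
For anyone who wants to watch the accounting live, here's a minimal
sketch (illustrative only, not juju code) that pulls CommitLimit and
Committed_AS out of /proc/meminfo and prints the remaining headroom.
CommitLimit is only a hard cap in mode 2, but Committed_AS sitting this
far past it is a decent hint that the mode-0 heuristic will start
refusing large requests:

package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	f, err := os.Open("/proc/meminfo")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	vals := map[string]int64{}
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		// Lines look like: "CommitLimit:     6214344 kB"
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		key := strings.TrimSuffix(fields[0], ":")
		if key == "CommitLimit" || key == "Committed_AS" {
			vals[key], _ = strconv.ParseInt(fields[1], 10, 64)
		}
	}
	fmt.Printf("CommitLimit:  %d kB\n", vals["CommitLimit"])
	fmt.Printf("Committed_AS: %d kB\n", vals["Committed_AS"])
	fmt.Printf("headroom:     %d kB\n", vals["CommitLimit"]-vals["Committed_AS"])
}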

John
=:->


On Wed, Jun 3, 2015 at 5:32 PM, Gustavo Niemeyer <gustavo at niemeyer.net>
wrote:

> From https://www.kernel.org/doc/Documentation/vm/overcommit-accounting:
>
> The Linux kernel supports the following overcommit handling modes
>
> 0	-	Heuristic overcommit handling. Obvious overcommits of
> 		address space are refused. Used for a typical system. It
> 		ensures a seriously wild allocation fails while allowing
> 		overcommit to reduce swap usage.  root is allowed to
> 		allocate slightly more memory in this mode. This is the
> 		default.
>
> 1	-	Always overcommit. Appropriate for some scientific
> 		applications. Classic example is code using sparse arrays
> 		and just relying on the virtual memory consisting almost
> 		entirely of zero pages.
>
> 2	-	Don't overcommit. The total address space commit
> 		for the system is not permitted to exceed swap + a
> 		configurable amount (default is 50%) of physical RAM.
> 		Depending on the amount you use, in most situations
> 		this means a process will not be killed while accessing
> 		pages but will receive errors on memory allocation as
> 		appropriate.
>
> 		Useful for applications that want to guarantee their
> 		memory allocations will be available in the future
> 		without having to initialize every page.
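>
> To flip modes at runtime you write to /proc/sys/vm/overcommit_memory
> (equivalent to sysctl vm.overcommit_memory=N). A minimal Go sketch of
> that, assuming the standard procfs path; the write needs root, and the
> change does not persist across reboots:
>
> package main
>
> import (
> 	"fmt"
> 	"os"
> 	"strings"
> )
>
> const path = "/proc/sys/vm/overcommit_memory"
>
> func main() {
> 	// Read the current mode (0, 1, or 2).
> 	data, err := os.ReadFile(path)
> 	if err != nil {
> 		panic(err)
> 	}
> 	fmt.Println("current mode:", strings.TrimSpace(string(data)))
>
> 	// Switch to mode 1 (always overcommit). Needs root; use
> 	// /etc/sysctl.conf to make it permanent.
> 	if os.Geteuid() == 0 {
> 		if err := os.WriteFile(path, []byte("1\n"), 0644); err != nil {
> 			panic(err)
> 		}
> 	}
> }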
>
>
> On Wed, Jun 3, 2015 at 7:40 AM, John Meinel <john at arbash-meinel.com>
> wrote:
>
>> So, interestingly, we are already fairly heavily overcommitted. We have
>> 4GB of RAM and 4GB of swap available, and cat /proc/meminfo is saying:
>> CommitLimit:     6214344 kB
>> Committed_AS:    9764580 kB
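>>
>> (For reference, CommitLimit is computed as swap + overcommit_ratio% of
>> RAM, with overcommit_ratio defaulting to 50: 4GB swap + 50% of ~4GB RAM
>> comes to the ~6GB / 6214344 kB above. It's only a hard limit in mode 2,
>> but it's reported in every mode.)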
>>
>> John
>> =:->
>>
>>
>>
>> On Wed, Jun 3, 2015 at 9:28 AM, Gustavo Niemeyer <gustavo at niemeyer.net>
>> wrote:
>>
>>> Ah, and you can also suggest increasing the swap. It would not actually
>>> be used, but the system would be able to commit the amount of memory
>>> required if it really had to.
>>>  On Jun 3, 2015 1:24 AM, "Gustavo Niemeyer" <gustavo at niemeyer.net>
>>> wrote:
>>>
>>>> Hey John,
>>>>
>>>> It's probably an overcommit issue. Even if you don't have the memory in
>>>> use, cloning the process means the new process would have a chance to
>>>> change that memory and thus require real memory pages, which the system
>>>> obviously cannot give it. You can work around that by explicitly
>>>> enabling overcommit, which brings the potential to crash late in strange
>>>> places in the bad case, but would be totally fine for the exec situation.
>>>>
>>>> So we're running into this failure mode again at one of our sites.
>>>>
>>>> Specifically, the system is running with a reasonable number of nodes
>>>> (~100) and has been running for a while. It appears that it wanted to
>>>> restart itself (I don't think it restarted jujud, but I do think it at
>>>> least restarted a lot of the workers).
>>>> Anyway, we have a fair number of things that we "exec" during startup
>>>> (kvm-ok, restarting rsyslog, etc.).
>>>> But when we get into this situation (whatever it actually is), we can't
>>>> exec anything and we start getting failures.
>>>>
>>>> Now, this *might* be a golang bug.
>>>>
>>>> When I was trying to debug this in the past, I created a small program
>>>> that just allocated big slices of memory (10MB strings, IIRC) and then
>>>> tried to run "echo hello" until it started failing.
>>>> IIRC the failure point was when I wasn't using swap and the allocated
>>>> memory reached 50% of total available memory (I have 8GB of RAM; it
>>>> would start failing once we had allocated 4GB of strings).
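>>>>
>>>> Reconstructed from memory, that test program was essentially this sketch
>>>> (not the original source, but the chunk size and command are as above):
>>>>
>>>> package main
>>>>
>>>> import (
>>>> 	"fmt"
>>>> 	"os/exec"
>>>> )
>>>>
>>>> func main() {
>>>> 	var hog [][]byte
>>>> 	for i := 1; ; i++ {
>>>> 		// Allocate and touch a 10MB chunk so it's really backed by pages.
>>>> 		b := make([]byte, 10<<20)
>>>> 		for j := range b {
>>>> 			b[j] = 'x'
>>>> 		}
>>>> 		hog = append(hog, b)
>>>>
>>>> 		// Try a fork/exec after every allocation.
>>>> 		if err := exec.Command("echo", "hello").Run(); err != nil {
>>>> 			fmt.Printf("failed after %d MB: %v\n", i*10, err)
>>>> 			return
>>>> 		}
>>>> 	}
>>>> }
>>>>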
>>>> When I tried digging into the golang code, it looked like they use
>>>> clone(2) as the "create a new process for exec" function, and it seemed
>>>> it wasn't playing nicely with copy-on-write. At least, it appeared that
>>>> instead of doing a simple copy-on-write clone without allocating any new
>>>> memory and then exec'ing the new process, it actually required having
>>>> enough RAM available for the new process.
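>>>>
>>>> For what it's worth, the exported syscall.ForkExec goes through the same
>>>> clone + execve path that os/exec ends up in on Linux, so you can poke at
>>>> that layer directly and see which errno comes back. Just a sketch; it
>>>> inherits our stdio and leaks a zombie, which is fine for a throwaway
>>>> test:
>>>>
>>>> package main
>>>>
>>>> import (
>>>> 	"fmt"
>>>> 	"syscall"
>>>> )
>>>>
>>>> func main() {
>>>> 	// If commit accounting refuses the clone, err here comes back
>>>> 	// as ENOMEM or EAGAIN rather than anything exec-related.
>>>> 	pid, err := syscall.ForkExec("/bin/echo", []string{"echo", "hello"},
>>>> 		&syscall.ProcAttr{Files: []uintptr{0, 1, 2}}) // inherit stdio
>>>> 	fmt.Println("pid:", pid, "err:", err)
>>>> }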
>>>>
>>>> On the customer site, though, jujud has a RES size of only 1GB; they
>>>> have 4GB of RAM, and swap is enabled (2GB of the 4GB of swap currently
>>>> in use).
>>>>
>>>> The only workaround I can think of is for us to create a "forker"
>>>> process right away at startup, to which we just send RPC requests to run
>>>> a command for us and return the results. ATM I don't think we fork and
>>>> run anything interactively such that we need the stdin/stdout file
>>>> handles inside our process.
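>>>>
>>>> Roughly this shape, as a sketch (the -helper flag and the line-based
>>>> protocol here are made up for illustration): re-exec ourselves as a tiny
>>>> child right at startup, while the address space is still small, and
>>>> later hand it commands over a pipe:
>>>>
>>>> package main
>>>>
>>>> import (
>>>> 	"bufio"
>>>> 	"fmt"
>>>> 	"os"
>>>> 	"os/exec"
>>>> )
>>>>
>>>> // helper reads one command name per line, runs it, reports the result.
>>>> func helper() {
>>>> 	in := bufio.NewScanner(os.Stdin)
>>>> 	for in.Scan() {
>>>> 		fmt.Println("result:", exec.Command(in.Text()).Run())
>>>> 	}
>>>> }
>>>>
>>>> func main() {
>>>> 	if len(os.Args) > 1 && os.Args[1] == "-helper" {
>>>> 		helper()
>>>> 		return
>>>> 	}
>>>>
>>>> 	// Fork the helper *before* allocating anything big.
>>>> 	cmd := exec.Command(os.Args[0], "-helper")
>>>> 	stdin, _ := cmd.StdinPipe()
>>>> 	stdout, _ := cmd.StdoutPipe()
>>>> 	if err := cmd.Start(); err != nil {
>>>> 		panic(err)
>>>> 	}
>>>>
>>>> 	// ...much later, with gigabytes committed, ask the small helper
>>>> 	// to do the fork/exec for us instead of forking the big process.
>>>> 	fmt.Fprintln(stdin, "kvm-ok")
>>>> 	reply := bufio.NewScanner(stdout)
>>>> 	if reply.Scan() {
>>>> 		fmt.Println("helper said:", reply.Text())
>>>> 	}
>>>> }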
>>>>
>>>> I'd rather just have golang fork() work even when the current process
>>>> is using a large amount of RAM.
>>>>
>>>> Any of the golang folks know what is going on?
>>>>
>>>> John
>>>> =:->
>>
>
>
> --
>
> gustavo @ http://niemeyer.net
>