Memory leaks in Ubuntu 20 kernel?
nate
ubuntu at linuxpowered.net
Fri May 21 05:00:06 UTC 2021
Hello -
(Linux user since 1996)
TL;DR - Memory usage has more than doubled for a simple workload on Ubuntu 20.04
vs 16.04 (and 12.04 and 10.04 before that); the source of the usage is not
reported as in use by any process in "top", and memory continues to leak as
time goes on.
Sorry for the long post, but I wanted to include as much detail as I could.
I was wondering if anyone else had noticed this. I have been replacing many
16.04 systems with 20.04, and in some cases the 20.04 systems are using a ton
more memory for no apparent reason.
The most basic type of system I have is a utility server, which runs services
such as the following (% numbers are memory usage reported by top after
restarting the services):
bind - 0.7%
splunk forwarder - 1.2%
syslog-ng (local logs only) - 0.2%
snmpd - 0.2%
NFS client - ~0.2%
autofs - 0.3%
Apache (basic config, 3 workers minimal traffic) - 0.1%
postfix relay (minimal traffic) - 0.1%
Chef configuration management agent - 0.4%
Only 26 kernel modules are loaded (I tried to minimize the modules that are
loaded; by contrast my home Ubuntu 20 laptop has 140 modules loaded).
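For anyone wanting to reproduce that comparison, counting the loaded modules
and checking whether any single module is unusually large can be done with
something like this (lsmod's second column is the module size in bytes):

# count loaded modules (skip the header line)
lsmod | tail -n +2 | wc -l
# ten largest modules by size in bytes
lsmod | tail -n +2 | sort -k2 -rn | head -10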
Sample 'free -m' output immediately after flushing swap and clearing disk
buffers (as in 2 seconds after):
              total        used        free      shared  buff/cache   available
Mem:           2983        2047         101           1         834         718
Swap:           511          56         455
The system has 718MB of "available" memory, yet it feels an immediate need to
get back into swap right after swap is cleared.
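In case it helps anyone else dig into this, one place to look for where
unaccounted memory is hiding is /proc/meminfo, since slab and other kernel
allocations show up there even though no process owns them; for example:

grep -E '^(MemTotal|MemAvailable|Slab|SReclaimable|SUnreclaim|KernelStack|PageTables|Shmem|VmallocUsed|Committed_AS):' /proc/meminfo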
All local filesystems are ext4 (so no memory-hogging ZFS or anything like that
running); only 4.1GB of local disk space is in use.
I have a custom system memory monitoring script that I wrote back in 2004 and
have been evolving to support newer Linux interfaces since then. I have data
going back for the past year on these systems. On Ubuntu 16.04 the systems
were configured with 1.5GB of memory and 1 CPU, and were using on average
300MB of memory, about 850MB of cache, ~130MB of buffers, and a pretty steady
~180MB of free memory. This usage was extremely stable for at least 8 months
prior to the upgrade.
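(Assuming it is enough to know the monitoring just samples /proc/meminfo-style
counters periodically, the core idea is roughly this stripped-down sketch, not
the actual script:)

#!/bin/sh
# rough sketch: log a timestamp plus a few /proc/meminfo fields once a minute
while true; do
    printf '%s ' "$(date +%s)"
    awk '/^(MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapFree):/ {printf "%s %s ", $1, $2} END {print ""}' /proc/meminfo
    sleep 60
done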
Using the exact same configuration (this is managed by Chef), on Ubuntu 20.04
the memory usage has grown dramatically. The system was installed Feb 9 2021.
I increased the system memory to 2GB as part of the upgrade, then increased it
again to 3GB at the end of April. Currently a sample system is using 1.08GB of
memory (up more than 300%), 1.64GB of cache (about a 200% increase), almost no
buffers, and about 130MB free. The memory usage looks like a steady leak: on
Feb 9 memory usage was about 400MB and now it is 1.08GB.
These systems had been running Ubuntu 16.04 for probably the past 4 years, and
before that Ubuntu 12.04, and before that 10.04, all with the same
configuration (+/- minor required option changes to support each distro
version, etc.).
Running "top" reports nothing using more than 1.8% of memory. I
restarted all of the "major"
services(not expecting any results), and sure enough zero impact to
memory usage. I flushed
all of the kernel cache buffers:
echo 1 > /proc/sys/vm/drop_caches
echo 2 > /proc/sys/vm/drop_caches
echo 3 > /proc/sys/vm/drop_caches
It freed up only about 100MB.
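One way to confirm that no process actually owns the memory is to sum the RSS
of every process and compare it to "used" in free, and then check whether the
difference is sitting in kernel slab caches; roughly:

# total resident memory across all processes, in MB
ps -eo rss --no-headers | awk '{sum += $1} END {printf "%.0f MB\n", sum/1024}'
# top kernel slab caches, one-shot, sorted by cache size
slabtop -o -s c | head -20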
All systems have a 512MB "last resort" kind of swap. "vm.swappiness" has been
set to 0 for the past decade. The systems swap often. I have a cron job that
analyzes swap usage and free memory and clears swap every hour if there is
space; when it clears swap it also flushes all of the buffers (roughly the
sketch below).
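(A simplified sketch of that hourly job's logic, not the actual script; it
only clears swap when what is in swap fits in available memory:)

#!/bin/sh
# only clear swap if the data in swap fits comfortably in available memory,
# then drop the kernel caches
swap_used_kb=$(awk '/^SwapTotal:/{t=$2} /^SwapFree:/{f=$2} END{print t-f}' /proc/meminfo)
mem_avail_kb=$(awk '/^MemAvailable:/{print $2}' /proc/meminfo)
if [ "$swap_used_kb" -lt "$mem_avail_kb" ]; then
    swapoff -a && swapon -a
    sync
    echo 3 > /proc/sys/vm/drop_caches
fi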
These are low-usage systems with a single CPU; they average under 5% CPU usage
24/7.
Before upgrading to 3GB, I tried turning off swap just to see what would
happen; the result was that the OOM killer started getting called. So I
re-enabled swap and increased memory to 3GB.
Realistically these systems SHOULD run fine with 1GB of memory; I only set it
to 2GB because in the past I have seen memory usage spike during apt-get
upgrades. But now it is at 3GB, the memory usage profile has not changed, and
it is continuing to leak more.
OS is Ubuntu 20.04.2
Kernel is 5.4.0-65-generic (I know, not the latest)
I did come across this post a few weeks ago, which had a comment calling the
memory changes in 5.4 into question and saying that some newer kernels were
better for that person, though still not back to "normal" memory usage:
https://askubuntu.com/questions/1278460/why-does-vm-swappiness-not-working
These do all run inside VMware ESX; the hosts have plenty of memory and there
is no ballooning going on (we have never had a ballooning incident in the past
decade, and we have tons of extra memory). I gather probably 30,000 data
points a minute across our infrastructure, all graphed and alerted on in
LogicMonitor.
I can't imagine others haven't encountered this situation, but I am having a
hard time finding references to it anywhere other than that askubuntu post
above.
I am just not sure where to look to find the source of the usage. I came
across this
tonight:
https://www.kernel.org/doc/html/v5.4/dev-tools/kmemleak.html
Though the option it requires isn't enabled in the stock kernel. I could build
my own kernel to test, but I am uncertain whether I could make sense of the
results as I am not a kernel dev.
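(If I did rebuild with CONFIG_DEBUG_KMEMLEAK enabled, the interface itself
looks simple enough per that doc; something along these lines:)

# assumes a kernel built with CONFIG_DEBUG_KMEMLEAK=y
mount -t debugfs nodev /sys/kernel/debug/   # only if debugfs is not already mounted
echo scan > /sys/kernel/debug/kmemleak      # trigger an immediate scan
cat /sys/kernel/debug/kmemleak              # list suspected leaks with stack traces
echo clear > /sys/kernel/debug/kmemleak     # forget current results and start fresh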
I have seen significantly higher memory usage in MySQL servers in some
situations vs
16.04 as well, and other places too, but in those cases the memory usage
actually
registers as being used by MySQL. In this case it is not registering as
being used
by any process.
I have 14 different servers running this configuration and they are all
behaving the same. I currently have nearly 300 Ubuntu 20.04 systems. These
utility servers should be the easiest ones to narrow the issue down on, as
they don't run any fancy software. I still have a couple hundred 16.04 systems
left to upgrade as well.
Worst case I guess I can just reboot them occasionally, not something
I've ever had
to do for Linux.
thanks
nate