[Bug 2089789] Re: malloc performance degradation with CPU affinity masks
Mauricio Faria de Oliveira
2089789 at bugs.launchpad.net
Wed Nov 27 21:53:31 UTC 2024
** Description changed:
- https://sourceware.org/bugzilla/show_bug.cgi?id=30945
+ [Impact]
+
+ * Jammy has a malloc() performance degradation
+ if CPU affinity masks are used (not default).
+
+ * The maximum number of arenas for malloc() is
+ calculated based on the number of processors.
+
+ However, glibc 2.34 changed that to be based
+ on sched_getaffinity(), which is the number
+ of processors available _to the process_
+ (i.e., based on CPU affinity masks). [0]
+
+ Previously, glibc 2.33 instead used the number
+ of processors available _in the system_
+ (i.e., based on sysfs and procfs files).
+
+ * This is not an issue by default, as without
+ CPU affinity masks, the returned number of
+ processors is the same as sysfs and procfs.
+
+ But it _is_ an issue if CPU affinity masks
+ are set, as it can increase lock contention
+ (fewer arenas), and thus degrade performance.
+
+ * CPU affinity can be set at the process-level
+ (e.g., taskset, numactl, sched_setaffinity())
+ or at the system-level (kernel boot options).
+
+ The latter is common in hypervisor and/or DPDK
+ deployments, where CPU partitioning is applied
+ with isolcpus, cpusets, systemd's CPUAffinity.
+
+ [Test Plan]
+
+ * The upstream bug report [1] has a reproducer,
+ used in comment #5 to reproduce the problem,
+ and in comment #6 to validate the fix patch.
+
+ It is copied/attached to this bug as backup
+ (test-glibc-malloc.c).
+
+ The expected behavior is that these 2 steps
+ (measuring the average time taken by 50,000
+ malloc+free calls, with one thread per CPU)
+ take similar amounts of time with & without
+ CPU affinity masks (parameter 2: true/false),
+ in a system with a great number of CPUs.
+
+ $ ./test-glibc-malloc $(nproc) false false
+ $ ./test-glibc-malloc $(nproc) true false
+
+ * glibc has a build-time test suite.
+
+ * glibc has autopkgtests (rebuild, i.e., above)
+ and triggers autopkgtests in a great number
+ of reverse test dependencies.
+
+ [Regression Potential]
+
+ * Theoretically, any fallout should be contained
+ in malloc() and be related only to performance,
+ not to functional errors.
+
+ * This is because the malloc() patch [2] changes
+ only which method is used to get the number of processors.
+
+ * The method it changes to is what has been already
+ used by previous versions of glibc (up to 2.33),
+ which has been adopted back (2.39) and backported
+ to all glibc releases after that version (2.34-2.38),
+ which includes the version in Jammy (2.35 [3]).
+
+ * The method it changes to is also exercised in other
+ code paths (not just malloc()), thus it is already
+ used and tested in Jammy -- it is not something new.
+
+ [Other Info]
+
+ * For details and analysis of (no) required
+ dependencies, see comments #1, #2, and #3.
+
+ * Upstream bug report [1]
+
+ [0]
+
+ glibc 2.33:
+ $ git log --oneline origin/release/2.33/master -- sysdeps/unix/sysv/linux/getsysstats.c | grep 'misc: Add __get_nprocs_sched'
+ $
+
+ glibc 2.34:
+ $ git log --oneline origin/release/2.34/master -- sysdeps/unix/sysv/linux/getsysstats.c | grep 'misc: Add __get_nprocs_sched'
+ e870aac8974c misc: Add __get_nprocs_sched
+
+ glibc 2.35:
+ $ git log --oneline origin/release/2.35/master -- sysdeps/unix/sysv/linux/getsysstats.c | grep 'misc: Add __get_nprocs_sched'
+ 11a02b035b46 misc: Add __get_nprocs_sched
+
+ [1] https://sourceware.org/bugzilla/show_bug.cgi?id=30945
+
+ [2]
+ https://sourceware.org/git/?p=glibc.git;a=commit;h=472894d2cfee5751b44c0aaa71ed87df81c8e62e
+
+ [3]
+ https://sourceware.org/git/?p=glibc.git;a=commit;h=d47c5e4db7924bb10efe14b787c4bd868b604e48
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to glibc in Ubuntu.
https://bugs.launchpad.net/bugs/2089789
Title:
malloc performance degradation with CPU affinity masks
Status in glibc package in Ubuntu:
Fix Released
Status in glibc source package in Jammy:
Confirmed
Bug description:
[Impact]
* Jammy has a malloc() performance degradation
if CPU affinity masks are used (not default).
* The maximum number of arenas for malloc() is
calculated based on the number of processors.
However, glibc 2.34 changed that to be based
on sched_getaffinity(), which is the number
of processors available _to the process_
(i.e., based on CPU affinity masks). [0]
Previously, glibc 2.33 instead used the number
of processors available _in the system_
(i.e., based on sysfs and procfs files).
* This is not an issue by default, as without
CPU affinity masks, the returned number of
processors is the same as sysfs and procfs.
But it _is_ an issue if CPU affinity masks
are set, as it can increase lock contention
(fewer arenas), and thus degrade performance.
* CPU affinity can be set at the process-level
(e.g., taskset, numactl, sched_setaffinity())
or at the system-level (kernel boot options).
The latter is common in hypervisor and/or DPDK
deployments, where CPU partitioning is applied
with isolcpus, cpusets, systemd's CPUAffinity.
[Test Plan]
* The upstream bug report [1] has a reproducer,
used in comment #5 to reproduce the problem,
and in comment #6 to validate the fix patch.
It is copied/attached to this bug as backup
(test-glibc-malloc.c).
The expected behavior is that these 2 steps
(measuring the average time taken by 50,000
malloc+free calls, with one thread per CPU)
take similar amounts of time with & without
CPU affinity masks (parameter 2: true/false),
in a system with a great number of CPUs.
$ ./test-glibc-malloc $(nproc) false false
$ ./test-glibc-malloc $(nproc) true false
* glibc has a build-time test suite.
* glibc has autopkgtests (rebuild, i.e., above)
and triggers autopkgtests in a great number
of reverse test dependencies.
[Regression Potential]
* Theoretically, any fallout should be contained
in malloc() and be related only to performance,
not to functional errors.
* This is because the malloc() patch [2] changes
only which method is used to get the number of processors.
* The method it changes to is what has been already
used by previous versions of glibc (up to 2.33),
which has been adopted back (2.39) and backported
to all glibc releases after that version (2.34-2.38),
which includes the version in Jammy (2.35 [3]).
* The method it changes to is also exercised in other
code paths (not just malloc()), thus it is already
used and tested in Jammy -- it is not something new.
[Other Info]
* For details and analysis of (no) required
dependencies, see comments #1, #2, and #3.
* Upstream bug report [1]
[0]
glibc 2.33:
$ git log --oneline origin/release/2.33/master -- sysdeps/unix/sysv/linux/getsysstats.c | grep 'misc: Add __get_nprocs_sched'
$
glibc 2.34:
$ git log --oneline origin/release/2.34/master -- sysdeps/unix/sysv/linux/getsysstats.c | grep 'misc: Add __get_nprocs_sched'
e870aac8974c misc: Add __get_nprocs_sched
glibc 2.35:
$ git log --oneline origin/release/2.35/master -- sysdeps/unix/sysv/linux/getsysstats.c | grep 'misc: Add __get_nprocs_sched'
11a02b035b46 misc: Add __get_nprocs_sched
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=30945
[2]
https://sourceware.org/git/?p=glibc.git;a=commit;h=472894d2cfee5751b44c0aaa71ed87df81c8e62e
[3]
https://sourceware.org/git/?p=glibc.git;a=commit;h=d47c5e4db7924bb10efe14b787c4bd868b604e48