[Bug 2016252] Re: qemu-system-x86_64 crashes inside systemd autopkgtest (nested VM)

Bug Watch Updater 2016252 at bugs.launchpad.net
Tue Aug 1 20:55:51 UTC 2023


Launchpad has imported 3 comments from the remote bug at
https://sourceware.org/bugzilla/show_bug.cgi?id=30428.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2023-05-08T16:42:13+00:00 Florian Weimer wrote:

This commit:

commit 103a469dc7755fd9e8ccf362f3dd4c55dc761908
Author: Sajan Karumanchi <sajan.karumanchi at amd.com>
Date:   Wed Jan 18 18:29:04 2023 +0100

    x86: Cache computation for AMD architecture.
    
    All AMD architectures cache details will be computed based on
    __cpuid__ `0x8000_001D` and the reference to __cpuid__ `0x8000_0006` will be
    zeroed out for future architectures.
    
    Reviewed-by: Premachandra Mallappa <premachandra.mallappa at amd.com>

changed how cache sizes are computed on AMD CPUs.

However, the new method (based on cpuid 0x8000_001D) is not supported
by all AMD CPUs. This CPU:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 6
model name      : AMD Turion(tm) II Neo N40L Dual-Core Processor
stepping        : 3
microcode       : 0x10000c8
cpu MHz         : 800.000
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate vmmcall npt lbrv svm_lock nrip_save
bugs            : tlb_mmatch apic_c1e fxsave_leak sysret_ss_attrs null_seg amd_e400 spectre_v1 spectre_v2
bogomips        : 2995.32
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

reports all zeros for its caches after this change (build from commit
cea74a4a24c36202309e8254f1f938e2166488f3, which includes the commit
mentioned above):

$ ./ld.so --list-diagnostics | grep -E 'level|threshold'
x86.cpu_features.non_temporal_threshold=0x4040
x86.cpu_features.rep_movsb_threshold=0x800
x86.cpu_features.rep_movsb_stop_threshold=0x0
x86.cpu_features.rep_stosb_threshold=0x800
x86.cpu_features.level1_icache_size=0x0
x86.cpu_features.level1_icache_linesize=0x0
x86.cpu_features.level1_dcache_size=0x0
x86.cpu_features.level1_dcache_assoc=0x0
x86.cpu_features.level1_dcache_linesize=0x0
x86.cpu_features.level2_cache_size=0x0
x86.cpu_features.level2_cache_assoc=0x0
x86.cpu_features.level2_cache_linesize=0x0
x86.cpu_features.level3_cache_size=0x0
x86.cpu_features.level3_cache_assoc=0x0
x86.cpu_features.level3_cache_linesize=0x0
x86.cpu_features.level4_cache_size=0xffffffffffffffff

A build from the 2.36 branch (commit
b7008a92f505632f32b313d1033d6d15c99a0b31) yields this instead:

$ ./ld.so --list-diagnostics | grep -E 'level|threshold'
x86.cpu_features.non_temporal_threshold=0xc0000
x86.cpu_features.rep_movsb_threshold=0x800
x86.cpu_features.rep_movsb_stop_threshold=0x100000
x86.cpu_features.rep_stosb_threshold=0x800
x86.cpu_features.level1_icache_size=0x10000
x86.cpu_features.level1_icache_linesize=0x40
x86.cpu_features.level1_dcache_size=0x10000
x86.cpu_features.level1_dcache_assoc=0x2
x86.cpu_features.level1_dcache_linesize=0x40
x86.cpu_features.level2_cache_size=0x100000
x86.cpu_features.level2_cache_assoc=0x10
x86.cpu_features.level2_cache_linesize=0x40
x86.cpu_features.level3_cache_size=0x0
x86.cpu_features.level3_cache_assoc=0x0
x86.cpu_features.level3_cache_linesize=0x0
x86.cpu_features.level4_cache_size=0xffffffffffffffff

So it's a regression.

The CPU is probably old enough that we don't use non-temporal stores on
it, so a crash in glibc is unlikely. But the lack of accurate cache sizes
probably still causes performance regressions elsewhere (although,
admittedly, no one is going to use CPUs that old for their performance).

Some hypervisors also fail to pass through these CPUID values even if
they identify the CPU as an AMD model:
https://bugzilla.redhat.com/show_bug.cgi?id=2196271

Addressing hypervisor compatibility might be the important part here.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/2016252/comments/3

------------------------------------------------------------------------
On 2023-05-08T20:29:47+00:00 Florian Weimer wrote:

Just to clarify: this regression affects sysconf (_SC_LEVEL2_CACHE_SIZE)
and similar configuration values, so it impacts more than just glibc-
internal tuning decisions.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/2016252/comments/4

------------------------------------------------------------------------
On 2023-07-04T17:34:05+00:00 Florian Weimer wrote:

Initial patch posted (still alters results compared to what we had
before):

[PATCH] x86: Fix for cache computation on AMD legacy cpus.
<https://sourceware.org/pipermail/libc-alpha/2023-June/148763.html>

Reply at:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/2016252/comments/6


** Changed in: glibc
       Status: Unknown => Confirmed

** Changed in: glibc
   Importance: Unknown => Critical

** Bug watch added: Red Hat Bugzilla #2196271
   https://bugzilla.redhat.com/show_bug.cgi?id=2196271

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2016252

Title:
  qemu-system-x86_64 crashes inside systemd autopkgtest (nested VM)

Status in GLibC:
  Confirmed
Status in glibc package in Ubuntu:
  New
Status in qemu package in Ubuntu:
  New
Status in systemd package in Ubuntu:
  New

Bug description:
  The systemd package has autopkgtests.
  The upstream-2 test cases use the upstream systemd test suite, i.e. make -C str/test/TEST-70-TPM2 setup run,
  which launches a nested VM to run quick tests inside it.

  It appears that qemu-system-x86_64 crashes in such cases:

  TEST-70-TPM2 RUN: cryptenroll/cryptsetup with TPM2 devices
  + timeout --foreground 1800 /bin/qemu-system-x86_64 -smp 4 -net none -m 1024M -nographic -vga none -kernel /boot/vmlinuz-6.2.0-1003-lowlatency -drive format=raw,cache=unsafe,file=/var/tmp/systemd-test.G2RH6i/tpm2.img -device virtio-rng-pci,max-bytes=1024,period=1000 -chardev socket,id=chrtpm,path=/tmp/tmp.cRBa43SrLC/sock -tpmdev emulator,id=tpm0,chardev=chrtpm -device tpm-tis,tpmdev=tpm0 -initrd /boot/initrd.img-6.2.0-1003-lowlatency -append 'root=LABEL=systemd_boot rw raid=noautodetect rd.luks=0 loglevel=2 init=/lib/systemd/systemd console=ttyS0 SYSTEMD_UNIT_PATH=/usr/lib/systemd/tests/testdata/testsuite-70.units:/usr/lib/systemd/tests/testdata/units: systemd.unit=testsuite.target systemd.wants=testsuite-70.service oops=panic panic=1 softlockup_panic=1 systemd.wants=end.service'
  qemu-system-x86_64: ../../util/cacheflush.c:208: init_cache_info: Assertion `(isize & (isize - 1)) == 0' failed.
  timeout: the monitored command dumped core
  ..//test-functions: line 377: 152120 Aborted                 ( set -x; "${qemu_cmd[@]}" "${qemu_options[@]}" -append "${kernel_params[*]}" )
  E: qemu failed with exit code 134

  The important bit seems to be:

  qemu-system-x86_64: ../../util/cacheflush.c:208: init_cache_info:
  Assertion `(isize & (isize - 1)) == 0' failed.

  This is an assertion inside the qemu source code.

  Is the systemd test suite VM setup doing something wrong, or is there
  something wrong in qemu?

To manage notifications about this bug go to:
https://bugs.launchpad.net/glibc/+bug/2016252/+subscriptions



