[Bug 2067889] Re: DPDK 23.11.1 / OVS 3.3.0 cause test failures
Christian Ehrhardt
2067889 at bugs.launchpad.net
Wed Jun 5 08:54:08 UTC 2024
With what I know from comment #14 and comment #5 I tried to get deeper
to finally find what actually goes wrong which still eludes me :-/
Env vars that we need:
OVS_RUNDIR=/var/run/openvswitch/
abs_top_srcdir=../_debian/openvswitch/
1. db creation + starting ovsdb with --detach in background + checking if it has started
=> logs in /var/log/openvswitch/ovsdb-server.log
../../tests/system-dpdk.at:741: touch .conf.db.~lock~
../../tests/system-dpdk.at:741: ovsdb-tool create conf.db $abs_top_srcdir/vswitchd/vswitch.ovsschema
../../tests/system-dpdk.at:741: ovsdb-server --detach --no-chdir --pidfile --log-file --remote=punix:$OVS_RUNDIR/db.sock
2. initialize and start vswitch in background
=> logs in /var/log/openvswitch/ovs-vswitchd.log
../../tests/system-dpdk.at:741: ovs-vsctl --no-wait init
../../tests/system-dpdk.at:741: ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
../../tests/system-dpdk.at:741: ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-extra="--log-level=pmd.*:error --no-pci"
../../tests/system-dpdk.at:741: ovs-vswitchd --detach --no-chdir --pidfile --log-file -vvconn -vofproto_dpif -vunixctl
3. add bridge and port
../../tests/system-dpdk.at:744: ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev
../../tests/system-dpdk.at:745: ovs-vsctl add-port br10 dpdkvhostuserclient0 -- \
set Interface dpdkvhostuserclient0 type=dpdkvhostuserclient options:vhost-server-path=$OVS_RUNDIR/dpdkvhostclient0
4. check for log to be as expected
system-dpdk.at:745: waiting until grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) vhost-user client: socket created" ovs-vswitchd.log...
2024-06-05T08:16:32.191Z|00064|dpdk|INFO|VHOST_CONFIG: (/cPJ/_debian/tests/system-dpdk-testsuite.dir/020/dpdkvhostclient0) vhost-user client: socket created, fd: 52
system-dpdk.at:745: wait succeeded immediately
system-dpdk.at:745: waiting until grep "vHost User device 'dpdkvhostuserclient0' created in 'client' mode, using client socket" ovs-vswitchd.log...
2024-06-05T08:16:32.191Z|00065|netdev_dpdk|INFO|vHost User device 'dpdkvhostuserclient0' created in 'client' mode, using client socket '/cPJ/_debian/tests/system-dpdk-testsuite.dir/020/dpdkvhostclient0'
system-dpdk.at:745: wait succeeded immediately
system-dpdk.at:745: waiting until grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) reconnecting..." ovs-vswitchd.log...
2024-06-05T08:16:32.191Z|00067|dpdk|INFO|VHOST_CONFIG: (/cPJ/_debian/tests/system-dpdk-testsuite.dir/020/dpdkvhostclient0) reconnecting...
system-dpdk.at:745: wait succeeded immediately
=> so far, all good in both cases
6. start testpmd in other console
dpdk-testpmd --in-memory --socket-mem=512 --single-file-segments --no-pci --vdev="net_virtio_user,path=$OVS_RUNDIR/dpdkvhostclient0,server=1"
Only diff so far, the known:
dpdk|ERR|TELEMETRY: Socket write base info to client failed
5. set valid MTU
../../tests/system-dpdk.at:746: ovs-vsctl set Interface dpdkvhostuserclient0 mtu_request=9702
../../tests/system-dpdk.at:747: ovs-vsctl show
6. set invalid MTU
../../tests/system-dpdk.at:756: ovs-vsctl get Interface dpdkvhostuserclient0 mtu
../../tests/system-dpdk.at:761: ovs-vsctl set Interface dpdkvhostuserclient0 mtu_request=9711
../../tests/system-dpdk.at:762: grep "dpdkvhostuserclient0: unsupported MTU 9711" ovs-vswitchd.log
stdout:
2024-06-05T08:16:36.857Z|00117|netdev_dpdk|WARN|dpdkvhostuserclient0: unsupported MTU 9711
=> Still both systems behave the same
7. delete port and stop testpmd
../../tests/system-dpdk.at:765: ovs-vsctl del-port br10 dpdkvhostuserclient0
../../tests/system-dpdk.at:766: kill `cat testpmd.pid`
The latter in my case is stopping it in the other console.
It seems then the macro/function OVS_DPDK_STOP_VSWITCHD is crucial.
It is called like this in ./tests/system-dpdk.at:
OVS_DPDK_STOP_VSWITCHD(["dnl
/VHOST_CONFIG: (.*dpdkvhostclient0) failed to connect: No such file or directory/d
/dpdkvhostuserclient0: unsupported MTU 9711/d
/failed to set MTU for network device dpdkvhostuserclient0: Invalid argument/d"])
It seems it lists known messages it expectes in the vswitch log.
The effective clal I see in the log is even a bit longer, maybe some other known messages are added.
../../tests/system-dpdk.at:767: check_logs "/VHOST_CONFIG: (.*dpdkvhostclient0) failed to connect: No such file or directory/d
/dpdkvhostuserclient0: unsupported MTU 9711/d
/failed to set MTU for network device dpdkvhostuserclient0: Invalid argument/d"";/does not exist. The Open vSwitch kernel module is probably not loaded./d
/does not support MTU configuration,/d
/EAL: No \(available\|free\) .*hugepages reported/d
/Failed to enable flow control/d
/ice_vsi_config_outer_vlan_stripping(): Single VLAN mode (SVM) does not support qinq/d
/Rx checksum offload is not supported on/d
/TELEMETRY: No legacy callbacks, legacy socket not created/d"
defined as expected with extra DPDK messages in tests/system-dpdk-
macros.at:
m4_define([OVS_DPDK_STOP_VSWITCHD],
[OVS_VSWITCHD_STOP([dnl
$1";/does not exist. The Open vSwitch kernel module is probably not loaded./d
/does not support MTU configuration,/d
/EAL: No \(available\|free\) .*hugepages reported/d
/Failed to enable flow control/d
/ice_vsi_config_outer_vlan_stripping(): Single VLAN mode (SVM) does not support qinq/d
/Rx checksum offload is not supported on/d
/TELEMETRY: No legacy callbacks, legacy socket not created/d"])
])
And this in turn is in tests/ofproto-macros.at:
# OVS_VSWITCHD_STOP([ALLOWLIST])
#
# Gracefully stops ovs-vswitchd and ovsdb-server, checking their log files
# for messages with severity WARN or higher and signaling an error if any
# is present. The optional ALLOWLIST may contain shell-quoted "sed"
# commands to delete any warnings that are actually expected, e.g.:
#
# OVS_VSWITCHD_STOP(["/expected error/d"])
m4_define([OVS_VSWITCHD_STOP],
[AT_CHECK([check_logs $1])
OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
OVS_APP_EXIT_AND_WAIT([ovsdb-server])])
So it stopps the two we started in background.
The interesting bit is in "check_logs" which gets all the messages added we've seen.
So as I assumed above it might be as easy as this new error message
about telemetry to break this.
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to openvswitch in Ubuntu.
https://bugs.launchpad.net/bugs/2067889
Title:
DPDK 23.11.1 / OVS 3.3.0 cause test failures
Status in dpdk package in Ubuntu:
New
Status in openvswitch package in Ubuntu:
New
Bug description:
https://autopkgtest.ubuntu.com/packages/o/openvswitch/oracular/amd64
Bad
Oracular OVS test with DPDK 23.11.1-1
https://autopkgtest.ubuntu.com/results/autopkgtest-oracular/oracular/amd64/o/openvswitch/20240531_100906_6f957@/log.gz
Good
Oracular OVS test with DPDK 23.11-1
https://autopkgtest.ubuntu.com/results/autopkgtest-oracular/oracular/amd64/o/openvswitch/20240529_111623_6f8db@/log.gz
The logs are messy to read if not used to it, this is a try at debugging what is going wrong.
So far it seems other people have retried 14 times, and 14714 fail at the same tests - I think there is a real issue.
As shown below, one can find always these fail:
4: OVS-DPDK - ping vhost-user ports FAILED (system-dpdk.at:148)
5: OVS-DPDK - ping vhost-user-client ports FAILED (system-dpdk.at:224)
16: OVS-DPDK - MTU increase vport port FAILED (system-dpdk.at:609)
17: OVS-DPDK - MTU decrease vport port FAILED (system-dpdk.at:651)
20: OVS-DPDK - MTU upper bound vport port FAILED (system-dpdk.at:767)
21: OVS-DPDK - MTU lower bound vport port FAILED (system-dpdk.at:809)
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2067889/+subscriptions
More information about the Ubuntu-openstack-bugs
mailing list