[Bug 2067889] Re: DPDK 23.11.1 / OVS 3.3.0 cause test failures

Christian Ehrhardt  2067889 at bugs.launchpad.net
Wed Jun 5 08:54:08 UTC 2024


With what I know from comment #14 and comment #5 I tried to get deeper
to finally find what actually goes wrong which still eludes me :-/


Env vars that we need:
OVS_RUNDIR=/var/run/openvswitch/
abs_top_srcdir=../_debian/openvswitch/


1. db creation + starting ovsdb with --detach in background + checking if it has started
=> logs in /var/log/openvswitch/ovsdb-server.log

  ../../tests/system-dpdk.at:741: touch .conf.db.~lock~
  ../../tests/system-dpdk.at:741: ovsdb-tool create conf.db $abs_top_srcdir/vswitchd/vswitch.ovsschema
  ../../tests/system-dpdk.at:741: ovsdb-server --detach --no-chdir --pidfile --log-file --remote=punix:$OVS_RUNDIR/db.sock

2. initialize and start vswitch in background
=> logs in /var/log/openvswitch/ovs-vswitchd.log

../../tests/system-dpdk.at:741: ovs-vsctl --no-wait init 
../../tests/system-dpdk.at:741: ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
../../tests/system-dpdk.at:741: ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-extra="--log-level=pmd.*:error --no-pci"
../../tests/system-dpdk.at:741: ovs-vswitchd  --detach --no-chdir --pidfile --log-file -vvconn -vofproto_dpif -vunixctl

3. add bridge and port
../../tests/system-dpdk.at:744: ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev
../../tests/system-dpdk.at:745: ovs-vsctl add-port br10 dpdkvhostuserclient0 -- \
              set Interface dpdkvhostuserclient0 type=dpdkvhostuserclient options:vhost-server-path=$OVS_RUNDIR/dpdkvhostclient0

4. check for log to be as expected
system-dpdk.at:745: waiting until grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) vhost-user client: socket created" ovs-vswitchd.log...
2024-06-05T08:16:32.191Z|00064|dpdk|INFO|VHOST_CONFIG: (/cPJ/_debian/tests/system-dpdk-testsuite.dir/020/dpdkvhostclient0) vhost-user client: socket created, fd: 52
system-dpdk.at:745: wait succeeded immediately
system-dpdk.at:745: waiting until grep "vHost User device 'dpdkvhostuserclient0' created in 'client' mode, using client socket" ovs-vswitchd.log...
2024-06-05T08:16:32.191Z|00065|netdev_dpdk|INFO|vHost User device 'dpdkvhostuserclient0' created in 'client' mode, using client socket '/cPJ/_debian/tests/system-dpdk-testsuite.dir/020/dpdkvhostclient0'
system-dpdk.at:745: wait succeeded immediately
system-dpdk.at:745: waiting until grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) reconnecting..." ovs-vswitchd.log...
2024-06-05T08:16:32.191Z|00067|dpdk|INFO|VHOST_CONFIG: (/cPJ/_debian/tests/system-dpdk-testsuite.dir/020/dpdkvhostclient0) reconnecting...
system-dpdk.at:745: wait succeeded immediately

=> so far, all good in both cases

6. start testpmd in other console
dpdk-testpmd --in-memory --socket-mem=512 --single-file-segments --no-pci --vdev="net_virtio_user,path=$OVS_RUNDIR/dpdkvhostclient0,server=1"

Only diff so far, the known:
  dpdk|ERR|TELEMETRY: Socket write base info to client failed


5. set valid MTU

../../tests/system-dpdk.at:746: ovs-vsctl set Interface dpdkvhostuserclient0 mtu_request=9702
../../tests/system-dpdk.at:747: ovs-vsctl show

6. set invalid MTU

../../tests/system-dpdk.at:756: ovs-vsctl get Interface dpdkvhostuserclient0 mtu
../../tests/system-dpdk.at:761: ovs-vsctl set Interface dpdkvhostuserclient0 mtu_request=9711
../../tests/system-dpdk.at:762: grep "dpdkvhostuserclient0: unsupported MTU 9711" ovs-vswitchd.log
stdout:
2024-06-05T08:16:36.857Z|00117|netdev_dpdk|WARN|dpdkvhostuserclient0: unsupported MTU 9711

=> Still both systems behave the same

7. delete port and stop testpmd
../../tests/system-dpdk.at:765: ovs-vsctl del-port br10 dpdkvhostuserclient0
../../tests/system-dpdk.at:766: kill `cat testpmd.pid`
The latter in my case is stopping it in the other console.

It seems then the macro/function OVS_DPDK_STOP_VSWITCHD is crucial.
It is called like this in ./tests/system-dpdk.at:

OVS_DPDK_STOP_VSWITCHD(["dnl
/VHOST_CONFIG: (.*dpdkvhostclient0) failed to connect: No such file or directory/d
/dpdkvhostuserclient0: unsupported MTU 9711/d
/failed to set MTU for network device dpdkvhostuserclient0: Invalid argument/d"])

It seems it lists known messages it expectes in the vswitch log.


The effective clal I see in the log is even a bit longer, maybe some other known messages are added.

../../tests/system-dpdk.at:767: check_logs "/VHOST_CONFIG: (.*dpdkvhostclient0) failed to connect: No such file or directory/d
/dpdkvhostuserclient0: unsupported MTU 9711/d
/failed to set MTU for network device dpdkvhostuserclient0: Invalid argument/d"";/does not exist. The Open vSwitch kernel module is probably not loaded./d
/does not support MTU configuration,/d
/EAL: No \(available\|free\) .*hugepages reported/d
/Failed to enable flow control/d
/ice_vsi_config_outer_vlan_stripping(): Single VLAN mode (SVM) does not support qinq/d
/Rx checksum offload is not supported on/d
/TELEMETRY: No legacy callbacks, legacy socket not created/d"

defined as expected with extra DPDK messages in tests/system-dpdk-
macros.at:

m4_define([OVS_DPDK_STOP_VSWITCHD],
  [OVS_VSWITCHD_STOP([dnl
$1";/does not exist. The Open vSwitch kernel module is probably not loaded./d
/does not support MTU configuration,/d
/EAL: No \(available\|free\) .*hugepages reported/d
/Failed to enable flow control/d
/ice_vsi_config_outer_vlan_stripping(): Single VLAN mode (SVM) does not support qinq/d
/Rx checksum offload is not supported on/d
/TELEMETRY: No legacy callbacks, legacy socket not created/d"])
]) 

And this in turn is in tests/ofproto-macros.at:

# OVS_VSWITCHD_STOP([ALLOWLIST])
#
# Gracefully stops ovs-vswitchd and ovsdb-server, checking their log files
# for messages with severity WARN or higher and signaling an error if any
# is present.  The optional ALLOWLIST may contain shell-quoted "sed"
# commands to delete any warnings that are actually expected, e.g.:
#
#   OVS_VSWITCHD_STOP(["/expected error/d"])
m4_define([OVS_VSWITCHD_STOP],
  [AT_CHECK([check_logs $1])
   OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
   OVS_APP_EXIT_AND_WAIT([ovsdb-server])])

So it stopps the two we started in background.
The interesting bit is in "check_logs" which gets all the messages added we've seen.

So as I assumed above it might be as easy as this new error message
about telemetry to break this.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to openvswitch in Ubuntu.
https://bugs.launchpad.net/bugs/2067889

Title:
  DPDK 23.11.1 / OVS 3.3.0 cause test failures

Status in dpdk package in Ubuntu:
  New
Status in openvswitch package in Ubuntu:
  New

Bug description:
  https://autopkgtest.ubuntu.com/packages/o/openvswitch/oracular/amd64

  Bad
  Oracular OVS test with DPDK 23.11.1-1
  https://autopkgtest.ubuntu.com/results/autopkgtest-oracular/oracular/amd64/o/openvswitch/20240531_100906_6f957@/log.gz

  Good
  Oracular OVS test with DPDK 23.11-1
  https://autopkgtest.ubuntu.com/results/autopkgtest-oracular/oracular/amd64/o/openvswitch/20240529_111623_6f8db@/log.gz

  The logs are messy to read if not used to it, this is a try at debugging what is going wrong.
  So far it seems other people have retried 14 times, and 14714 fail at the same tests - I think there is a real issue.

  As shown below, one can find always these fail:
  4: OVS-DPDK - ping vhost-user ports                FAILED (system-dpdk.at:148)
  5: OVS-DPDK - ping vhost-user-client ports         FAILED (system-dpdk.at:224)
  16: OVS-DPDK - MTU increase vport port              FAILED (system-dpdk.at:609)
  17: OVS-DPDK - MTU decrease vport port              FAILED (system-dpdk.at:651)
  20: OVS-DPDK - MTU upper bound vport port           FAILED (system-dpdk.at:767)
  21: OVS-DPDK - MTU lower bound vport port           FAILED (system-dpdk.at:809)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2067889/+subscriptions




More information about the Ubuntu-openstack-bugs mailing list