[Bug 1972648] [NEW] Schema upgrade unsuccessful at first attempt, succeeds at second attempt

Mon May 9 13:25:35 UTC 2022

Public bug reported:

When upgrading the OVN package on the unit that was first used to create
a DB cluster (i.e. `--db-Xb-cluster-remote-addr` is empty in
/etc/default/ovn-central), the `ovn-ctl` script performs a schema
upgrade of the cluster.

This sometimes fails on the first attempt but succeeds on the second,
after simply restarting the appropriate OVN DB systemd service. This
happens when the package is upgraded only on the lead unit, while the
other nodes remain available in the cluster.

No trace of the failure can be found in the logs, so we can only guess
what is happening.

Looking at the CTL library code [0], I suspect that the default 30-second
timeout [1] for the entire `ovsdb-client` invocation is insufficient if
the system is slow or busy, has a large DB, or needs compaction or a
snapshot.

0: https://github.com/openvswitch/ovs/blob/9dd3031d2e0e9597449e95428320ccaaff7d8b3d/utilities/ovs-lib.in#L490
1: https://github.com/openvswitch/ovs/blob/9dd3031d2e0e9597449e95428320ccaaff7d8b3d/lib/timeval.c#L257-L271
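
To illustrate the suspicion above, here is a minimal sketch of a wrapper
that retries a command and logs every failed attempt, so that a timed-out
conversion would at least leave a trace. `retry_logged` is a hypothetical
helper and not part of ovs-lib; the 120-second value and the `$DB_SERVER`
and `$DB_SCHEMA` placeholders in the usage comment are assumptions for
illustration only. The `-t` option of `ovsdb-client` is the existing
timeout flag.

```shell
#!/bin/sh
# Hypothetical helper (not part of ovs-lib): run a command up to
# <attempts> times, sleeping <delay> seconds between attempts, and
# write one line to stderr for every failed attempt.
# Usage: retry_logged <attempts> <delay> <command> [args...]
retry_logged () {
    attempts=$1 delay=$2
    shift 2
    n=1
    while :; do
        # Success on any attempt ends the loop immediately.
        "$@" && return 0
        rc=$?
        echo "attempt $n/$attempts failed (exit $rc): $*" >&2
        # Out of attempts: propagate the last exit status.
        [ "$n" -ge "$attempts" ] && return "$rc"
        n=$((n + 1))
        sleep "$delay"
    done
}

# Assumed usage, with a longer timeout than the 30 seconds discussed
# above (120 is an arbitrary value, not a recommendation):
#   retry_logged 2 5 ovsdb-client -t 120 convert "$DB_SERVER" "$DB_SCHEMA"
```

This only mirrors the manual workaround of trying a second time; whether
a longer timeout actually fixes the first attempt remains unverified.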

** Affects: ovn (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ovn in Ubuntu.
https://bugs.launchpad.net/bugs/1972648
