Upgrade from OVN 20.03 to newer OVN version will cause data plane outage
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Ubuntu Cloud Archive | Fix Released | Undecided | Unassigned | |
Wallaby | Triaged | High | Frode Nordahl | |
charm-layer-ovn | Fix Released | High | Frode Nordahl | |
charm-ovn-chassis | Fix Released | High | Frode Nordahl | |
20.03 | Fix Released | Undecided | Unassigned | |
20.12 | Fix Released | Undecided | Unassigned | |
21.09 | Fix Released | Undecided | Unassigned | |
charm-ovn-dedicated-chassis | Fix Released | High | Frode Nordahl | |
20.03 | Fix Released | Undecided | Unassigned | |
20.12 | Fix Released | Undecided | Unassigned | |
21.09 | Fix Released | Undecided | Unassigned | |
ovn (Ubuntu) | Fix Released | High | Unassigned | |
Focal | Fix Released | High | Frode Nordahl | |
Hirsute | Won't Fix | Undecided | Frode Nordahl | |
Impish | Fix Released | High | Unassigned | |
Bug Description
[Impact]
When upgrading from OVN 20.03, as made available in Ubuntu Focal, to a newer version of OVN, it is currently not possible to upgrade without causing a data plane outage.
If the user attempts to upgrade the central components first, the ovn-controller will tear down connectivity to running instances as it may not fully understand the data structure of a newer database.
If the user attempts to upgrade the ovn-controller first, recent releases are not guaranteed to understand the older database and connectivity may remain down until all hypervisors and central components have been upgraded.
If the user attempts to manually stop the ovn-controller during the upgrade to avoid it inadvertently tearing down connectivity on central component upgrade, cloud instances will be deprived of vital services such as DNS lookup and DHCP.
To fix this situation two changes are needed:
1) Backport of an upstream feature [0] that allows the ovn-controller to detect a version mismatch and subsequently refrain from making further changes to the local Open vSwitch instance until the version mismatch is corrected.
2) Make ovn-controller not clear out runtime flow state in Open vSwitch on exit by updating the ovn-controller systemd service to pass the `--restart` argument when stopping the controller. This flag tells the ovn-controller process that it should not clear out Open vSwitch flows and OVN SB database records on exit, which allows already installed state to continue operation until the new instance of the ovn-controller process starts. [1][2][3]
It does not mean that the service will be restarted as opposed to being stopped, as one might think based on the name of the argument.
This change serves two purposes:
2a) Allow upgrading the ovn-controller to a newer version than the central components, while retaining connectivity to running instances until the central components are upgraded.
2b) Minimize the downtime on package upgrade.
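The service change in item 2 amounts to a one-word edit of the stop command. A minimal sketch of that edit as a text transformation follows. Only `ovn-ctl stop_controller` and `--restart` come from this report; the unit fragment and the `/usr/share/ovn/scripts/ovn-ctl` path are assumptions for illustration and are not copied from the package.

```python
def add_restart_flag(unit_text: str) -> str:
    """Append --restart to the ExecStop line that calls `ovn-ctl stop_controller`.

    Lines that already carry the flag, and all other lines, are left untouched.
    """
    out = []
    for line in unit_text.splitlines():
        stripped = line.strip()
        if (stripped.startswith("ExecStop=")
                and "stop_controller" in stripped
                and "--restart" not in stripped.split()):
            line = line.rstrip() + " --restart"
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if unit_text.endswith("\n") else "")


# Hypothetical unit fragment (assumed paths); the packaged file is
# debian/ovn-host.ovn-controller.service.
EXAMPLE_UNIT = """[Service]
ExecStart=/usr/share/ovn/scripts/ovn-ctl start_controller
ExecStop=/usr/share/ovn/scripts/ovn-ctl stop_controller
"""

if __name__ == "__main__":
    print(add_restart_flag(EXAMPLE_UNIT), end="")
```

The function is idempotent, so running it against a unit that already passes `--restart` changes nothing.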
[Test Plan]
1. Deploy OpenStack Ussuri from the Focal archive.
2. Launch an instance and confirm connectivity.
3. Add UCA or another PPA with a newer version of OVN and upgrade the OVN components on the relevant units in the deployment.
4. Confirm that the new version of the central components makes the ovn-controller log a version mismatch, and that connectivity to the test instance continues.
5. Upgrade the data plane units and confirm that the version mismatch is resolved and that instances retain connectivity, with minimal downtime, during the upgrade.
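To put a number on "minimal downtime" in steps 4 and 5, one option is to probe the test instance at a fixed interval during the upgrade and compute the longest gap afterwards. The helper below is a sketch of that calculation only; the sample format (timestamp in seconds, reachable flag) and the function name are hypothetical and not part of the test plan above.

```python
from typing import Iterable, Optional, Tuple


def longest_outage(samples: Iterable[Tuple[float, bool]]) -> float:
    """Return the longest outage in seconds.

    `samples` is a time-ordered sequence of (timestamp_seconds, reachable)
    pairs. An outage runs from the first failed probe to the next
    successful one.
    """
    longest = 0.0
    down_since: Optional[float] = None
    last_ts: Optional[float] = None
    for ts, ok in samples:
        last_ts = ts
        if not ok and down_since is None:
            down_since = ts            # outage starts
        elif ok and down_since is not None:
            longest = max(longest, ts - down_since)
            down_since = None          # outage ends
    # An outage still open at the end is counted up to the last sample.
    if down_since is not None and last_ts is not None:
        longest = max(longest, last_ts - down_since)
    return longest
```

Feeding it the results of a once-per-second probe gives an upper bound on the outage that is accurate to the probe interval.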
[Regression Potential]
The backported feature is optional and enabled by specifically entering a key-value pair into the local Open vSwitch database to enable it. It has also been available upstream for several releases.
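As an illustration of what entering that key-value pair might look like, the sketch below builds the `ovs-vsctl` command line. The key name `ovn-match-northd-version` is an assumption based on the upstream ovn-controller option as I understand it; this report does not spell it out, so check the ovn-controller manual page for the installed version before relying on it.

```python
import shlex
from typing import List

# Assumption: key name taken from upstream ovn-controller documentation,
# not from this bug report.
MATCH_KEY = "ovn-match-northd-version"


def version_match_cmd(enabled: bool = True) -> List[str]:
    """Build the ovs-vsctl argv that sets the version-match key in external_ids."""
    value = "true" if enabled else "false"
    return ["ovs-vsctl", "set", "open_vswitch", ".",
            f"external_ids:{MATCH_KEY}={value}"]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(version_match_cmd()))
```

Because the key lives in the local Open vSwitch database, it has to be set on every chassis individually.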
The change to the ovn-controller systemd service has been in Ubuntu since Impish [3] and we have had no reports of side effects of this change.
[Original Bug Description]
The upstream recommendation for upgrades of OVN is to first upgrade the data plane components (chassis aka. ovn-controller), and then upgrade the central components (the database schema and ovn-northd). The rationale for this is that the new version of the ovn-controller is required to cope with any changes to database schema or how northd programs flows.
However, in the course of rapid OVN development, changes have also been introduced that leave the new ovn-controller unable to cope with an old database schema, breaking the recommended upgrade procedure.
To cope with this, upstream has introduced a new optional configuration for the ovn-controller that allows it to detect version inconsistencies and, when they are present, stops it from making changes to the data plane until the version inconsistency is resolved [0].
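The behaviour described above can be modelled as a simple gate. This is a deliberately simplified sketch, not the ovn-controller implementation: the class, field names, and version strings are hypothetical, and the real comparison upstream may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControllerState:
    match_northd_version: bool  # the optional setting; off by default
    controller_version: str     # version of the running ovn-controller
    northd_version: str         # version reported by the central components


def may_program_data_plane(state: ControllerState) -> bool:
    """Return True if the controller should keep applying changes."""
    if not state.match_northd_version:
        # Feature disabled: changes are always processed, as before.
        return True
    # Feature enabled: hold the installed state until the versions agree.
    return state.controller_version == state.northd_version
```

With the option enabled, a chassis that is ahead of or behind the central components simply keeps its installed flows, which is why the option only helps if those flows survive a controller stop.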
For the above-mentioned configuration to be effective, we also need the package to call `ovn-ctl stop_controller` with the `--restart` option so that the ovn-controller does not flush the installed flows on exit.
We should make the required changes to packages and charms to allow upgrades to progress with less data plane outage.
0: https:/
1: https:/
2: https:/
3: https:/
Related branches
- James Page: Pending requested
  Diff: 1001 lines (+943/-2), 7 files modified
  debian/changelog (+13/-0)
  debian/control (+1/-1)
  debian/ovn-host.ovn-controller.service (+1/-1)
  debian/patches/lp-1940043-0001-Provide-the-option-to-pin-ovn-controller-and-ovn-nor.patch (+456/-0)
  debian/patches/lp-1940043-0002-controller-Allow-pinctrl-thread-to-handle-packet-ins.patch (+220/-0)
  debian/patches/lp-1980213-treewide-bump-ovs-and-fix-problematic-loops.patch (+249/-0)
  debian/patches/series (+3/-0)
- James Page: Approve
  Diff: 29 lines (+9/-1), 2 files modified
  debian/changelog (+8/-0)
  debian/ovn-host.ovn-controller.service (+1/-1)
Changed in charm-ovn-chassis: | |
status: | New → In Progress |
importance: | Undecided → High |
assignee: | nobody → Frode Nordahl (fnordahl) |
description: | updated |
Changed in ovn (Ubuntu Impish): | |
status: | New → In Progress |
importance: | Undecided → High |
assignee: | nobody → Frode Nordahl (fnordahl) |
Changed in charm-layer-ovn: | |
status: | New → Incomplete |
status: | Incomplete → In Progress |
importance: | Undecided → High |
assignee: | nobody → Frode Nordahl (fnordahl) |
Changed in charm-ovn-dedicated-chassis: | |
status: | New → In Progress |
importance: | Undecided → High |
assignee: | nobody → Frode Nordahl (fnordahl) |
status: | In Progress → Triaged |
Changed in charm-ovn-chassis: | |
status: | In Progress → Triaged |
Changed in charm-layer-ovn: | |
status: | In Progress → Fix Committed |
milestone: | none → 21.10 |
Changed in ovn (Ubuntu Impish): | |
status: | In Progress → Fix Committed |
assignee: | Frode Nordahl (fnordahl) → nobody |
Changed in ovn (Ubuntu Hirsute): | |
assignee: | nobody → Frode Nordahl (fnordahl) |
Changed in ovn (Ubuntu Focal): | |
assignee: | nobody → Frode Nordahl (fnordahl) |
Changed in charm-ovn-chassis: | |
milestone: | none → 21.10 |
Changed in charm-ovn-dedicated-chassis: | |
milestone: | none → 21.10 |
Changed in charm-ovn-chassis: | |
status: | Fix Committed → Fix Released |
Changed in charm-layer-ovn: | |
status: | Fix Committed → Fix Released |
Changed in charm-ovn-dedicated-chassis: | |
status: | Fix Committed → Fix Released |
Changed in cloud-archive: | |
status: | New → Fix Released |
Changed in ovn (Ubuntu Focal): | |
status: | New → Triaged |
description: | updated |
Changed in ovn (Ubuntu Focal): | |
importance: | Undecided → High |
description: | updated |
Changed in cloud-archive: | |
status: | Fix Released → Fix Committed |
https://github.com/openstack-charmers/charm-layer-ovn/pull/51