When deploying on metal with MAAS, MAAS adds the FQDN to the localhost record in /etc/hosts, so the `hostname -f` command always succeeds regardless of network availability.
With the other provider combinations, Juju performs the host initialization, and Juju does not add the FQDN to the localhost record in /etc/hosts.
[Original description]
On Juju 2.7.8 with the latest charms (20.08), I have a dead ovn-controller agent on one of the octavia units.
$ openstack network agent list|grep lxd
| juju-a9d6f4-21-lxd-9.maas | OVN Controller agent | juju-a9d6f4-21-lxd-9 | | XXX | UP | ovn-controller |
| juju-a9d6f4-25-lxd-10.maas | OVN Controller agent | juju-a9d6f4-25-lxd-10.maas | | :-) | UP | ovn-controller |
| juju-a9d6f4-23-lxd-10.maas | OVN Controller agent | juju-a9d6f4-23-lxd-10.maas | | :-) | UP | ovn-controller |
Two of the three ovn-controller agents on octavia units are registered with host=$fqdn, and the down controller is registered with a shortname.
`hostname -f` returns the FQDN on the down unit.
/etc/openvswitch/system-id.conf contains only the short hostname.
`ovs-vsctl list open_vswitch` shows the short hostname for both the hostname and the system-id.
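The comparison described above can be sketched as a small helper. This is only an illustrative check, not something taken from the charm; the function name `classify_system_id` and its return labels are made up for this example.

```python
def classify_system_id(system_id: str, fqdn: str) -> str:
    """Compare a recorded OVS system-id against the host's FQDN.

    Hypothetical helper for illustration only. Returns:
      "fqdn"     - the system-id matches the FQDN
      "short"    - the system-id is only the first label of the FQDN
      "mismatch" - anything else
    """
    # Normalize: drop surrounding whitespace, a trailing dot, and case.
    system_id = system_id.strip().rstrip(".").lower()
    fqdn = fqdn.strip().rstrip(".").lower()
    if system_id == fqdn:
        return "fqdn"
    # A short name equals the first DNS label of a dotted FQDN.
    if "." in fqdn and system_id == fqdn.split(".", 1)[0]:
        return "short"
    return "mismatch"
```

On an affected unit, one could pass it the contents of /etc/openvswitch/system-id.conf and the output of `hostname -f`; for the down unit in this report the result would be "short".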
I am seeing a lot of errors in /var/log/ovn/ovn-controller.log along the lines of:
2020-09-22T14:22:39.500Z|04678|binding|INFO|Changing chassis for lport 529233fc-f9c4-40b1-8c6a-f2e906a2498d from juju-a9d6f4-21-lxd-9.maas to juju-a9d6f4-21-lxd-9.
2020-09-22T06:25:01.829Z|857112|main|INFO|OVNSB commit failed, force recompute next time.
Restarting ovn-controller shows the following in the log:
2020-09-22T14:22:30.498Z|00001|vlog|INFO|opened log file /var/log/ovn/ovn-controller.log
2020-09-22T14:22:30.500Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2020-09-22T14:22:30.500Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2020-09-22T14:22:30.502Z|00004|main|INFO|OVS IDL reconnected, force recompute.
2020-09-22T14:22:30.504Z|00005|reconnect|INFO|ssl:10.35.61.157:6642: connecting...
2020-09-22T14:22:30.504Z|00006|main|INFO|OVNSB IDL reconnected, force recompute.
2020-09-22T14:22:30.508Z|00007|reconnect|INFO|ssl:10.35.61.157:6642: connected
2020-09-22T14:22:30.514Z|00008|ofctrl|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2020-09-22T14:22:30.514Z|00009|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2020-09-22T14:22:30.514Z|00010|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2020-09-22T14:22:30.515Z|00011|ovsdb_idl|WARN|transaction error: {"details":"RBAC rules for client \"juju-a9d6f4-21-lxd-9\" role \"ovn-controller\" prohibit modification of table \"Chassis\".","error":"permission error"}
2020-09-22T14:22:30.515Z|00012|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-22T14:22:30.515Z|00001|pinctrl(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2020-09-22T14:22:30.515Z|00002|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2020-09-22T14:22:30.516Z|00013|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Encap\" table to have identical values (geneve and \"10.35.82.18\") for index on columns \"type\" and \"ip\". First row, with UUID 86556077-6325-4cb6-9bbd-c5979ae15d2c, was inserted by this transaction. Second row, with UUID 3345a08e-534b-4ccf-a7b6-2d6d00706422, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}
2020-09-22T14:22:30.516Z|00014|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-22T14:22:30.516Z|00015|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Encap\" table to have identical values (geneve and \"10.35.82.18\") for index on columns \"type\" and \"ip\". First row, with UUID 3345a08e-534b-4ccf-a7b6-2d6d00706422, existed in the database before this transaction and was not modified by the transaction. Second row, with UUID 916635aa-e98c-4f23-8ac8-1e3f381151c6, was inserted by this transaction.","error":"constraint violation"}
2020-09-22T14:22:30.516Z|00016|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-22T14:22:30.516Z|00017|binding|INFO|Changing chassis for lport 529233fc-f9c4-40b1-8c6a-f2e906a2498d from juju-a9d6f4-21-lxd-9.maas to juju-a9d6f4-21-lxd-9.
2020-09-22T14:22:30.516Z|00018|binding|INFO|529233fc-f9c4-40b1-8c6a-f2e906a2498d: Claiming fa:16:3e:e4:70:66 fc00:2d33:a2bc:84d4:f816:3eff:fee4:7066
2020-09-22T14:22:30.517Z|00019|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Encap\" table to have identical values (geneve and \"10.35.82.18\") for index on columns \"type\" and \"ip\". First row, with UUID 3345a08e-534b-4ccf-a7b6-2d6d00706422, existed in the database before this transaction and was not modified by the transaction. Second row, with UUID 6219b9c9-fc57-4caa-8f75-46ead7584901, was inserted by this transaction.","error":"constraint violation"}
2020-09-22T14:22:30.517Z|00020|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-22T14:22:30.518Z|00021|binding|INFO|Changing chassis for lport 529233fc-f9c4-40b1-8c6a-f2e906a2498d from juju-a9d6f4-21-lxd-9.maas to juju-a9d6f4-21-lxd-9.
2020-09-22T14:22:30.518Z|00022|binding|INFO|529233fc-f9c4-40b1-8c6a-f2e906a2498d: Claiming fa:16:3e:e4:70:66 fc00:2d33:a2bc:84d4:f816:3eff:fee4:7066
2020-09-22T14:22:30.521Z|00023|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Encap\" table to have identical values (geneve and \"10.35.82.18\") for index on columns \"type\" and \"ip\". First row, with UUID 3345a08e-534b-4ccf-a7b6-2d6d00706422, existed in the database before this transaction and was not modified by the transaction. Second row, with UUID 5f2ca07b-859f-4013-9e49-5fd00a1909e9, was inserted by this transaction.","error":"constraint violation"}
2020-09-22T14:22:30.521Z|00024|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-22T14:22:30.521Z|00003|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
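To get a quick overview of how often each kind of failure occurs in a log like the one above, the transaction errors can be tallied by their "error" field. The following is a rough sketch that assumes the pipe-separated layout seen in these lines (timestamp, sequence, module, level, message); the function name `count_transaction_errors` is invented for this example.

```python
import json
from collections import Counter

PREFIX = "transaction error: "


def count_transaction_errors(lines):
    """Count ovsdb_idl transaction errors by their JSON "error" field."""
    counts = Counter()
    for line in lines:
        # Assumed fields: timestamp | sequence | module | level | message
        parts = line.rstrip("\n").split("|", 4)
        if len(parts) != 5:
            continue
        _, _, module, level, message = parts
        if module != "ovsdb_idl" or level != "WARN":
            continue
        if not message.startswith(PREFIX):
            continue
        try:
            payload = json.loads(message[len(PREFIX):])
        except json.JSONDecodeError:
            continue  # Skip lines whose payload is not valid JSON.
        if isinstance(payload, dict):
            counts[payload.get("error", "unknown")] += 1
    return counts
```

Fed the restart log above, this would separate the single "permission error" (the RBAC rejection on the Chassis table) from the repeated "constraint violation" entries on the Encap table.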
The relation data provided by octavia-ovn-chassis to octavia on that unit shows chassis-name as the short hostname, whereas on the other octavia units the chassis-name provided by ovn-chassis to octavia is the FQDN.
$ sudo juju-run octavia/0 -r 139 --remote-unit octavia-ovn-chassis/1 'relation-get'
chassis-name: '"juju-a9d6f4-21-lxd-9"'
egress-subnets: 10.35.61.179/32
ingress-address: 10.35.61.179
ovn-configured: "true"
private-address: 10.35.61.179
$ sudo juju-run octavia/1 -r 139 --remote-unit octavia-ovn-chassis/2 'relation-get'
chassis-name: '"juju-a9d6f4-23-lxd-10.maas"'
egress-subnets: 10.35.61.191/32
ingress-address: 10.35.61.191
ovn-configured: "true"
private-address: 10.35.61.191
It appears from a brief read-through of the ovn-chassis charm that the hostname is queried from the ovsdb and then the system-id is set from that hostname. Is it possible that there's a race between the system being able to query its FQDN from DNS during deployment and the hostname OVS sees when it initializes the database on install?
Some potentially relevant code snippets:
# The local ``ovn-controller`` process will retrieve information about
# how to connect to OVN from the local Open vSwitch database.
self.run('ovs-vsctl',
         'set', 'open', '.',
         'external-ids:ovn-encap-type=geneve', '--',
         'set', 'open', '.',
         'external-ids:ovn-encap-ip={}'
         .format(self.get_data_ip()), '--',
         'set', 'open', '.',
         'external-ids:system-id={}'
         .format(self.get_ovs_hostname()))
*snip*
def get_ovs_hostname():
    for row in ch_ovsdb.SimpleOVSDB('ovs-vsctl').open_vswitch:
        return row['external_ids']['hostname']
Assigning field-critical. This is blocking go-live for a Bootstack customer.
I am considering redeploying the octavia node to see whether that clears the issue.
Frode confirmed that the hostname originally comes from `hostname -f`, which is run by the OVS startup script, and that this is likely a race condition at unit deployment time.
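One way to picture a guard against this kind of race is to poll until the resolved name actually contains a domain part before recording it. The sketch below is purely illustrative and is not the fix adopted by the charm or by OVS; `wait_for_fqdn`, its parameters, and the injected `resolve` callable are all assumptions made for the example.

```python
import time


def wait_for_fqdn(resolve, attempts=10, delay=1.0, sleep=time.sleep):
    """Poll resolve() until it returns a name that contains a domain part.

    Hypothetical helper: `resolve` is any zero-argument callable that
    returns the current hostname as a string (for example socket.getfqdn).
    Returns the fully-qualified name, or None if every attempt yielded
    only a short name.
    """
    for attempt in range(attempts):
        name = resolve().strip().rstrip(".")
        if "." in name:
            return name
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return None
```

A caller could use `wait_for_fqdn(socket.getfqdn)` and treat a `None` result as a signal to defer configuration rather than registering the chassis under a short name.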