CMR with OpenStack Floating IP Addresses reporting wrong ingress addresses

Bug #1871441 reported by Jeff Hillman
Affects: Etcd Charm
Status: Triaged
Importance: Medium
Assigned to: Unassigned

Bug Description

Kubernetes 1.17.4
Juju 2.7.5
OpenStack Bionic-Train
MAAS 2.6.2

Testing CMR with a Kubernetes control plane (etcd, easyrsa, kubernetes-master, kubeapi-load-balancer, openstack-integrator) in OpenStack.

Kubernetes workers run in both OpenStack and on bare metal (provided by MAAS).

Separate controllers (one in OpenStack, one in MAAS).

In each model there are independent containerd and flannel services.

The bare-metal model only has kubernetes-worker, containerd, and flannel.

Floating IPs are used in OpenStack, and the bare-metal environment and the OpenStack floating IPs are on the same subnet/VLAN. Connectivity between the two environments has been verified.

When offering/relating etcd from the control plane in OpenStack to flannel on bare metal, flannel gets stuck waiting on etcd.

Looking at the flannel logs, we see:

---

2020-04-07 14:20:05 INFO juju-log etcd:3: Invoking reactive handler: reactive/flannel.py:145:etcd_changed
2020-04-07 14:20:05 INFO juju-log etcd:3: Invoking reactive handler: reactive/flannel.py:156:invoke_configure_network
2020-04-07 14:20:23 DEBUG etcd-relation-changed Error: dial tcp 192.168.58.145:2379: i/o timeout
2020-04-07 14:20:24 INFO juju-log etcd:3: Unexpected error configuring network. Assuming etcd not ready. Will retry in 20s

---

Running network-get db against an etcd unit, we see:

---

$ juju run --unit etcd/0 -- "network-get db"
bind-addresses:
- macaddress: fa:16:3e:0a:40:ee
  interfacename: ens2
  addresses:
  - hostname: ""
    address: 192.168.58.145
    cidr: 192.168.58.0/24
  - hostname: ""
    address: 192.168.58.145
    cidr: 192.168.58.0/24
- macaddress: 96:d7:c7:ce:62:49
  interfacename: fan-252
  addresses:
  - hostname: ""
    address: 252.145.0.1
    cidr: 252.0.0.0/8
egress-subnets:
- 192.168.58.145/32
ingress-addresses:
- 192.168.58.145
- 192.168.58.145
- 252.145.0.1

---

The 192.168.58.0/24 addresses are from OpenStack's internal (VXLAN) network. Etcd should be aware of the floating IP addresses and advertise those instead; flannel on bare metal will obviously never be able to reach that private network.
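
For comparison, network-get can also be scoped to a single relation with -r; for a cross-model relation, Juju should then report the ingress address it computed for that relation. A hedged example (the relation ID db:20 is taken from the juju offers output below; substitute whatever relation-ids actually reports):

---

$ juju run --unit etcd/0 -- "relation-ids db"
$ juju run --unit etcd/0 -- "network-get db -r db:20 --ingress-address"

---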

This could be the offender here:

https://github.com/charmed-kubernetes/layer-etcd/blob/3ac70e3f881cb19a4fff399fc0c33777a824f31b/reactive/etcd.py#L222
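
For reference, a minimal sketch of the difference in Python, assuming the charm resolves its advertised address through charmhelpers' hookenv.network_get (the function names below are illustrative, not the charm's actual code):

---

from charmhelpers.core import hookenv

def space_scoped_address():
    # Without a relation ID, Juju resolves the binding purely against
    # the model's network spaces and returns the unit's local address
    # (192.168.58.145 here); the floating IP is never considered.
    return hookenv.network_get('db')['ingress-addresses'][0]

def relation_scoped_address(relation_id):
    # With a relation ID, Juju can account for where traffic on that
    # relation actually comes from; for a cross-model relation it
    # should return the per-relation ingress address (the floating IP).
    info = hookenv.network_get('db', relation_id=relation_id)
    return info['ingress-addresses'][0]

---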

### juju status from the control plane in OpenStack

$ juju status --relations
Model Controller Cloud/Region Version SLA Timestamp
default openstack-regionone openstack/RegionOne 2.7.5 unsupported 12:16:15-04:00

App Version Status Scale Charm Store Rev OS Notes
containerd active 3 containerd jujucharms 61 ubuntu
easyrsa 3.0.1 active 1 easyrsa jujucharms 296 ubuntu
etcd 3.3.15 active 3 etcd jujucharms 496 ubuntu
flannel 0.11.0 active 3 flannel jujucharms 468 ubuntu
kubeapi-load-balancer 1.14.0 active 1 kubeapi-load-balancer jujucharms 682 ubuntu exposed
kubernetes-master 1.17.4 active 1 kubernetes-master jujucharms 808 ubuntu
kubernetes-worker-os 1.17.4 active 2 kubernetes-worker jujucharms 634 ubuntu exposed
openstack-integrator train active 1 openstack-integrator jujucharms 49 ubuntu

Unit Workload Agent Machine Public address Ports Message
easyrsa/0* active idle 0 172.16.7.179 Certificate Authority connected.
etcd/0* active idle 1 172.16.7.180 2379/tcp Healthy with 3 known peers
etcd/1 active idle 2 172.16.7.185 2379/tcp Healthy with 3 known peers
etcd/2 active idle 3 172.16.7.178 2379/tcp Healthy with 3 known peers
kubeapi-load-balancer/0* active idle 4 172.16.7.181 443/tcp Loadbalancer ready.
kubernetes-master/0* active idle 5 172.16.7.186 6443/tcp Kubernetes master running.
  containerd/1 active idle 172.16.7.186 Container runtime available
  flannel/1 active idle 172.16.7.186 Flannel subnet 10.1.48.1/24
kubernetes-worker-os/0 active executing 6 172.16.7.182 (juju-run) Kubernetes worker running.
  containerd/2 active idle 172.16.7.182 Container runtime available
  flannel/2 active idle 172.16.7.182 Flannel subnet 10.1.79.1/24
kubernetes-worker-os/1* active executing 7 172.16.7.188 (juju-run) Kubernetes worker running.
  containerd/0* active idle 172.16.7.188 Container runtime available
  flannel/0* active idle 172.16.7.188 Flannel subnet 10.1.7.1/24
openstack-integrator/0* active idle 8 172.16.7.183 Ready

Machine State DNS Inst id Series AZ Message
0 started 172.16.7.179 273386c0-82f0-4982-839d-c37ab50a2e18 bionic nova ACTIVE
1 started 172.16.7.180 8f1448dd-0734-4c86-940c-5fb00a434430 bionic nova ACTIVE
2 started 172.16.7.185 90b57213-dda6-4e5e-9c94-27e827fa3145 bionic nova ACTIVE
3 started 172.16.7.178 a637577c-e9c1-4757-87d0-106f1eccfedf bionic nova ACTIVE
4 started 172.16.7.181 f903f10b-326f-4a31-8320-1ba24e5a1ae4 bionic nova ACTIVE
5 started 172.16.7.186 fb740de7-1383-464d-a235-339130ba2f6a bionic nova ACTIVE
6 started 172.16.7.182 fd51f143-3eb6-40bd-937f-af2d7087a610 bionic nova ACTIVE
7 started 172.16.7.188 9ec4b552-a62d-45bf-925b-87006cc91e5f bionic nova ACTIVE
8 started 172.16.7.183 c255083b-d6e2-4b23-b1e5-87cab58b7177 bionic nova ACTIVE

Offer Application Charm Rev Connected Endpoint Interface Role
easyrsa easyrsa easyrsa 296 1/1 client tls-certificates provider
etcd etcd etcd 496 1/1 db etcd provider
kubeapi-load-balancer kubeapi-load-balancer kubeapi-load-balancer 682 1/1 website http provider
kubernetes-master kubernetes-master kubernetes-master 808 1/1 kube-control kube-control provider
kubernetes-worker-os kubernetes-worker-os kubernetes-worker 634 0/0 coordinator coordinator peer

Relation provider Requirer Interface Type Message
easyrsa:client etcd:certificates tls-certificates regular
easyrsa:client kubeapi-load-balancer:certificates tls-certificates regular
easyrsa:client kubernetes-master:certificates tls-certificates regular
easyrsa:client kubernetes-worker-os:certificates tls-certificates regular
etcd:cluster etcd:cluster etcd peer
etcd:db flannel:etcd etcd regular
etcd:db kubernetes-master:etcd etcd regular
kubeapi-load-balancer:loadbalancer kubernetes-master:loadbalancer public-address regular
kubeapi-load-balancer:website kubernetes-worker-os:kube-api-endpoint http regular
kubernetes-master:cni flannel:cni kubernetes-cni subordinate
kubernetes-master:container-runtime containerd:containerd container-runtime subordinate
kubernetes-master:coordinator kubernetes-master:coordinator coordinator peer
kubernetes-master:kube-api-endpoint kubeapi-load-balancer:apiserver http regular
kubernetes-master:kube-control kubernetes-worker-os:kube-control kube-control regular
kubernetes-master:kube-masters kubernetes-master:kube-masters kube-masters peer
kubernetes-worker-os:cni flannel:cni kubernetes-cni subordinate
kubernetes-worker-os:container-runtime containerd:containerd container-runtime subordinate
kubernetes-worker-os:coordinator kubernetes-worker-os:coordinator coordinator peer
openstack-integrator:clients kubernetes-master:openstack openstack-integration regular
openstack-integrator:clients kubernetes-worker-os:openstack openstack-integration regular

#### juju offers from control plane

$ juju offers
Offer User Relation id Status Endpoint Interface Role Ingress subnets
easyrsa admin 21 joined client tls-certificates provider 172.16.7.0/24
etcd admin 20 joined db etcd provider 172.16.7.0/24
kubeapi-load-balancer admin 23 joined website http provider 172.16.7.0/24
kubernetes-master admin 22 joined kube-control kube-control provider 172.16.7.0/24
kubernetes-worker-os -

### juju status from bare-metal model

$ juju status --relations
Model Controller Cloud/Region Version SLA Timestamp
k8s-worker jhillman-maas jhillman-maas 2.7.5 unsupported 12:17:27-04:00

SAAS Status Store URL
easyrsa active openstack-regionone admin/default.easyrsa
etcd active openstack-regionone admin/default.etcd
kubeapi-load-balancer active openstack-regionone admin/default.kubeapi-load-balancer
kubernetes-master active openstack-regionone admin/default.kubernetes-master
kubernetes-worker-os active openstack-regionone admin/default.kubernetes-worker-os

App Version Status Scale Charm Store Rev OS Notes
containerd active 1 containerd jujucharms 61 ubuntu
flannel 0.11.0 maintenance 1 flannel jujucharms 468 ubuntu
kubernetes-worker-bm 1.17.4 waiting 1 kubernetes-worker jujucharms 634 ubuntu

Unit Workload Agent Machine Public address Ports Message
kubernetes-worker-bm/0* waiting idle 0 10.0.22.5 Waiting for kubelet,kube-proxy to start.
  containerd/0* active idle 10.0.22.5 Container runtime available
  flannel/0* maintenance idle 10.0.22.5 Negotiating flannel network subnet.

Machine State DNS Inst id Series AZ Message
0 started 10.0.22.5 agrippa bionic default Deployed

Relation provider Requirer Interface Type Message
easyrsa:client kubernetes-worker-bm:certificates tls-certificates regular
etcd:db flannel:etcd etcd regular
kubeapi-load-balancer:website kubernetes-worker-bm:kube-api-endpoint http regular
kubernetes-master:kube-control kubernetes-worker-bm:kube-control kube-control regular
kubernetes-worker-bm:cni flannel:cni kubernetes-cni subordinate
kubernetes-worker-bm:container-runtime containerd:containerd container-runtime subordinate
kubernetes-worker-bm:coordinator kubernetes-worker-bm:coordinator coordinator peer

George Kraft (cynerva)
Changed in charm-etcd:
status: New → Confirmed
George Kraft (cynerva) wrote:

Confirmed. The etcd charm, when calling network-get for the db relation, does not pass in relation IDs. This means it does not fully support cross-model relations.

This won't be easy to fix. The etcd charm will need to call network-get with relation IDs for the db relation. The charm might observe different ingress addresses for each relation, so it will need to be able to send a different address to each relation. The etcd interface doesn't easily support this - it's still using the old RelationBase with scope=GLOBAL, meaning that the etcd units need to communicate with each other via the cluster relation and agree on a single connection string before sending it off to the db relations.

Most likely, fixing this will involve first updating interface-etcd to use the Endpoint class instead of RelationBase.
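
For illustration, a rough sketch of what a per-relation publish could look like with the Endpoint class (the class and method names here are hypothetical, not the actual interface-etcd code, and a real provider would publish more than the connection string):

---

from charmhelpers.core import hookenv
from charms.reactive import Endpoint

class EtcdProvides(Endpoint):

    def publish_connection_strings(self, port=2379):
        # Endpoint exposes each relation individually, so the charm can
        # advertise a different ingress address on each relation instead
        # of one globally agreed connection string.
        for relation in self.relations:
            info = hookenv.network_get(self.endpoint_name,
                                       relation_id=relation.relation_id)
            address = info['ingress-addresses'][0]
            relation.to_publish['connection_string'] = (
                'https://{}:{}'.format(address, port))

---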

George Kraft (cynerva)
Changed in charm-etcd:
importance: Undecided → High
status: Confirmed → Triaged
George Kraft (cynerva)
Changed in charm-etcd:
importance: High → Medium
Changed in charm-etcd:
milestone: none → 1.25
Adam Dyess (addyess)
Changed in charm-etcd:
milestone: 1.25 → 1.26
George Kraft (cynerva)
Changed in charm-etcd:
milestone: 1.26 → 1.26+ck1
Adam Dyess (addyess)
Changed in charm-etcd:
milestone: 1.26+ck1 → 1.26+ck2
Changed in charm-etcd:
milestone: 1.26+ck2 → 1.27
Kevin W Monroe (kwmonroe) wrote:

RelationBase -> Endpoint didn't make it for 1.27, and the etcd charm is ripe for converting to ops for 1.28. I suspect this fix will come for free with that conversion; apologies, but we're re-targeting again for 1.28.

Changed in charm-etcd:
milestone: 1.27 → 1.28
Adam Dyess (addyess)
Changed in charm-etcd:
milestone: 1.28 → 1.28+ck1
Adam Dyess (addyess)
Changed in charm-etcd:
milestone: 1.28+ck1 → 1.29