test_network_policies fails on "Reaching out to nginx.netpolicy" with ignore-loose-rpf set to True
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Charmed Kubernetes Testing | Triaged | Medium | Unassigned |
Bug Description
As seen here: https:/
---
This issue was caused in the past by not setting Calico ignore-loose-rpf to True. This time we do set it in our bundle here:
https:/
And you can see it getting set in several places in the logs, with a message like this:
0/baremetal/
2021-02-25 06:20:10.169 [WARNING][67] int_dataplane.go 1026: Kernel's RPF check is set to 'loose' and IgnoreLooseRPF set to true. Calico will not be able to prevent workloads from spoofing their source IP. Please ensure that some other anti-spoofing mechanism is in place (such as running only non-privileged containers).
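For what it's worth, one way to double-check that the option actually landed on the deployed application (and not just in the bundle text) is to ask Juju for the effective value. A minimal sketch, assuming the Calico charm is deployed as an application named `calico`:

```python
import subprocess

def calico_ignore_loose_rpf() -> bool:
    """Return the effective ignore-loose-rpf value for the calico application."""
    result = subprocess.run(
        ["juju", "config", "calico", "ignore-loose-rpf"],
        capture_output=True, text=True, check=True,
    )
    # `juju config <app> <key>` prints just the value, e.g. "true"
    return result.stdout.strip().lower() == "true"

if __name__ == "__main__":
    print("ignore-loose-rpf:", calico_ignore_loose_rpf())
```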
Changed in charmed-kubernetes-testing:
status: Triaged → New
Thanks for the report. Slightly different symptom this time, as the "Reaching out to nginx.netpolicy with restrictions" message is only logged once:
2021-02-25-07:06:50 root DEBUG Reaching out to nginx.netpolicy with no restrictions
2021-02-25-07:07:28 root DEBUG Reaching out to nginx.netpolicy with no restrictions
2021-02-25-07:07:41 root DEBUG Reaching out to nginx.netpolicy with restrictions
2021-02-25-07:36:00 root ERROR [localhost] Command failed: ...
Looks like the test is hanging here: https://github.com/charmed-kubernetes/jenkins/blob/9f180e2be0d209a6b82be93bda8f9623cd133bf8/jobs/integration/validation.py#L543-L552
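If I'm reading it right, those lines boil down to repeatedly probing nginx.netpolicy via a juju-run action until the restricted query is blocked. A rough sketch of that shape (the unit name, pod name and probe command below are illustrative, not the actual helpers); note that a per-call timeout would turn a stuck juju-run action into a normal test failure instead of a half-hour hang:

```python
import subprocess
import time

def query_nginx(unit: str, timeout: int = 60) -> bool:
    """Probe nginx.netpolicy from a pod via a juju-run action (illustrative only)."""
    cmd = [
        "juju", "run", "--unit", unit, "--timeout", f"{timeout}s",
        "kubectl exec bbox-test -n netpolicy -- wget -T 30 nginx.netpolicy -O /dev/null",
    ]
    try:
        # If the juju-run action never returns and no timeout is given,
        # this call blocks and the surrounding retry loop hangs the test.
        subprocess.run(cmd, check=True, timeout=timeout + 30)
        return True
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return False

# Keep probing until the restricted query is blocked, bounded by a deadline.
deadline = time.time() + 15 * 60
while time.time() < deadline:
    print("Reaching out to nginx.netpolicy with restrictions")
    if not query_nginx("kubernetes-master/0"):
        break  # policy is enforced: nginx is no longer reachable
    time.sleep(10)
else:
    raise TimeoutError("nginx.netpolicy still reachable after policy applied")
```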
At 07:09:03, kubernetes-master/2 acquires a machine lock for action 108 and never releases it:
2021-02-25 07:09:03 DEBUG juju.machinelock machinelock.go:172 machine lock acquired for kubernetes-master/2 uniter (run action 108)
2021-02-25 07:09:03 DEBUG juju.worker.uniter.operation executor.go:132 preparing operation "run action 108" for kubernetes-master/2
2021-02-25 07:09:03 DEBUG juju.worker.uniter.operation executor.go:132 executing operation "run action 108" for kubernetes-master/2
2021-02-25 07:09:03 DEBUG juju.worker.uniter.runner runner.go:288 juju-run action is running
That action is definitely holding things up. Unfortunately, I'm not able to find which action that is or what command it ran. That info doesn't appear to be collected in the crashdump.
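If the model is still up the next time this reproduces, it should be possible to pull that out of Juju directly rather than from the crashdump. A minimal sketch, assuming the Juju 2.x CLI (`juju show-action-output`); how much of the original juju-run command gets recorded may depend on the Juju version:

```python
import json
import subprocess

def dump_action(action_id: str) -> None:
    """Dump whatever Juju knows about an action id (sketch, Juju 2.x CLI)."""
    out = subprocess.run(
        ["juju", "show-action-output", action_id, "--format", "json"],
        capture_output=True, text=True, check=True,
    )
    # Print the raw record; status and results are there, the original
    # command may or may not be, depending on the Juju version.
    print(json.dumps(json.loads(out.stdout), indent=2))

dump_action("108")  # the action holding the machine lock in the log above
```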