calico workload status in error due to hook failed: "update-status"

Bug #1990672 reported by Konstantinos Kaskavelis
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Calico Charm
Fix Released
High
George Kraft

Bug Description

Solutions QA team has a failed run where calico charm errors due to update-status hook failing.
The failure happens within the kubernetes-maas layer.

From the logs:

unit-calico-0: 11:36:05 INFO unit.calico/0.juju-log Invoking reactive handler: reactive/calico.py:146:update_calicoctl_env
unit-calico-0: 11:36:05 INFO unit.calico/0.juju-log Invoking reactive handler: reactive/calico.py:172:check_etcd_changes
unit-calico-0: 11:36:05 INFO unit.calico/0.juju-log Invoking reactive handler: reactive/calico.py:484:configure_node
unit-calico-0: 11:36:05 INFO unit.calico/0.juju-log status-set: maintenance: Configuring Calico node
unit-calico-0: 11:36:16 ERROR unit.calico/0.juju-log Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-calico-0/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-calico-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-calico-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-calico-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-calico-0/charm/reactive/calico.py", line 494, in configure_node
    node = calicoctl_get('node', node_name)
  File "/var/lib/juju/agents/unit-calico-0/charm/reactive/calico.py", line 685, in calicoctl_get
    result = yaml.safe_load(output)
  File "/var/lib/juju/agents/unit-calico-0/.venv/lib/python3.8/site-packages/yaml/__init__.py", line 162, in safe_load
    return load(stream, SafeLoader)
  File "/var/lib/juju/agents/unit-calico-0/.venv/lib/python3.8/site-packages/yaml/__init__.py", line 114, in load
    return loader.get_single_data()
  File "/var/lib/juju/agents/unit-calico-0/.venv/lib/python3.8/site-packages/yaml/constructor.py", line 49, in get_single_data
    node = self.get_single_node()
  File "/var/lib/juju/agents/unit-calico-0/.venv/lib/python3.8/site-packages/yaml/composer.py", line 39, in get_single_node
    if not self.check_event(StreamEndEvent):
  File "/var/lib/juju/agents/unit-calico-0/.venv/lib/python3.8/site-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/var/lib/juju/agents/unit-calico-0/.venv/lib/python3.8/site-packages/yaml/parser.py", line 171, in parse_document_start
    raise ParserError(None, None,
yaml.parser.ParserError: expected '<document start>', but found '<block mapping start>'
  in "<byte string>", line 2, column 1:
    apiVersion: projectcalico.org/v3
    ^

Test run:

https://solutions.qa.canonical.com/testruns/testRun/16612ec9-8d53-44e3-b2a2-d9ad378b5c5c

Logs:

https://oil-jenkins.canonical.com/artifacts/16612ec9-8d53-44e3-b2a2-d9ad378b5c5c/index.html

Revision history for this message
George Kraft (cynerva) wrote :

Interesting. It looks like `calicoctl get node <node-name` returned unexpected output - something in line 1 closed the YAML document. I'm guessing it logged a warning to stderr that got captured[1] and interpreted as a YAML sequence, or something.

Recommended fix:
1. Remove the stderr=STDOUT pipe when calling calicoctl
2. Add debug logging when yaml.parser.ParserError is raised in calicoctl_get so we can see what it failed to parse

[1]: https://github.com/charmed-kubernetes/layer-calico/blob/d6af0785470d3370c3791dc17df51874a87da0db/reactive/calico.py#L619

Changed in charm-calico:
importance: Undecided → High
status: New → Triaged
Revision history for this message
Konstantinos Kaskavelis (kaskavel) wrote :
Download full text (11.1 KiB)

We have a new occurrence of this one here:

https://solutions.qa.canonical.com/v2/testruns/8147f306-af16-4827-84c4-5c293b99d1c9

Full logs from the run can be found here:

https://oil-jenkins.canonical.com/artifacts/8147f306-af16-4827-84c4-5c293b99d1c9/index.html

From ./7/baremetal/var/log/juju/unit-calico-3.log (in juju-crashdump-kubernetes-aws-2022-11-18-18.59.52.tar.gz)

tracer: hooks phase, 1 handlers queued
tracer: ++ queue handler hooks/relations/etcd/requires.py:24:changed
2022-11-18 18:59:09 INFO unit.calico/3.juju-log server.go:316 etcd:16: Invoking reactive handler: hooks/relations/etcd/requires.py:24:changed
2022-11-18 18:59:09 DEBUG unit.calico/3.juju-log server.go:316 etcd:16: tracer: set flag etcd.connected
2022-11-18 18:59:09 DEBUG unit.calico/3.juju-log server.go:316 etcd:16: tracer: set flag etcd.available
2022-11-18 18:59:09 DEBUG unit.calico/3.juju-log server.go:316 etcd:16: tracer: set flag etcd.tls.available
2022-11-18 18:59:09 DEBUG unit.calico/3.juju-log server.go:316 etcd:16: tracer>
tracer: main dispatch loop, 6 handlers queued
tracer: ++ queue handler hooks/relations/kubernetes-cni/requires.py:49:broken:cni
tracer: ++ queue handler reactive/calico.py:146:update_calicoctl_env
tracer: ++ queue handler reactive/calico.py:161:install_etcd_credentials
tracer: ++ queue handler reactive/calico.py:428:configure_bgp_globals
tracer: ++ queue handler reactive/calico.py:484:configure_node
tracer: ++ queue handler reactive/calico.py:513:configure_bgp_peers
2022-11-18 18:59:09 INFO unit.calico/3.juju-log server.go:316 etcd:16: Invoking reactive handler: reactive/calico.py:146:update_calicoctl_env
2022-11-18 18:59:09 INFO unit.calico/3.juju-log server.go:316 etcd:16: Invoking reactive handler: reactive/calico.py:161:install_etcd_credentials
2022-11-18 18:59:09 DEBUG unit.calico/3.juju-log server.go:316 etcd:16: tracer>
tracer: set flag calico.etcd-credentials.installed
tracer: ++ queue handler reactive/calico.py:308:configure_calico_pool
tracer: -- dequeue handler reactive/calico.py:161:install_etcd_credentials
2022-11-18 18:59:09 INFO unit.calico/3.juju-log server.go:316 etcd:16: Invoking reactive handler: reactive/calico.py:428:configure_bgp_globals
2022-11-18 18:59:09 INFO unit.calico/3.juju-log server.go:316 etcd:16: status-set: maintenance: Configuring BGP globals
2022-11-18 18:59:12 DEBUG juju.worker.uniter.remotestate watcher.go:660 got a relation units change for calico/3 : {16 {map[etcd/2:{1}] map[] []}}
2022-11-18 18:59:18 ERROR unit.calico/3.juju-log server.go:316 etcd:16: Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-calico-3/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-calico-3/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-calico-3/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-calico-3/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 181, in invoke
    se...

George Kraft (cynerva)
Changed in charm-calico:
milestone: none → 1.26+ck1
milestone: 1.26+ck1 → 1.26
assignee: nobody → George Kraft (cynerva)
status: Triaged → In Progress
Revision history for this message
George Kraft (cynerva) wrote :
Changed in charm-calico:
status: In Progress → Fix Committed
Adam Dyess (addyess)
Changed in charm-calico:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.