There are cases when masakari-hostmonitor will recognize online nodes as offline and send (in)appropriate notifications to Masakari
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu Cloud Archive | Fix Released | Undecided | Unassigned |
Ussuri | Fix Committed | High | Unassigned |
Victoria | Fix Committed | High | Unassigned |
Wallaby | Fix Committed | High | Unassigned |
masakari-monitors | Fix Released | High | Radosław Piliszek |
Ussuri | Fix Released | High | Radosław Piliszek |
Victoria | Fix Released | High | Radosław Piliszek |
Wallaby | Fix Released | High | Radosław Piliszek |
Xena | Fix Released | High | Radosław Piliszek |
masakari-monitors (Ubuntu) | Fix Released | Undecided | Unassigned |
Focal | Fix Committed | High | Unassigned |
Bug Description
[Issue]
ComputeNodes are managed by pacemaker_remote in my environment.
When one ComputeNode is isolated in the network, masakari-
At that time, the isolated masakari-
As a result, masakari-engine runs the recovery procedure against ComputeNodes that are actually online.
[Cause]
The current masakari-
masakari-
<https:/
But masakari-
<https:/
[Solution]
A ComputeNode managed by pacemaker_remote should recognize itself as offline when it is isolated.
The state monitoring process should be skipped in that case.
See comment #11 for how yoctozepto managed to reproduce something similar to what is described here.
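The idea in [Solution] can be sketched roughly as follows. This is only an illustration, not the actual masakari-monitors code: the function name `hosts_to_report`, the `node_states` mapping, and the `"online"`/`"offline"` strings are all hypothetical stand-ins for whatever the monitor obtains from the cluster.

```python
def hosts_to_report(my_hostname, node_states):
    """Return the peers that should be reported as offline.

    node_states is a hypothetical mapping of hostname -> "online"/"offline"
    as seen from this node. If this node does not see itself as online,
    its view of the peers cannot be trusted, so nothing is reported.
    """
    own_state = node_states.get(my_hostname)
    if own_state != "online":
        # Isolated (or unknown) local node: skip state monitoring entirely.
        return []
    return sorted(
        host
        for host, state in node_states.items()
        if host != my_hostname and state == "offline"
    )
```

With this check in place, an isolated node that sees every peer as offline would send no notifications, instead of reporting healthy peers as failed.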
Changed in masakari-monitors: | |
assignee: | nobody → Daisuke Suzuki (suzuki-di) |
status: | New → In Progress |
Changed in masakari-monitors: | |
assignee: | Daisuke Suzuki (suzuki-di) → Radosław Piliszek (yoctozepto) |
Changed in masakari-monitors (Ubuntu): | |
status: | New → Fix Released |
Changed in masakari-monitors (Ubuntu Focal): | |
status: | New → Triaged |
importance: | Undecided → High |
Changed in cloud-archive: | |
status: | New → Fix Released |
_check_host_status_by_crmadmin [1] is the proper safeguard.
Hostmonitor should be treated as a pacemaker proxy, so it should run on pacemaker nodes (not remotes).
I guess this needs documenting and disabling its functionality on non-pacemaker nodes altogether.
There is no benefit to running hostmonitors on remotes; it can only result in more resource waste and less stability.
[1] https://opendev.org/openstack/masakari-monitors/src/commit/b02c6b6931c0256f4ce6d7167c97ebb849ff3453/masakarimonitors/hostmonitor/host_handler/handle_host.py#L414-L418
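For readers unfamiliar with that safeguard, here is a minimal sketch of a crmadmin-based self-check. It is not the code at [1]; it assumes that `crmadmin -S <hostname>` prints the controller state of a full pacemaker node and that `S_IDLE` or `S_NOT_DC` indicates a stable node. The helper names are hypothetical, and the exact output format may differ between Pacemaker versions.

```python
import subprocess

# Controller states assumed to mean "this node is a stable cluster member".
STABLE_STATES = ("S_IDLE", "S_NOT_DC")


def is_stable_controller_output(output):
    """Return True if the crmadmin output mentions a stable controller state."""
    return any(state in output for state in STABLE_STATES)


def local_node_is_stable(hostname):
    """Ask crmadmin for this node's controller status; treat any failure as unstable."""
    try:
        result = subprocess.run(
            ["crmadmin", "-S", hostname],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # crmadmin is missing or hung: do not trust this node's view.
        return False
    return result.returncode == 0 and is_stable_controller_output(result.stdout)
```

A monitor would call `local_node_is_stable()` before evaluating peers and skip the cycle when it returns `False`. On a remote node this check is expected to fail, which matches the point that hostmonitor belongs on full pacemaker nodes.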