hostmonitor hangs after notifications send failed
Bug #1930361 reported by suzhengwei
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| masakari-monitors | Fix Released | Critical | suzhengwei | |
| Ussuri | Fix Committed | Critical | Unassigned | |
| Victoria | Fix Committed | Critical | Unassigned | |
| Wallaby | Fix Committed | Critical | Unassigned | |
| Xena | Fix Released | Critical | suzhengwei | |
| masakari-monitors (Ubuntu) | Confirmed | Undecided | Unassigned | |
Bug Description
In one environment, we found that a hostmonitor stopped logging after it failed to send a host failure notification.
I noticed that monitor_hosts exits as soon as it catches an exception. This creates a risk: if a host goes down later, no recovery will be triggered.
See comment #5 for a detailed analysis.
description: updated
Changed in masakari-monitors:
assignee: nobody → suzhengwei (sue.sam)
importance: Medium → Critical
description: updated
In Kolla Ansible, we work around such issues by ensuring the container is set to restart automatically on failure.
I agree it should also be handled at the process level.
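To illustrate what handling this at the process level could look like, here is a minimal Python sketch of a monitoring loop that logs an exception and continues instead of exiting. It is not the actual masakari-monitors fix: `run_monitor_loop`, `check_once`, `max_iterations`, and the injected `sleep` are hypothetical names chosen for this example, and the real `monitor_hosts` may be structured differently.

```python
import logging
import time

LOG = logging.getLogger(__name__)


def run_monitor_loop(check_once, interval=1.0, max_iterations=None,
                     sleep=time.sleep):
    """Call check_once repeatedly, surviving exceptions from any iteration.

    Hypothetical helper for illustration only. Returns the number of
    iterations that raised, so callers and tests can inspect it.
    """
    failures = 0
    iteration = 0
    # max_iterations=None means "run forever", as a real monitor would.
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        try:
            check_once()
        except Exception:
            # Log the traceback and keep looping, so a single failed
            # notification does not stop all future host checks.
            LOG.exception("Monitoring iteration %d failed; continuing",
                          iteration)
            failures += 1
        sleep(interval)
    return failures


def make_flaky_check():
    """Build a check that fails only on its second call (for the demo)."""
    calls = []

    def check():
        calls.append(len(calls) + 1)
        if len(calls) == 2:
            raise ConnectionError("notification send failed")

    return check, calls


if __name__ == "__main__":
    check, calls = make_flaky_check()
    failed = run_monitor_loop(check, interval=0, max_iterations=4)
    print(failed, calls)
```

Catching `Exception` rather than using a bare `except` leaves `KeyboardInterrupt` and `SystemExit` untouched, so the service can still be stopped normally. A container restart policy remains a useful second layer for failures the loop cannot absorb.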