hostmonitor hangs after sending a notification fails
Bug #1930361 reported by suzhengwei
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
masakari-monitors | Fix Released | Critical | suzhengwei |
Ussuri | Fix Committed | Critical | Unassigned |
Victoria | Fix Committed | Critical | Unassigned |
Wallaby | Fix Committed | Critical | Unassigned |
Xena | Fix Released | Critical | suzhengwei |
masakari-monitors (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
In one environment, we found that a hostmonitor stopped logging after it failed to send a host failure notification.
I noticed that monitor_hosts exits as soon as it catches an exception. The risk is that if a host goes down later, no recovery will be triggered.
See comment #5 for a detailed analysis.
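The failure mode described above can be illustrated with a minimal sketch of a monitoring loop that logs an exception and keeps running instead of exiting. This is not the masakari-monitors code or its actual fix. The signature of `monitor_hosts` used here and the names `check_once`, `interval`, `max_iterations`, and `sleep` are hypothetical and chosen only for illustration.

```python
import logging
import time

LOG = logging.getLogger(__name__)


def monitor_hosts(check_once, interval=1.0, max_iterations=None,
                  sleep=time.sleep):
    """Call check_once repeatedly and survive exceptions it raises.

    Hypothetical sketch: none of these parameters come from
    masakari-monitors. check_once stands in for one pass of host
    checking plus notification sending.
    """
    failures = 0
    iterations = 0
    # max_iterations=None means "loop forever", as a daemon would.
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        try:
            check_once()
        except Exception:
            # Log the traceback and continue, so that a host going
            # down later is still detected by a subsequent pass.
            failures += 1
            LOG.exception("Host monitoring pass failed; continuing")
        sleep(interval)
    return failures


def make_flaky_check(fail_on=2):
    """Return a check that raises once, on its fail_on-th call."""
    calls = {"n": 0}

    def check():
        calls["n"] += 1
        if calls["n"] == fail_on:
            raise ConnectionError("failed to send notification")

    return check, calls
```

With a check that fails on its second call, `monitor_hosts(check, interval=0, max_iterations=5, sleep=lambda _: None)` still runs all five passes and reports one failure, whereas a loop that exits on the first exception would stop after the second pass.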
description: updated
Changed in masakari-monitors:
assignee: nobody → suzhengwei (sue.sam)
importance: Medium → Critical
description: updated
In Kolla Ansible we work around such issues by ensuring the container is set to restart automatically on failure.
I agree it should also be handled at the process level.