[LMA] Pacemaker fails to restart the collector when the watchdog check fails

Bug #1514893 reported by Simon Pasquier
This bug affects 1 person
Affects    | Status       | Importance | Assigned to    | Milestone
StackLight | Fix Released | High       | Simon Pasquier |

Bug Description

Pacemaker monitors /tmp/lma_collector.watchdog to verify that the Heka pipeline isn't blocked.
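The check amounts to comparing the watchdog file's modification time against a maximum age. Below is a minimal Python sketch of that comparison. It assumes a missing file counts as "not fresh", and the 60-second threshold is a hypothetical value because the report does not give the real one. The function name `watchdog_is_fresh` is also illustrative, not taken from the LMA collector plugin.

```python
import os
import time

# Hypothetical threshold: the report does not state the maximum age actually used.
DEFAULT_MAX_AGE_S = 60.0


def watchdog_is_fresh(path: str, max_age_s: float = DEFAULT_MAX_AGE_S) -> bool:
    """Return True if `path` exists and was modified at most `max_age_s` seconds ago.

    A missing file is treated as not fresh, i.e. the pipeline is presumed blocked or down.
    """
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        return False
    # Age of the last write by the Heka pipeline, in seconds.
    return (time.time() - mtime) <= max_age_s


if __name__ == "__main__":
    print(watchdog_is_fresh("/tmp/lma_collector.watchdog"))
```

Treating a missing file as a failure is a design choice: it only works if something guarantees that a running process writes the file before the first check.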

Unfortunately, the implementation is buggy in at least two ways:
- Pacemaker doesn't stop the running Heka process when it detects that the watchdog file is too old.
- The /tmp/lma_collector.watchdog file isn't removed when the Heka process is stopped. As a result, Pacemaker may consider a newly started Heka process to be down because it relies on the stale watchdog file left by the old process.
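Both problems concern what should happen when the collector is stopped. Below is a minimal Python sketch of a stop step that addresses them. It is written in Python for illustration, whereas a Pacemaker resource agent would typically be a shell script. It assumes the collector is available as a `subprocess.Popen` handle. The name `stop_collector` and the 10-second timeout are hypothetical and do not reflect the fix that was actually released.

```python
import os
import subprocess


def stop_collector(proc: subprocess.Popen, watchdog_path: str, timeout_s: float = 10.0) -> None:
    """Stop the collector process, then delete its watchdog file (hypothetical helper)."""
    # Bug 1: make sure the running process really goes away.
    proc.terminate()  # sends SIGTERM on POSIX; no-op if the process already exited
    try:
        proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # escalate to SIGKILL if SIGTERM was ignored
        proc.wait()
    # Bug 2: remove the watchdog file so that a restarted process
    # is not judged by the timestamp left behind by the old one.
    try:
        os.remove(watchdog_path)
    except FileNotFoundError:
        pass
```

The function is safe to call twice: a process that has already exited is not signalled again, and a watchdog file that is already gone is ignored.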

Tags: lma
Simon Pasquier (simon-pasquier) wrote :
Changed in lma-toolchain:
status: In Progress → Fix Committed
Changed in lma-toolchain:
status: Fix Committed → Fix Released