Two controllers are stopped after adding them into an environment
Bug #1575039 reported by
guillaume thouvenin
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StackLight | Fix Released | High | guillaume thouvenin |
Bug Description
When adding two controller nodes to an environment, we observed that only one collector is reported as started after the deployment; the others are stopped. The Pacemaker log shows:
<28>Apr 25 12:21:06 node-140 crmd[8132]: warning: Cannot execute '/usr/lib/
<27>Apr 25 12:21:06 node-140 crmd[8132]: error: Failed to retrieve meta-data for ocf:fuel:
We think that pacemaker is trying to start the resource lma_collector before all nodes are deployed.
description: updated
tags: added: mos9
Changed in lma-toolchain:
status: New → Incomplete
Changed in lma-toolchain:
assignee: LMA-Toolchain Fuel Plugins (mos-lma-toolchain) → guillaume thouvenin (guillaume-thouvenin)
Changed in lma-toolchain:
milestone: 1.0.0 → 0.10.2
Changed in lma-toolchain:
status: Fix Committed → Won't Fix
status: Won't Fix → Fix Released
I just reproduced the bug. Here are the steps I followed:
- Deploy an environment with 1 controller + 1 compute + 1 LMA
- Add two new controllers
Expected result: metrics from the new controllers appear in Grafana.
Observed result: metric_collector and log_collector are not started on the two new controllers.
Here is the output of crm status:
Clone Set: clone_metric_collector [metric_collector]
Started: [ node-34.test.domain.local ]
Stopped: [ node-32.test.domain.local node-33.test.domain.local ]
Clone Set: clone_log_collector [log_collector]
Started: [ node-34.test.domain.local ]
Stopped: [ node-32.test.domain.local node-33.test.domain.local ]
The problem occurs because the scripts used by Pacemaker are not yet installed on the new controllers when Pacemaker restarts the resource. The OCF scripts are deployed during the post-install phase, which is too late.
Workaround: restart the resources after a cleanup:
crm resource cleanup metric_collector
crm resource cleanup log_collector
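The two cleanup commands above could be wrapped in a small helper, sketched here under some assumptions: the function name cleanup_collectors and the CRM override variable are hypothetical and not part of the plugin; only the `crm resource cleanup` calls and the two resource names come from this report.

```shell
# Hypothetical helper (not shipped with the plugin): clean up both collector
# resources so that Pacemaker re-probes them on every node.
# CRM may be overridden for a dry run, e.g. CRM=echo cleanup_collectors
cleanup_collectors() {
    crm_bin="${CRM:-crm}"
    for res in metric_collector log_collector; do
        # Stop at the first failing cleanup so the error is not hidden.
        "$crm_bin" resource cleanup "$res" || return 1
    done
}
```

Once the OCF scripts are present on the new controllers, calling cleanup_collectors on one controller and then checking crm status should show whether the clones are started on all nodes.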