Two controllers are stopped after adding them into an environment

Bug #1575039 reported by guillaume thouvenin
This bug affects 1 person
Affects: StackLight
Status: Fix Released
Importance: High
Assigned to: guillaume thouvenin
Milestone: (none listed)

Bug Description

When adding two controller nodes to an environment, we observed that only one collector was reported as started after the deployment; the others were stopped. The pacemaker log shows:

<28>Apr 25 12:21:06 node-140 crmd[8132]: warning: Cannot execute '/usr/lib/ocf/resource.d/fuel/ocf-lma_collector': No such file or directory (2)
<27>Apr 25 12:21:06 node-140 crmd[8132]: error: Failed to retrieve meta-data for ocf:fuel:ocf-lma_collector

We think that pacemaker is trying to start the resource lma_collector before all nodes are deployed.

Tags: mos9
Swann Croiset (swann-w)
description: updated
tags: added: mos9
Changed in lma-toolchain:
status: New → Incomplete
Revision history for this message
guillaume thouvenin (guillaume-thouvenin) wrote :

I just reproduced the bug. Here are the steps I followed:

- Deploy an environment with 1 controller + 1 compute + 1 LMA
- Add two new controllers

Expected result: metrics from the new controllers are visible in Grafana.

Observed result: metric_collector and log_collector are not started on the two new controllers.

Here is the output of crm status:

 Clone Set: clone_metric_collector [metric_collector]
     Started: [ node-34.test.domain.local ]
     Stopped: [ node-32.test.domain.local node-33.test.domain.local ]
 Clone Set: clone_log_collector [log_collector]
     Started: [ node-34.test.domain.local ]
     Stopped: [ node-32.test.domain.local node-33.test.domain.local ]
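
To list the affected nodes without reading the output by eye, a small awk filter along these lines may help. This is only a sketch: the function name stopped_nodes is made up for illustration, and it assumes crm status prints clone sets in exactly the layout shown above.

```shell
# Hypothetical helper: read "crm status" output on stdin and print
# "<clone set> <node>" for every node listed on a "Stopped:" line.
stopped_nodes() {
    awk '
        # Remember the clone set that the following lines belong to.
        /Clone Set:/ { set = $3 }
        # Print every field of a Stopped line except the label and brackets.
        /Stopped:/ {
            for (i = 1; i <= NF; i++)
                if ($i != "Stopped:" && $i != "[" && $i != "]")
                    print set, $i
        }
    '
}

# Usage on a controller (not run here):
#   crm status | stopped_nodes
```

Each output line pairs a clone set with one stopped node, so the result can be passed to grep or counted with wc -l.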

The problem occurs because the OCF scripts used by pacemaker are not yet installed on the new controllers when pacemaker restarts the resource. The OCF scripts are deployed during the post-install stage, which is too late.
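
One way to confirm this on a new controller is to check whether the script exists and is executable. The sketch below assumes the path from the crmd warning in the bug description; the function name ocf_script_ready and the OCF_SCRIPT override variable are hypothetical.

```shell
# Path taken from the crmd warning in the bug description; OCF_SCRIPT is a
# hypothetical override for testing against another location.
OCF_SCRIPT="${OCF_SCRIPT:-/usr/lib/ocf/resource.d/fuel/ocf-lma_collector}"

# Return 0 when the given path is a regular, executable file, non-zero otherwise.
ocf_script_ready() {
    [ -f "$1" ] && [ -x "$1" ]
}

if ocf_script_ready "$OCF_SCRIPT"; then
    echo "OCF script present: $OCF_SCRIPT"
else
    echo "OCF script missing or not executable: $OCF_SCRIPT"
fi
```

If the check fails while pacemaker is already managing the collector resources, the node is in the state described above.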

Workaround: restart the resources after a cleanup:

crm resource cleanup metric_collector
crm resource cleanup log_collector
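
The same two commands can be wrapped in a loop that stops at the first failure. This is a minimal sketch: the function name cleanup_collectors and the CRM variable (used only to substitute another command for crm, for example in a dry run) are assumptions, not part of the plugin.

```shell
# Hypothetical wrapper around the workaround above.
# CRM lets a different command stand in for crm; it defaults to crm.
cleanup_collectors() {
    for res in metric_collector log_collector; do
        # Stop at the first resource whose cleanup fails.
        "${CRM:-crm}" resource cleanup "$res" || return 1
    done
}

# Usage on a controller (not run here):
#   cleanup_collectors
# Dry run that only prints the crm arguments:
#   CRM=echo cleanup_collectors
```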

Changed in lma-toolchain:
status: Incomplete → Confirmed
milestone: 0.10.0 → 1.0.0
importance: Undecided → High
summary: - two controllers are stopped with task based deployment
+ Two controllers are stopped after adding them into an environment
description: updated
Changed in lma-toolchain:
assignee: LMA-Toolchain Fuel Plugins (mos-lma-toolchain) → guillaume thouvenin (guillaume-thouvenin)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-plugin-lma-collector (master)

Fix proposed to branch: master
Review: https://review.openstack.org/369395

Changed in lma-toolchain:
status: Confirmed → In Progress
Swann Croiset (swann-w)
Changed in lma-toolchain:
milestone: 1.0.0 → 0.10.2
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-plugin-lma-collector (master)

Reviewed: https://review.openstack.org/369395
Committed: https://git.openstack.org/cgit/openstack/fuel-plugin-lma-collector/commit/?id=8bc835c74a51b85da03b4719ea35540908418277
Submitter: Jenkins
Branch: master

commit 8bc835c74a51b85da03b4719ea35540908418277
Author: Guillaume Thouvenin <email address hidden>
Date: Tue Sep 13 13:53:04 2016 +0200

    Fix issue when installing the OCF script

    This patch moves the installation of the OCF script at the beginning of
    the depoy_start to be sure that it is available when pacemaker starts
    the collector resources. As it requires a configured hiera we also moved
    the hiera task.

    Change-Id: I90b4fa2a9038eaed0f1dcadb0f00713a1b2487b0
    Closes-bug: #1575039

Changed in lma-toolchain:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-plugin-lma-collector (stable/0.10)

Fix proposed to branch: stable/0.10
Review: https://review.openstack.org/375267

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-plugin-lma-collector (stable/0.10)

Reviewed: https://review.openstack.org/375267
Committed: https://git.openstack.org/cgit/openstack/fuel-plugin-lma-collector/commit/?id=a3fd22494a7592a158bc2210c33c0f40639053eb
Submitter: Jenkins
Branch: stable/0.10

commit a3fd22494a7592a158bc2210c33c0f40639053eb
Author: Guillaume Thouvenin <email address hidden>
Date: Tue Sep 13 13:53:04 2016 +0200

    Fix issue when installing the OCF script

    This patch moves the installation of the OCF script at the beginning of
    the depoy_start to be sure that it is available when pacemaker starts
    the collector resources. As it requires a configured hiera we also moved
    the hiera task.

    Change-Id: I90b4fa2a9038eaed0f1dcadb0f00713a1b2487b0
    Closes-bug: #1575039
    (cherry picked from commit 8bc835c74a51b85da03b4719ea35540908418277)

Changed in lma-toolchain:
status: Fix Committed → Won't Fix
status: Won't Fix → Fix Released