StackLight

Two controllers are stopped after adding them into an environment

Bug #1575039 reported by guillaume thouvenin on 2016-04-26

Affects	Status	Importance	Assigned to	Milestone
StackLight	Fix Released	High	guillaume thouvenin	StackLight 0.10.2

Bug Description

When adding 2 controller nodes to an environment, we observed that only one collector is reported as started after the deployment; the others are stopped. The pacemaker log shows:

<28>Apr 25 12:21:06 node-140 crmd[8132]: warning: Cannot execute '/usr/lib/ocf/resource.d/fuel/ocf-lma_collector': No such file or directory (2)
<27>Apr 25 12:21:06 node-140 crmd[8132]: error: Failed to retrieve meta-data for ocf:fuel:ocf-lma_collector

We think that pacemaker is trying to start the resource lma_collector before all nodes are deployed.
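
One way to check this on an affected node is to test whether the agent script named in the log is present and executable. This is a minimal sketch: the function name `ocf_agent_present` and its optional root-prefix argument are illustrative and not part of the plugin.

```shell
#!/bin/sh
# Path taken from the pacemaker log above.
OCF_AGENT=/usr/lib/ocf/resource.d/fuel/ocf-lma_collector

# ocf_agent_present [ROOT]
# Returns 0 if the agent exists and is executable under ROOT (default: /).
# ROOT is a hypothetical convenience for checking a staging directory.
ocf_agent_present() {
    root="${1:-}"
    [ -x "${root}${OCF_AGENT}" ]
}

if ocf_agent_present; then
    echo "OCF agent present"
else
    echo "OCF agent missing: ${OCF_AGENT}"
fi
```

If the script reports the agent as missing while pacemaker is already running on the node, that is consistent with the errors logged by crmd.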

Swann Croiset (swann-w) on 2016-05-23

description:

updated

Simon Pasquier (simon-pasquier) on 2016-05-23

tags:	added: mos9
Changed in lma-toolchain:
status:	New → Incomplete

guillaume thouvenin (guillaume-thouvenin) wrote on 2016-09-13:

I just reproduced the bug. Here are the steps I followed:

- Deploy an environment with 1 controller + 1 compute + 1 LMA
- Add two new controllers

Expected result: We see metrics from the new controllers in Grafana

Observed result: metric_collector and log_collector are not started on the two new controllers.

Here is the output of crm status:

Clone Set: clone_metric_collector [metric_collector]
     Started: [ node-34.test.domain.local ]
     Stopped: [ node-32.test.domain.local node-33.test.domain.local ]
Clone Set: clone_log_collector [log_collector]
     Started: [ node-34.test.domain.local ]
     Stopped: [ node-32.test.domain.local node-33.test.domain.local ]
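
The stopped nodes can be extracted from this output with a short filter. This is a sketch that assumes the `crm status` layout shown above; `stopped_nodes` is an illustrative helper name.

```shell
#!/bin/sh
# stopped_nodes CLONE_SET
# Reads `crm status` output on stdin and prints, one per line, the nodes
# listed as Stopped for the given clone set.
stopped_nodes() {
    awk -v clone="$1" '
        # A "Clone Set:" header starts a new section; remember if it is ours.
        /Clone Set:/ { in_set = ($3 == clone); next }
        # Inside our section, strip the brackets and print each node name.
        in_set && /Stopped: \[/ {
            sub(/^.*\[ */, "")
            sub(/ *\].*$/, "")
            n = split($0, nodes, " ")
            for (i = 1; i <= n; i++) print nodes[i]
        }
    '
}
```

For example, `crm status | stopped_nodes clone_metric_collector` would list the controllers on which the metric collector is not running.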

The problem occurs because the scripts used by pacemaker are not yet installed on the new controllers when pacemaker restarts the resource. The OCF scripts are deployed during the post-install phase, which is too late.

Workaround: run a cleanup on the resources to restart them:

crm resource cleanup metric_collector
crm resource cleanup log_collector
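
The same workaround can be wrapped in a small loop when it has to be repeated. This is a sketch only: the `cleanup_collectors` function and the `CRM` override variable are illustrative names, and the resource names are the two shown above.

```shell
#!/bin/sh
# CRM is the command used to talk to pacemaker. Overriding it (for example
# CRM=echo) is an illustrative dry-run hook, not something the plugin provides.
CRM="${CRM:-crm}"

# Run the cleanup from the workaround on both collector resources.
# Stops at the first failure and returns a non-zero status.
cleanup_collectors() {
    for res in metric_collector log_collector; do
        $CRM resource cleanup "$res" || return 1
    done
}
```

Calling `cleanup_collectors` on a controller issues both cleanup commands; setting `CRM=echo` first prints the arguments instead of running them.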

Changed in lma-toolchain:
status:	Incomplete → Confirmed
milestone:	0.10.0 → 1.0.0
importance:	Undecided → High
summary:	- two controllers are stopped with task based deployment
	+ Two controllers are stopped after adding them into an environment
description:	updated

guillaume thouvenin (guillaume-thouvenin) on 2016-09-13

Changed in lma-toolchain:
assignee:	LMA-Toolchain Fuel Plugins (mos-lma-toolchain) → guillaume thouvenin (guillaume-thouvenin)

OpenStack Infra (hudson-openstack) wrote on 2016-09-13: Fix proposed to fuel-plugin-lma-collector (master)

Fix proposed to branch: master
Review: https://review.openstack.org/369395

Changed in lma-toolchain:
status:	Confirmed → In Progress

Swann Croiset (swann-w) on 2016-09-21

Changed in lma-toolchain:
milestone:	1.0.0 → 0.10.2

OpenStack Infra (hudson-openstack) wrote on 2016-09-23: Fix merged to fuel-plugin-lma-collector (master)

Reviewed: https://review.openstack.org/369395
Committed: https://git.openstack.org/cgit/openstack/fuel-plugin-lma-collector/commit/?id=8bc835c74a51b85da03b4719ea35540908418277
Submitter: Jenkins
Branch: master

commit 8bc835c74a51b85da03b4719ea35540908418277
Author: Guillaume Thouvenin <email address hidden>
Date: Tue Sep 13 13:53:04 2016 +0200

Fix issue when installing the OCF script

    This patch moves the installation of the OCF script at the beginning of
    the depoy_start to be sure that it is available when pacemaker starts
    the collector resources. As it requires a configured hiera we also moved
    the hiera task.

Change-Id: I90b4fa2a9038eaed0f1dcadb0f00713a1b2487b0
Closes-bug: #1575039

Changed in lma-toolchain:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2016-09-23: Fix proposed to fuel-plugin-lma-collector (stable/0.10)

Fix proposed to branch: stable/0.10
Review: https://review.openstack.org/375267

OpenStack Infra (hudson-openstack) wrote on 2016-09-23: Fix merged to fuel-plugin-lma-collector (stable/0.10)

Reviewed: https://review.openstack.org/375267
Committed: https://git.openstack.org/cgit/openstack/fuel-plugin-lma-collector/commit/?id=a3fd22494a7592a158bc2210c33c0f40639053eb
Submitter: Jenkins
Branch: stable/0.10

commit a3fd22494a7592a158bc2210c33c0f40639053eb
Author: Guillaume Thouvenin <email address hidden>
Date: Tue Sep 13 13:53:04 2016 +0200

Fix issue when installing the OCF script

    Change-Id: I90b4fa2a9038eaed0f1dcadb0f00713a1b2487b0
    Closes-bug: #1575039
    (cherry picked from commit 8bc835c74a51b85da03b4719ea35540908418277)

Simon Pasquier (simon-pasquier) on 2017-02-21

Changed in lma-toolchain:
status:	Fix Committed → Won't Fix
status:	Won't Fix → Fix Released
