Mirantis OpenStack

[10.0 swarm] task "rabbitmq" fails

Bug #1668311 reported by Sergey Novikov on 2017-02-27

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Mirantis OpenStack	Confirmed	High	MOS Oslo	Mirantis OpenStack 9.x-updates

Bug Description

Detailed bug description:

The issue was found by https://product-ci.infra.mirantis.net/job/10.0.system_test.ubuntu.custom_hostname/195/testReport/(root)/default_hostname/default_hostname/

Steps to reproduce:
1. Create a cluster
2. Add 3 nodes with controller role
3. Add 1 node with compute role
4. Deploy the cluster

2017-02-27 01:32:22 INFO [17257] Cluster[]: All nodes are finished. Failed tasks: Task[rabbitmq/5] Stopping the deployment process!

Additional info: the RabbitMQ cluster seems to be alive, see http://paste.openstack.org/show/600641/

This bug doesn't look similar to https://bugs.launchpad.net/mos/+bug/1626933: rabbitmq-server has version 3.6.6-1~u16.04+mos1.


Sergey Novikov (snovikov) wrote on 2017-02-27:

fail_error_default_hostname-fuel-snapshot-2017-02-27_01-32-25.tar (35.2 MiB, application/x-tar)

Vitaly Sedelnik (vsedelnik) on 2017-03-01

Changed in mos:
status:	New → Confirmed
assignee:	nobody → MOS Oslo (mos-oslo)
milestone:	none → 10.0
tags:	added: area-oslo

Alexey Lebedeff (alebedev-a) wrote on 2017-03-01:

The Puppet manifest was adding RabbitMQ users right at the moment when Pacemaker decided to restart some of the RabbitMQ nodes. The funny thing is that adding users in Puppet is a useless operation in the presence of Pacemaker: the created users are lost during the resets/joins performed by the OCF script.

We need to disable this user-creation activity completely. The ONLY thing Puppet should do is install the package and drop 2 config files into their proper locations (i.e. no user management, no (re)starting/stopping/enabling of the systemd unit, etc.).

Nastya Urlapova (aurlapova) on 2017-03-07

tags:	added: swarm-blocker
removed: swarm-fail

Nastya Urlapova (aurlapova) on 2017-03-16

tags:	added: swarm-fail
removed: swarm-blocker
Changed in mos:
milestone:	10.0 → 9.x-updates

Alexey Lebedeff (alebedev-a) wrote on 2017-03-22:

Should be fixed by https://review.openstack.org/#/c/447420/

Alexey Lebedeff (alebedev-a) wrote on 2017-03-22:

I've deployed an environment manually with this patch, and there were no traces of spurious RabbitMQ restarts.

Alexey Lebedeff (alebedev-a) wrote on 2017-03-22:

To check whether the patch took effect, run the following command on any controller:

crm resource param p_rabbitmq-server show host_ip

It should return "127.0.0.1".
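
The check above could be scripted roughly as follows. This is only a sketch, not part of the patch: it assumes the crm command prints the bare parameter value on stdout (possibly with a trailing newline or surrounding quotes), and the function names `host_ip_is_patched` and `check_controller` are made up for illustration.

```python
import subprocess

# Command quoted in the comment above; it must be run on a controller node.
CRM_COMMAND = ["crm", "resource", "param", "p_rabbitmq-server", "show", "host_ip"]
EXPECTED_HOST_IP = "127.0.0.1"


def host_ip_is_patched(crm_output: str) -> bool:
    """Return True if the crm output is exactly the expected loopback address.

    Assumption: the output is just the value, optionally quoted and
    followed by whitespace. Any other output counts as "not patched".
    """
    return crm_output.strip().strip('"') == EXPECTED_HOST_IP


def check_controller() -> bool:
    """Run the crm command locally and report whether the patch took effect."""
    result = subprocess.run(
        CRM_COMMAND, capture_output=True, text=True, check=True
    )
    return host_ip_is_patched(result.stdout)
```

Calling `check_controller()` on a controller returns `True` when host_ip is "127.0.0.1"; it raises `subprocess.CalledProcessError` if crm exits with a non-zero status.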
