Scaling an Ironic cluster with an ironic+controller node broke instance creation
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Mirantis OpenStack | Status tracked in 10.0.x | | |
10.0.x | Fix Committed | High | Fuel Toolbox |
Bug Description
Detailed bug description:
After adding a new ironic+controller node to the cluster and deploying changes, I can't create a baremetal instance (the instance goes to ERROR status with a MessagingTimeout exception).
Steps to reproduce:
1. Deploy MOS with 1 controller, 1 compute, and 2 ironic nodes
2. Add a new node (with fuel-devops)
3. Wait for the node to be discovered
4. Assign the ironic and controller roles to this node
5. Run a network check
6. Deploy changes
7. Run a network check again
8. Start a baremetal instance (see the CLI sketch below)
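For step 8, a minimal sketch of booting a baremetal instance with the nova CLI; the flavor, image, network ID, and instance name are placeholders, not values from this environment:
$ nova boot --flavor baremetal-flavor --image baremetal-image --nic net-id=<baremetal-net-id> test-baremetal-instance
$ nova show test-baremetal-instance
With this bug, the instance status eventually becomes ERROR instead of ACTIVE.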
Expected results:
All steps should pass
Actual result:
After some time, the instance created in step 8 reaches ERROR status.
Reproducibility:
Always
Description of the environment:
- MOS 9.0 ISO build (at least from build #432)
- Network model: VLAN
Additional information:
This bug does not appear if, in step 4, only the ironic role is assigned to the node.
Changed in mos:
assignee: nobody → MOS Ironic (mos-ironic)
tags: added: area-ironic

Changed in mos:
assignee: Fuel Library (Deprecated) (fuel-library) → Fuel Toolbox (fuel-toolbox)
tags: added: on-verification

Changed in mos:
status: Fix Committed → Fix Released
tags: removed: on-verification
I checked the logs and saw the following:
1) the request to boot an instance indeed fails with MessagingTimeout in nova-api on node-5:
2016-06-03T14:33:04.297454+00:00 err: 2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions [req-3ff12a0b-d829-4217-adb3-c870e88a3b34 71d4f3e3083643ddb2b2eb1a7dcb74b4 8fa6f6fac6d149dfb84fc021c32619b1 - - -] Unexpected exception in API method
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions Traceback (most recent call last):
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/api/openstack/extensions.py", line 478, in wrapped
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions     return f(*args, **kwargs)
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 73, in wrapper
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions     return func(*args, **kwargs)
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 73, in wrapper
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions     return func(*args, **kwargs)
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/api/openstack/compute/servers.py", line 629, in create
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions     **create_kwargs)
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/hooks.py", line 154, in inner
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions     rv = f(*args, **kwargs)
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/compute/api.p
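For reference, such failures can be located on a controller with a grep like the following (the log path is the usual MOS location and is an assumption here):
$ grep -B1 -A2 MessagingTimeout /var/log/nova/nova-api.log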
Because it tries to do an RPC call to nova-network, which we do not actually deploy.
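A quick way to confirm that no nova-network consumer exists for that RPC topic (a sketch; assumes nova-network would run as a process literally named nova-network):
$ pgrep -f nova-network || echo 'nova-network is not running'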
2) it looks like nova.conf was updated, but nova-api was not restarted after that:
$ grep use_neutron ./node-5/etc/nova/nova.conf
use_neutron=True
2016-06-03T14:07:47.358934+00:00 debug: 2016-06-03 14:07:47.358 6789 DEBUG oslo_service.service [-] use_neutron = False log_opt_values /usr/lib/python2.7/dist-packages/oslo_config/cfg.py:2517
which makes nova-api think it should call nova-network, not neutron.
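One way to verify the stale-config hypothesis is to compare the config file's modification time with the start time of the running nova-api process (a sketch; assumes pgrep -o picks the parent nova-api process):
$ stat -c '%y' /etc/nova/nova.conf
$ ps -o lstart= -p "$(pgrep -o -f nova-api)"
If nova.conf is newer than the process start time, the running nova-api never read use_neutron=True.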
The workaround is to restart nova-api on all controller nodes. We'll need to check the Puppet manifests to make sure nova-api is properly restarted after any changes to its config files.
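A sketch of the workaround; the node names are placeholders and would come from the fuel node listing in a real environment:
$ for node in node-1 node-5; do ssh "$node" 'service nova-api restart'; done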