Contrail 3.2.5: control-api and other openstack services down in one control node
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Status tracked in Trunk | | |
R3.2 | Invalid | Critical | Deepak Jeyaraman |
R4.0 | Invalid | Critical | Deepak Jeyaraman |
R4.1 | Invalid | Critical | Deepak Jeyaraman |
Trunk | Invalid | Critical | Deepak Jeyaraman |
Bug Description
Installed Contrail 3.2.5 on an HA setup with 3 Contrail config/control nodes and 2 compute nodes, and noticed that contrail-api is down on the first node. The OpenStack services also went down after a day.
Log from contrail-api:
10/25/2017 11:38:33 AM [contrail-api]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = ccra-17 process_status = [ << module_id = contrail-api instance_id = 0 state = Non-Functional connection_infos = [ << type = Collector name = server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, << type = Discovery name = Collector server_addrs = [ 127.0.0.1:5998, ] status = Down description = Subscribe - Status Code 503 >>, << type = IFMap name = IfMap server_addrs = [ 127.0.0.1:8443, ] status = Up description = >>, ] description = Collector, Discovery:
10/25/2017 11:38:33 AM [contrail-api]: SANDESH: [DROP: NoSession] __default__ [SYS_NOTICE]: VncApiNotice: Connecting to zookeeper on 127.0.0.1:2181
10/25/2017 11:38:33 AM [contrail-api]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = ccra-17 process_status = [ << module_id = contrail-api instance_id = 0 state = Non-Functional connection_infos = [ << type = Collector name = server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, << type = Discovery name = Collector server_addrs = [ 127.0.0.1:5998, ] status = Down description = Subscribe - Status Code 503 >>, << type = IFMap name = IfMap server_addrs = [ 127.0.0.1:8443, ] status = Up description = >>, << type = Zookeeper name = Zookeeper server_addrs = [ 127.0.0.1:2181, ] status = Initializing description = >>, ] description = Collector, Discovery:
10/25/2017 11:38:34 AM [contrail-api]: SANDESH: [DROP: NoSession] __default__ [SYS_ERR]: VncApiError: IFMAP Healthcheck failed: default-
10/25/2017 11:38:48 AM [contrail-api]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = ccra-17 process_status = [ << module_id = contrail-api instance_id = 0 state = Non-Functional connection_infos = [ << type = Collector name = server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, << type = Discovery name = Collector server_addrs = [ 127.0.0.1:5998, ] status = Down description = Subscribe - Status Code 503 >>, << type = IFMap name = IfMap server_addrs = [ 127.0.0.1:8443, ] status = Up description = >>, << type = Zookeeper name = Zookeeper server_addrs = [ 127.0.0.1:2181, ] status = Down description = >>, ] description = Collector, Discovery:
Zookeeper is UP though:
root@ccra-
tcp 0 0 97.0.0.17:45430 97.0.0.17:2181 ESTABLISHED 19110/python
tcp 0 0 97.0.0.17:36456 97.0.0.14:2181 ESTABLISHED 2252/python
tcp 0 0 97.0.0.17:39574 97.0.0.16:2181 ESTABLISHED 19112/python
tcp 0 0 97.0.0.17:39536 97.0.0.16:2181 ESTABLISHED 19109/python
tcp 0 0 97.0.0.17:33016 97.0.0.14:2181 ESTABLISHED 3677/python
tcp 0 0 97.0.0.17:42436 97.0.0.17:2181 ESTABLISHED 2242/python
tcp 0 0 97.0.0.17:42442 97.0.0.17:2181 ESTABLISHED 2245/python
tcp 0 0 97.0.0.17:57356 97.0.0.17:2181 ESTABLISHED 36096/python
tcp6 0 0 :::2181 :::* LISTEN 1802/java
tcp6 0 0 97.0.0.17:2181 97.0.0.17:42442 ESTABLISHED 1802/java
tcp6 0 0 97.0.0.17:33310 97.0.0.14:2181 ESTABLISHED 2254/java
tcp6 0 0 97.0.0.17:2181 97.0.0.16:58233 ESTABLISHED 1802/java
tcp6 0 0 97.0.0.17:2181 97.0.0.17:42436 ESTABLISHED 1802/java
tcp6 0 0 97.0.0.17:2181 97.0.0.17:57356 ESTABLISHED 1802/java
tcp6 0 0 97.0.0.17:2181 97.0.0.17:45430 ESTABLISHED 1802/java
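ESTABLISHED sockets only show that something is listening on 2181. They do not show that the server is answering requests. A minimal sketch of a more direct check, assuming ZooKeeper's four-letter commands are enabled on these servers (newer ZooKeeper releases require them to be whitelisted). The helper names are made up for illustration:

```python
import socket


def zk_four_letter(host, port=2181, command=b"ruok", timeout=3.0):
    """Send a ZooKeeper four-letter command and return the raw reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", "replace")


def zk_is_serving(host, port=2181):
    """True only if the server answers 'imok'; False on any socket error."""
    try:
        return zk_four_letter(host, port).strip() == "imok"
    except OSError:
        return False


# Example use against the three servers seen in the netstat output:
#   for host in ("97.0.0.14", "97.0.0.16", "97.0.0.17"):
#       print(host, zk_is_serving(host))
```

An "imok" reply only says the process is running without errors. Sending b"stat" or b"srvr" through zk_four_letter() should additionally show whether the node is a leader or follower.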
=======
root@ccra-17:~# contrail-status
== Contrail Control ==
supervisor-control: active
contrail-control active
contrail-
contrail-dns active
contrail-named active
== Contrail Analytics ==
supervisor-
contrail-alarm-gen failed
contrail-
contrail-
contrail-collector active
contrail-
contrail-
contrail-topology active
== Contrail Config ==
supervisor-config: active
contrail-api:0 initializing
contrail-
contrail-
contrail-discovery active
contrail-schema backup
contrail-
ifmap failed
== Contrail Web UI ==
supervisor-webui: active
contrail-webui active
contrail-
== Contrail Database ==
contrail-database: active
== Contrail Supervisor Database ==
supervisor-
contrail-
kafka active
== Contrail Support Services ==
supervisor-
rabbitmq-server active
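For comparing nodes quickly, the status dump can be reduced to just the services that are not healthy. This is a rough sketch that parses text shaped like the output above. Treating "backup" as healthy is an assumption (some config services run active/backup in HA), and lines without a state column are skipped:

```python
import re

# Excerpt copied from the contrail-status output above (complete lines only).
SAMPLE = """\
== Contrail Analytics ==
contrail-alarm-gen failed
contrail-collector active
== Contrail Config ==
supervisor-config: active
contrail-api:0 initializing
contrail-discovery active
contrail-schema backup
ifmap failed
"""

# Assumption: these two states mean the service is fine.
OK_STATES = frozenset(["active", "backup"])


def unhealthy_services(text, ok_states=OK_STATES):
    """Return (section, service, state) for every service not in ok_states."""
    section = None
    problems = []
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"^==\s*([^=]+?)\s*==$", line)
        if header:
            section = header.group(1)
            continue
        if section is None:
            continue  # ignore prompts and separators before the first section
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank line, separator, or a line with no state column
        name, state = parts[0].rstrip(":"), parts[1].strip()
        if state not in ok_states:
            problems.append((section, name, state))
    return problems


# unhealthy_services(SAMPLE) flags contrail-alarm-gen, contrail-api:0 and ifmap.
```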
=======
root@ccra-17:~# openstack-status
== Nova services ==
openstack-nova-api: dead
openstack-
openstack-
openstack-
openstack-
openstack-
== Glance services ==
openstack-
openstack-
== Keystone service ==
openstack-keystone: active
== Cinder services ==
openstack-
openstack-
openstack-
== Heat services ==
heat-api: dead
heat-api-cfn: dead
heat-api-
heat-engine: dead
== Support services ==
mysql: inactive (disabled on boot)
rabbitmq-server: active
memcached: inactive (disabled on boot)
== Keystone users ==
=====
Tried restarting the OpenStack services, but ran into issues:
root@ccra-
FAILED: attempted to kill nova-scheduler with sig SIGKILL but it wasn't running
nova-scheduler: ERROR (already started)
root@ccra-
nova-scheduler: ERROR (already started)
root@ccra-
nova-scheduler BACKOFF Exited too quickly (process log may have details)
root@ccra-
nova-api STARTING
root@ccra-
nova-api BACKOFF Exited too quickly (process log may have details)
root@ccra-
nova-api BACKOFF Exited too quickly (process log may have details)
=====
Setup:
10.102.28.138, 10.102.28.116, 10.102.28.139 (all config nodes)
ccra-13, ccra-12 are compute nodes.
root@ccra-17:~# contrail-version
Package Version Build-ID | Repo | Package Name
-------
contrail-analytics 3.2.5.0-51 51
contrail-config 3.2.5.0-51 51
contrail-
contrail-control 3.2.5.0-51 51
contrail-
contrail-dns 3.2.5.0-51 51
contrail-docs 3.2.5.0-51 51
contrail-f5 3.2.5.0-51 51
contrail-
information type: | Proprietary → Public |
Changed in juniperopenstack: | |
importance: | Undecided → Critical |
assignee: | nobody → Ignatious Johnson Christopher (ijohnson-x) |
tags: | added: analytics |
tags: | added: config removed: analytics |
Hi All,
As I said earlier, both the OpenStack and contrail-api services failed because a lower version of the oslo.config package was present on the node (10.102.28.138).
Fixed by:
1. pip uninstall oslo.config
2. service supervisor-config restart
3. service supervisor-openstack restart
You may need to find out how the lower version of oslo.config got installed on the first config node (10.102.28.138).
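One way to spot this kind of shadowing is to list every copy of the package metadata that Python can see, in import-path order. In the tracebacks, oslo_config is loaded from /usr/local/lib while nova and the Contrail code come from /usr/lib, which is consistent with a pip-installed copy taking precedence over the distribution package. A minimal sketch (find_installs is a made-up helper, not part of pip or oslo.config):

```python
import os
import re
import sys


def find_installs(dist_name, search_paths=None):
    """Return [(directory, version)] for each metadata dir/file of dist_name.

    Directories are scanned in the order given (sys.path by default), so the
    first hit is normally the copy that wins at import time.
    """
    # Accept '.', '-' and '_' interchangeably in the distribution name.
    parts = [re.escape(p) for p in re.split(r"[-_.]+", dist_name)]
    pattern = re.compile(
        r"^" + "[-_.]".join(parts)
        + r"-(?P<ver>[^-]+?)(?:-py\d[\d.]*)?\.(?:dist|egg)-info$",
        re.IGNORECASE,
    )
    found = []
    for directory in (sys.path if search_paths is None else search_paths):
        if not os.path.isdir(directory):
            continue
        for entry in sorted(os.listdir(directory)):
            match = pattern.match(entry)
            if match:
                found.append((directory, match.group("ver")))
    return found


# On the affected node one could run:
#   for directory, version in find_installs("oslo.config"):
#       print(version, directory)
```

More than one entry in the result suggests a second copy. Comparing the file timestamps of the /usr/local copy with the pip and installer logs may help show what pulled it in.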
nova-api logs:
------------------
Traceback (most recent call last):
  File "/usr/bin/nova-api", line 6, in <module>
    from nova.cmd.api import main
  File "/usr/lib/python2.7/dist-packages/nova/cmd/api.py", line 29, in <module>
    from nova import config
  File "/usr/lib/python2.7/dist-packages/nova/config.py", line 24, in <module>
    from nova.db.sqlalchemy import api as sqlalchemy_api
  File "/usr/lib/python2.7/dist-packages/nova/db/__init__.py", line 20, in <module>
    from nova.db.api import *  # noqa
  File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 34, in <module>
    from nova.cells import rpcapi as cells_rpcapi
  File "/usr/lib/python2.7/dist-packages/nova/cells/rpcapi.py", line 30, in <module>
    import nova.conf
  File "/usr/lib/python2.7/dist-packages/nova/conf/__init__.py", line 65, in <module>
    from nova.conf import serial_console
  File "/usr/lib/python2.7/dist-packages/nova/conf/serial_console.py", line 71, in <module>
    """)
  File "/usr/local/lib/python2.7/dist-packages/oslo_config/cfg.py", line 966, in __init__
    **kwargs)
TypeError: __init__() got an unexpected keyword argument 'regex'
Contrail-api logs
------------------
Traceback (most recent call last):
  File "/usr/bin/contrail-api", line 9, in <module>
    load_entry_point('vnc-cfg-api-server==0.1dev', 'console_scripts', 'contrail-api')()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 542, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2569, in load_entry_point
    return ep.load()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2229, in load
    return self.resolve()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2235, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/lib/python2.7/dist-packages/vnc_cfg_api_server/vnc_cfg_api_server.py", line 91, in <module>
    import vnc_auth_keystone
  File "/usr/lib/python2.7/dist-packages/vnc_cfg_api_server/vnc_auth_keystone.py", line 22, in <module>
    from keystonemiddleware import auth_token
  File "/usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token/__init__.py", line 307, in <module>
    help='(Optional) If defined, indicate whether token data'
  File "/usr/local/lib/python2.7/dist-packages/oslo_config/cfg.py", line 966, in __init__
    **kwargs)
TypeError: __init__() got an unexpected keyword argument 'ignore_case'
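Both tracebacks end in the same `**kwargs)` frame of cfg.py. That is what one would expect when an option class simply forwards unknown keywords to an older base class that does not know them yet. The sketch below reproduces that failure mode with stand-in classes (OldOpt, OldStrOpt and NewStrOpt are hypothetical, not the real oslo.config code) and adds a small probe that reports which keyword was rejected:

```python
import re


class OldOpt(object):
    """Hypothetical base class that predates 'regex' and 'ignore_case'."""

    def __init__(self, name, default=None, help=None):
        self.name = name
        self.default = default
        self.help = help


class OldStrOpt(OldOpt):
    """Forwards unknown keywords upward, like the **kwargs frame above."""

    def __init__(self, name, **kwargs):
        super(OldStrOpt, self).__init__(name, **kwargs)


class NewStrOpt(OldOpt):
    """Hypothetical newer class that understands both keywords."""

    def __init__(self, name, regex=None, ignore_case=False, **kwargs):
        super(NewStrOpt, self).__init__(name, **kwargs)
        self.regex = regex
        self.ignore_case = ignore_case


def rejected_keyword(factory, *args, **kwargs):
    """Call factory(*args, **kwargs).

    Returns the name of the keyword the call rejected, or None if the call
    succeeded. Any other TypeError is re-raised unchanged.
    """
    try:
        factory(*args, **kwargs)
    except TypeError as exc:
        found = re.search(r"unexpected keyword argument '([^']+)'", str(exc))
        if found:
            return found.group(1)
        raise
    return None
```

On a node with oslo.config installed, the same probe could be pointed at the real class, for example `rejected_keyword(cfg.StrOpt, "probe", regex="^a$")` after `from oslo_config import cfg`. This assumes the failing options are built with StrOpt, which the tracebacks suggest but do not show directly.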
Thanks,
Ignatious