Contrail 3.2.5: contrail-api and other openstack services down in one control node

Bug #1728278 reported by Deepak Jeyaraman
This bug affects 1 person
Affects              Status    Importance  Assigned to        Milestone
Juniper Openstack (status tracked in Trunk)
  R3.2               Invalid   Critical    Deepak Jeyaraman
  R4.0               Invalid   Critical    Deepak Jeyaraman
  R4.1               Invalid   Critical    Deepak Jeyaraman
  Trunk              Invalid   Critical    Deepak Jeyaraman

Bug Description

Installed Contrail 3.2.5 on an HA setup with 3 Contrail config/control nodes and 2 compute nodes, and noticed that contrail-api is down on the first node. The OpenStack services on that node also went down after a day.

Log from contrail-api:

10/25/2017 11:38:33 AM [contrail-api]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = ccra-17 process_status = [ << module_id = contrail-api instance_id = 0 state = Non-Functional connection_infos = [ << type = Collector name = server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, << type = Discovery name = Collector server_addrs = [ 127.0.0.1:5998, ] status = Down description = Subscribe - Status Code 503 >>, << type = IFMap name = IfMap server_addrs = [ 127.0.0.1:8443, ] status = Up description = >>, ] description = Collector, Discovery:Collector[Subscribe - Status Code 503] connection down >>, ] >>
10/25/2017 11:38:33 AM [contrail-api]: SANDESH: [DROP: NoSession] __default__ [SYS_NOTICE]: VncApiNotice: Connecting to zookeeper on 127.0.0.1:2181
10/25/2017 11:38:33 AM [contrail-api]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = ccra-17 process_status = [ << module_id = contrail-api instance_id = 0 state = Non-Functional connection_infos = [ << type = Collector name = server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, << type = Discovery name = Collector server_addrs = [ 127.0.0.1:5998, ] status = Down description = Subscribe - Status Code 503 >>, << type = IFMap name = IfMap server_addrs = [ 127.0.0.1:8443, ] status = Up description = >>, << type = Zookeeper name = Zookeeper server_addrs = [ 127.0.0.1:2181, ] status = Initializing description = >>, ] description = Collector, Discovery:Collector[Subscribe - Status Code 503], Zookeeper:Zookeeper[] connection down >>, ] >>
10/25/2017 11:38:34 AM [contrail-api]: SANDESH: [DROP: NoSession] __default__ [SYS_ERR]: VncApiError: IFMAP Healthcheck failed: default-global-system-config not found in IFMAP DB
10/25/2017 11:38:48 AM [contrail-api]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = ccra-17 process_status = [ << module_id = contrail-api instance_id = 0 state = Non-Functional connection_infos = [ << type = Collector name = server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, << type = Discovery name = Collector server_addrs = [ 127.0.0.1:5998, ] status = Down description = Subscribe - Status Code 503 >>, << type = IFMap name = IfMap server_addrs = [ 127.0.0.1:8443, ] status = Up description = >>, << type = Zookeeper name = Zookeeper server_addrs = [ 127.0.0.1:2181, ] status = Down description = >>, ] description = Collector, Discovery:Collector[Subscribe - Status Code 503], Zookeeper:Zookeeper[] connection down >>, ] >>

Zookeeper is UP though:

root@ccra-17:/var/log/nova# netstat -anp | grep :2181
tcp 0 0 97.0.0.17:45430 97.0.0.17:2181 ESTABLISHED 19110/python
tcp 0 0 97.0.0.17:36456 97.0.0.14:2181 ESTABLISHED 2252/python
tcp 0 0 97.0.0.17:39574 97.0.0.16:2181 ESTABLISHED 19112/python
tcp 0 0 97.0.0.17:39536 97.0.0.16:2181 ESTABLISHED 19109/python
tcp 0 0 97.0.0.17:33016 97.0.0.14:2181 ESTABLISHED 3677/python
tcp 0 0 97.0.0.17:42436 97.0.0.17:2181 ESTABLISHED 2242/python
tcp 0 0 97.0.0.17:42442 97.0.0.17:2181 ESTABLISHED 2245/python
tcp 0 0 97.0.0.17:57356 97.0.0.17:2181 ESTABLISHED 36096/python
tcp6 0 0 :::2181 :::* LISTEN 1802/java
tcp6 0 0 97.0.0.17:2181 97.0.0.17:42442 ESTABLISHED 1802/java
tcp6 0 0 97.0.0.17:33310 97.0.0.14:2181 ESTABLISHED 2254/java
tcp6 0 0 97.0.0.17:2181 97.0.0.16:58233 ESTABLISHED 1802/java
tcp6 0 0 97.0.0.17:2181 97.0.0.17:42436 ESTABLISHED 1802/java
tcp6 0 0 97.0.0.17:2181 97.0.0.17:57356 ESTABLISHED 1802/java
tcp6 0 0 97.0.0.17:2181 97.0.0.17:45430 ESTABLISHED 1802/java
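
The netstat output only shows that something is listening on port 2181 and that TCP sessions are established. A slightly stronger check is ZooKeeper's four-letter `ruok` command, which a healthy server answers with `imok`. The sketch below is a hypothetical helper, not part of Contrail; the function name and timeout are assumptions, and newer ZooKeeper releases may require the command to be whitelisted before they answer it.

```python
import socket


def zk_ruok(host, port=2181, timeout=3.0):
    """Return True if the server at host:port answers 'ruok' with 'imok'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Nothing listening, or the connection attempt timed out.
        return False
    try:
        sock.sendall(b"ruok")
        reply = sock.recv(16)
    except (socket.error, socket.timeout):
        return False
    finally:
        sock.close()
    return reply == b"imok"


if __name__ == "__main__":
    # The three ZooKeeper peers seen in the netstat output above.
    for peer in ("97.0.0.14", "97.0.0.16", "97.0.0.17"):
        print(peer, zk_ruok(peer))
```

A `True` result for every peer would suggest that the "Zookeeper ... status = Down" entry in the contrail-api UVE reflects a problem on the client side rather than in the ZooKeeper ensemble.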

=======

root@ccra-17:~# contrail-status
== Contrail Control ==
supervisor-control: active
contrail-control active
contrail-control-nodemgr initializing (NTP state unsynchronized.)
contrail-dns active
contrail-named active

== Contrail Analytics ==
supervisor-analytics: active
contrail-alarm-gen failed
contrail-analytics-api initializing (UvePartitions:UVE-Aggregation[Partitions:0] connection down)
contrail-analytics-nodemgr initializing (NTP state unsynchronized.)
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology active

== Contrail Config ==
supervisor-config: active
contrail-api:0 initializing
contrail-config-nodemgr timeout
contrail-device-manager backup
contrail-discovery active
contrail-schema backup
contrail-svc-monitor backup
ifmap failed

== Contrail Web UI ==
supervisor-webui: active
contrail-webui active
contrail-webui-middleware active

== Contrail Database ==
contrail-database: active

== Contrail Supervisor Database ==
supervisor-database: active
contrail-database-nodemgr initializing (NTP state unsynchronized.)
kafka active

== Contrail Support Services ==
supervisor-support-service: active
rabbitmq-server active

=======

root@ccra-17:~# openstack-status
== Nova services ==
openstack-nova-api: dead
openstack-nova-compute: inactive (disabled on boot)
openstack-nova-network: inactive (disabled on boot)
openstack-nova-scheduler: dead
openstack-nova-volume: inactive (disabled on boot)
openstack-nova-conductor: dead
== Glance services ==
openstack-glance-api: dead
openstack-glance-registry: dead
== Keystone service ==
openstack-keystone: active
== Cinder services ==
openstack-cinder-api: dead
openstack-cinder-scheduler: dead
openstack-cinder-volume: inactive (disabled on boot)
== Heat services ==
heat-api: dead
heat-api-cfn: dead
heat-api-cloudwatch: inactive (disabled on boot)
heat-engine: dead
== Support services ==
mysql: inactive (disabled on boot)
rabbitmq-server: active
memcached: inactive (disabled on boot)
== Keystone users ==

=====

Tried restarting the OpenStack services, but they fail to come up:

root@ccra-17:/var/log/nova# service nova-scheduler restart
FAILED: attempted to kill nova-scheduler with sig SIGKILL but it wasn't running
nova-scheduler: ERROR (already started)
root@ccra-17:/var/log/nova# service nova-scheduler start
nova-scheduler: ERROR (already started)
root@ccra-17:/var/log/nova# service nova-scheduler status
nova-scheduler BACKOFF Exited too quickly (process log may have details)

root@ccra-17:/var/log/nova# service nova-api status
nova-api STARTING
root@ccra-17:/var/log/nova# service nova-api status
nova-api BACKOFF Exited too quickly (process log may have details)
root@ccra-17:/var/log/nova# service nova-api status
nova-api BACKOFF Exited too quickly (process log may have details)

=====

Setup:

10.102.28.138, 10.102.28.116, 10.102.28.139 (all config nodes)
ccra-13, ccra-12 are compute nodes.

root@ccra-17:~# contrail-version
Package Version Build-ID | Repo | Package Name
-------------------------------------- ------------------------------ ----------------------------------
contrail-analytics 3.2.5.0-51 51
contrail-config 3.2.5.0-51 51
contrail-config-openstack 3.2.5.0-51 51
contrail-control 3.2.5.0-51 51
contrail-database-common 3.2.5.0-51 51
contrail-dns 3.2.5.0-51 51
contrail-docs 3.2.5.0-51 51
contrail-f5 3.2.5.0-51 51
contrail-fabric-utils 3.2.5.0-51 51

Tags: config
information type: Proprietary → Public
Changed in juniperopenstack:
importance: Undecided → Critical
assignee: nobody → Ignatious Johnson Christopher (ijohnson-x)
Ignatious Johnson Christopher (ijohnson-x) wrote :

Hi All,

As I said earlier, both the OpenStack and contrail-api services failed because a lower version of the
oslo.config package was present on the node (10.102.28.138).

Fixed by
1. pip uninstall oslo.config
2. service supervisor-config restart
3. service supervisor-openstack restart

You may need to find out how the lower version of oslo.config got installed on the first config node (10.102.28.138).
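
In the tracebacks in this comment, nova and keystonemiddleware are loaded from /usr/lib/python2.7/dist-packages while oslo_config/cfg.py is loaded from /usr/local/lib/python2.7/dist-packages, which is consistent with a pip-installed copy taking precedence over the packaged one. A minimal sketch for spotting that situation on a node follows; the helper names are hypothetical, and the "/usr/local means pip" rule is an assumption about Debian/Ubuntu layouts rather than something this report confirms.

```python
import os
import sys


def find_module_copies(package_dir, search_paths=None):
    """Return every directory on the search path that holds package_dir,
    in the order Python would consider them for import."""
    if search_paths is None:
        search_paths = sys.path
    copies = []
    for entry in search_paths:
        candidate = os.path.join(entry or ".", package_dir)
        if os.path.isdir(candidate) and candidate not in copies:
            copies.append(candidate)
    return copies


def is_pip_shadowing(copies):
    """True when more than one copy exists and the first one found
    lives under /usr/local (assumed here to be a pip install)."""
    return len(copies) > 1 and copies[0].startswith("/usr/local/")


if __name__ == "__main__":
    found = find_module_copies("oslo_config")
    for path in found:
        print(path)
    if is_pip_shadowing(found):
        print("a copy under /usr/local shadows the packaged oslo_config")
```

Run with the same interpreter the services use. Two listed paths with the /usr/local one first would match what the tracebacks show.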

nova-api logs:
------------------

TypeError: __init__() got an unexpected keyword argument 'regex'
Traceback (most recent call last):
  File "/usr/bin/nova-api", line 6, in <module>
    from nova.cmd.api import main
  File "/usr/lib/python2.7/dist-packages/nova/cmd/api.py", line 29, in <module>
    from nova import config
  File "/usr/lib/python2.7/dist-packages/nova/config.py", line 24, in <module>
    from nova.db.sqlalchemy import api as sqlalchemy_api
  File "/usr/lib/python2.7/dist-packages/nova/db/__init__.py", line 20, in <module>
    from nova.db.api import * # noqa
  File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 34, in <module>
    from nova.cells import rpcapi as cells_rpcapi
  File "/usr/lib/python2.7/dist-packages/nova/cells/rpcapi.py", line 30, in <module>
    import nova.conf
  File "/usr/lib/python2.7/dist-packages/nova/conf/__init__.py", line 65, in <module>
    from nova.conf import serial_console
  File "/usr/lib/python2.7/dist-packages/nova/conf/serial_console.py", line 71, in <module>
    """)
  File "/usr/local/lib/python2.7/dist-packages/oslo_config/cfg.py", line 966, in __init__
    **kwargs)
TypeError: __init__() got an unexpected keyword argument 'regex'

Contrail-api logs
----------------------

TypeError: __init__() got an unexpected keyword argument 'ignore_case'
Traceback (most recent call last):
  File "/usr/bin/contrail-api", line 9, in <module>
    load_entry_point('vnc-cfg-api-server==0.1dev', 'console_scripts', 'contrail-api')()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 542, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2569, in load_entry_point
    return ep.load()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2229, in load
    return self.resolve()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2235, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/lib/python2.7/dist-packages/vnc_cfg_api_server/vnc_cfg_api_server.py", line 91, in <module>
    import vnc_auth_keystone
  File "/usr/lib/python2.7/dist-packages/vnc_cfg_api_server/vnc_auth_keystone.py", line 22, in <module>
    from keystonemiddleware import auth_token
  File "/usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token/__init__.py", line 307, in <module>
    help='(Optional) If defined, indicate whether token data'
  File "/usr/local/lib/python2.7/dist-packages/oslo_config/cfg.py", line 966, in __init__
    **kwargs)
TypeError: __init__() got an unexpected keyword argument 'ignore_case'
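
Both tracebacks end at the same `**kwargs)` call in oslo_config/cfg.py: the caller declares an option with a keyword (`regex`, `ignore_case`) that the installed, older option class does not know about. The sketch below reproduces that failure mode with stand-in classes. `OldOpt`, `OldStrOpt` and `declare` are hypothetical names for illustration and are not the oslo.config source.

```python
class OldOpt(object):
    """Stand-in for a base option class from an older library release."""

    def __init__(self, name, default=None, help=None):
        self.name = name
        self.default = default
        self.help = help


class OldStrOpt(OldOpt):
    """Stand-in string option: forwards unknown keywords to the base class."""

    def __init__(self, name, choices=None, **kwargs):
        self.choices = choices
        # Any keyword the base class does not define raises TypeError here.
        super(OldStrOpt, self).__init__(name, **kwargs)


def declare(opt_cls, name, **kwargs):
    """Try to build an option; return (option, None) or (None, error text)."""
    try:
        return opt_cls(name, **kwargs), None
    except TypeError as exc:
        return None, str(exc)


if __name__ == "__main__":
    # Keywords the old class knows about are accepted.
    print(declare(OldStrOpt, "port_range", default="10000:20000")[1])
    # Keywords added in a later release are rejected at import time.
    print(declare(OldStrOpt, "port_range", regex=r"\d+:\d+")[1])
    print(declare(OldStrOpt, "token_data", ignore_case=True)[1])
```

Because option declarations like these run at module import, a single outdated library copy is enough to stop every service that imports the newer callers, which matches nova-api and contrail-api failing in the same way.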

Thanks,
Ignatious

Changed in juniperopenstack:
assignee: Ignatious Johnson Christopher (ijohnson-x) → nobody
assignee: nobody → Deepak Jeyaraman (jdeepak)
Jeba Paulaiyan (jebap)
tags: added: analytics
tags: added: config
removed: analytics