R1.10-build-25: Cluster down due to cassandra db service not responding
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Status tracked in Trunk | | |
R1.1 | Invalid | Critical | Deepinder Setia |
Trunk | Invalid | Critical | Deepinder Setia |
Bug Description
After leaving the setup running overnight with traffic, I noticed the following:
1. The nova-conductor log was full of errors and had grown to 40+ GB. Disk space on one node was 100% consumed by these logs.
2. Multiple processes were in the initializing state due to a Discovery connection issue.
Even after moving the logs out, the setup didn't recover.
The setup has been left in the failure state for debugging. Please check why connections to Discovery are failing; you can check nodeg6, nodeg7, ...
cfgm, nodeg6.
cfgm, control, nodeg7.
cfgm, control, nodeg8.
compute nodeg9.
compute nodeg22.
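The disk-full condition in item 1 could be spotted earlier with a small check like the one below. This is only a sketch: the helper names `percent_used` and `largest_files` are hypothetical, and the directory to scan (for example a nova log directory) is an assumption, not something taken from this setup.

```python
import os
import shutil


def percent_used(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def largest_files(root, top_n=5):
    """Return up to `top_n` (size_in_bytes, path) tuples under `root`, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                # File vanished or is unreadable (e.g. rotated mid-scan); skip it.
                continue
    sizes.sort(reverse=True)
    return sizes[:top_n]
```

Calling `largest_files` on the log directory of the affected node would list the oversized nova-conductor log first, and `percent_used("/")` gives a figure that can be compared against an alert threshold of your choosing.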
root@nodeg7:~# contrail-status
== Contrail Control ==
supervisor-control: active
contrail-control initializing
contrail-
contrail-dns active
contrail-named active
== Contrail Analytics ==
supervisor-
contrail-
contrail-
contrail-collector initializing
contrail-
== Contrail Config ==
supervisor-config: active
contrail-api:0 initializing
contrail-
contrail-
contrail-schema initializing
contrail-
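To compare nodes quickly, the status output above can be filtered down to the services that are not `active`. The sketch below assumes only the layout visible in this paste (`== Section ==` headers followed by `name state` lines); the function name `non_active_services` is hypothetical, and lines that do not have exactly two fields (such as the truncated ones above) are skipped.

```python
def non_active_services(status_text):
    """Return (section, service, state) for every service whose state is not 'active'."""
    section = None
    result = []
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("==") and line.endswith("=="):
            # Section header such as "== Contrail Control ==".
            section = line.strip("= ")
            continue
        if section is None:
            # Ignore anything before the first header, e.g. the shell prompt.
            continue
        parts = line.split()
        if len(parts) != 2:
            # Truncated or unexpected line; no state to report.
            continue
        name, state = parts
        if state != "active":
            result.append((section, name.rstrip(":"), state))
    return result
```

Feeding it the captured output of `contrail-status` from each node gives a short list of the processes stuck in `initializing`.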
information type: Proprietary → Public
summary:
- R1.10-build-25: Multiple processes reporting connection to discovery - down
+ R1.10-build-25: Cluster down due to error in Discovery process
summary:
- R1.10-build-25: Cluster down due to error in Discovery process
+ R1.10-build-25: Cluster down due to cassandra db service not responding
tags: removed: blocker
Checked the setup with Prakash. The status of the discovery process is misleading: it reports active, which is not correct. Also, the following error has to be fixed (variable not defined):
'REQUEST_METHOD': 'POST',
'SCRIPT_NAME': '',
'SERVER_NAME': 'nodeg6.englab.juniper.net',
'SERVER_PORT': '9110',
'SERVER_PROTOCOL': 'HTTP/1.0',
'SERVER_SOFTWARE': 'gevent/1.0 Python/2.7',
'bottle.app': <bottle.Bottle object at 0x26763d0>,
'bottle.request': <LocalRequest: POST http://10.204.217.46:5998/subscribe>,
'bottle.request.body': <StringIO.StringIO instance at 0x7f16710>,
'bottle.request.headers': <bottle.WSGIHeaderDict object at 0x26b3f050>,
'bottle.request.json': {u'client': u'nodeg8:Contrail-Analytics-Nodemgr',
u'client-type': u'Contrail-Analytics-Nodemgr',
u'instances': 2,
u'service': u'Collector'},
'bottle.request.urlparts': SplitResult(scheme='http', netloc='10.204.217.46:5998', path='/subscribe', query='', fragment=''),
'bottle.route': <POST '/subscribe' <bound method DiscoveryServer.error_handler of <discovery.disc_server.DiscoveryServer instance at 0x277c3b0>>>,
'route.handle': <POST '/subscribe' <bound method DiscoveryServer.error_handler of <discovery.disc_server.DiscoveryServer instance at 0x277c3b0>>>,
'route.url_args': {},
'wsgi.errors': <open file '<stderr>', mode 'w' at 0x7f73d6623270>,
'wsgi.input': <StringIO.StringIO instance at 0x7f16710>,
'wsgi.multiprocess': False,
'wsgi.multithread': False,
'wsgi.run_once': False,
'wsgi.url_scheme': 'http',
'wsgi.version': (1, 0)} failed with error
10.204.217.46 - - [2014-09-01 03:29:25] "POST /subscribe HTTP/1.1" socket 0 327.711255
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/bottle.py", line 764, in _handle
return route.call(**args)
File "/usr/lib/python2.7/dist-packages/bottle.py", line 1575, in wrapper
rv = callback(*a, **ka)
File "/usr/lib/python2.7/dist-packages/discovery/disc_server.py", line 332, in error_handler
self._debug['503'] += 1
NameError: global name 'self' is not defined
10.204.217.46 - - [2014-09-01 03:29:27] "POST /subscribe HTTP/1.1" 500 890 0.003713
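The NameError in the traceback above is the kind of failure you get when a wrapper function refers to `self` without receiving it as a parameter. The following is a minimal sketch of that class of bug and one way to fix it; it is not the actual `disc_server.py` code, and `ServiceUnavailable`, `Server`, and both decorators are hypothetical names used only for illustration.

```python
import functools


class ServiceUnavailable(Exception):
    """Hypothetical error raised when a backend (e.g. the database) is down."""


def count_errors_buggy(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except ServiceUnavailable:
            # Bug: 'self' is hidden inside *args, so this name is undefined
            # and the handler itself raises NameError.
            self._debug['503'] += 1
            raise
    return wrapper


def count_errors_fixed(func):
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        try:
            return func(self, *args, **kwargs)
        except ServiceUnavailable:
            # 'self' is now an explicit parameter, so the counter is reachable.
            self._debug['503'] += 1
            raise
    return wrapper


class Server:
    def __init__(self):
        self._debug = {'503': 0}

    @count_errors_buggy
    def subscribe_buggy(self):
        raise ServiceUnavailable("backend not responding")

    @count_errors_fixed
    def subscribe_fixed(self):
        raise ServiceUnavailable("backend not responding")
```

With the buggy decorator the original error is masked by a NameError, so the client sees a generic 500 instead of the intended 503; with the fixed one the counter is incremented and the original exception propagates.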
Traceback (most recent call last):