Vcenter-as-compute: contrail-vrouter-nodemgr stays "initializing" with collector-down

Bug #1737308 reported by Sarath
Affects              Status          Importance   Assigned to      Milestone
Juniper Openstack    (status tracked in Trunk)
  R4.1               Fix Committed   Critical     kamlesh parmar
  Trunk              Fix Committed   Critical     kamlesh parmar

Bug Description

Topology : vcenter-compute 3 node HA
Version : 5.0 #54-CB

I tried a "service restart", but contrail-vrouter-nodemgr still does not recover; the log file shows the messages below.
There is also a contrail-collector crash (core) file; its gdb decode is included further down.

12/09/2017 12:47:40 AM [contrail-vrouter-nodemgr]: Cannot write http_port 8102 to /tmp/contrail-vrouter-nodemgr.1.http_port
12/09/2017 12:47:40 AM [contrail-vrouter-nodemgr]: Starting Introspect on HTTP Port 8102
12/09/2017 12:47:40 AM [contrail-vrouter-nodemgr]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = contrailvm-5a10s25 process_status = [ << module_id = contrail-vrouter-nodemgr instance_id = 0 state = Non-Functional connection_infos = [ << type = Collector name = server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, ] description = Collector connection down >>, ] >>
12/09/2017 12:47:40 AM [contrail-vrouter-nodemgr]: SANDESH: [DROP: WrongClientSMState] SandeshModuleClientTrace: data = << name = contrailvm-5a10s25:Compute:contrail-vrouter-nodemgr:0 client_info = << status = Idle successful_connections = 0 pid = 13780 http_port = 8102 start_time = 1512809260426107 collector_name = collector_ip = collector_list = [ 10.87.36.11:8086, 10.87.36.12:8086, 10.87.36.10:8086, ] >> sm_queue_count = 1 max_sm_queue_count = 1 >>
12/09/2017 12:47:40 AM [contrail-vrouter-nodemgr]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = contrailvm-5a10s25 build_info = {"build-info" : [{"build-version" : "5.0.0", "build-time" : "2017-12-06 16:01:39.262635", "build-user" : "contrail-builder", "build-hostname" : "CB-mainline-u16-ocata-10-84-56-75", "build-id" : "5.0.0-54", "build-number" : "54"}]} system_cpu_info = << num_socket = 2 num_cpu = 2 num_core_per_socket = 1 num_thread_per_core = 1 >> running_package_version = 5.0.0-54 installed_package_version = 5.0.0-54 >>
12/09/2017 12:47:40 AM [contrail-vrouter-nodemgr]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = contrailvm-5a10s25 process_status = [ << module_id = contrail-vrouter-nodemgr instance_id = 0 state = Functional description = >>, ] >>
12/09/2017 12:47:40 AM [contrail-vrouter-nodemgr]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = contrailvm-5a10s25 deleted = False process_info = [ << process_name = contrail-vrouter-agent process_state = PROCESS_STATE_RUNNING start_count = 1 stop_count = 0 exit_count = 0 last_start_time = 1512692080742130 last_stop_time = last_exit_time = core_file_list = [ ] >>, << process_name = contrail-vrouter-nodemgr process_state = PROCESS_STATE_RUNNING start_count = 1 stop_count = 0 exit_count = 0 last_start_time = 1512809260050193 last_stop_time = last_exit_time = core_file_list = [ ] >>, ] build_info = {"build-info" : [{"build-version" : "5.0.0", "build-time" : "2017-12-06 16:01:39.262635", "build-user" : "contrail-builder", "build-hostname" : "CB-mainline-u16-ocata-10-84-56-75", "build-id" : "5.0.0-54", "build-number" : "54"}]} >>
12/09/2017 12:47:40 AM [contrail-vrouter-nodemgr]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = contrailvm-5a10s25 deleted = False process_info = [ << process_name = contrail-vrouter-agent process_state = PROCESS_STATE_RUNNING start_count = 1 stop_count = 0 exit_count = 0 last_start_time = 1512692080742130 last_stop_time = last_exit_time = core_file_list = [ ] >>, << process_name = contrail-vrouter-nodemgr process_state = PROCESS_STATE_RUNNING start_count = 1 stop_count = 0 exit_count = 0 last_start_time = 1512809260050193 last_stop_time = last_exit_time = core_file_list = [ ] >>, ] build_info = {"build-info" : [{"build-version" : "5.0.0", "build-time" : "2017-12-06 16:01:39.262635", "build-user" : "contrail-builder", "build-hostname" : "CB-mainline-u16-ocata-10-84-56-75", "build-id" : "5.0.0-54", "build-number" : "54"}]} >>
12/09/2017 12:47:40 AM [contrail-vrouter-nodemgr]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = contrailvm-5a10s25 disk_usage_info = { /dev/mapper/ContrailVM--vg-root : << partition_type = ext4 partition_space_used_1k = 2213100 partition_space_available_1k = 52230176 percentage_partition_space_used = 4 >>/dev/sda1 : << partition_type = ext2 partition_space_used_1k = 140878 partition_space_available_1k = 87653 percentage_partition_space_used = 62 >> } all_core_file_list = [ ] process_mem_cpu_usage = { contrail-vrouter-agent : << mem_virt = 1024988 cpu_share = 0 mem_res = 300444 >>contrail-vrouter-nodemgr : << mem_virt = 511860 cpu_share = 0 mem_res = 68140 >> } system_mem_usage = << total = 8079092 used = 766280 free = 7312812 buffers = 119260 cached = 177232 node_type = vrouter >> system_cpu_usage = << one_min_avg = 0.44 five_min_avg = 0.25 fifteen_min_avg = 0.18 cpu_share = 0 node_type = vrouter >> >>
12/09/2017 12:47:44 AM [contrail-vrouter-nodemgr]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = contrailvm-5a10s25 process_status = [ << module_id = contrail-vrouter-nodemgr instance_id = 0 state = Non-Functional connection_infos = [ << type = Collector name = server_addrs = [ 10.87.36.11:8086, ] status = Initializing description = Idle to Connect on EvIdleHoldTimerExpired >>, ] description = Collector connection down >>, ] >>
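
To follow the collector connection state across these log lines without reading each UVE dump by eye, the connection_infos entries can be pulled out with a regular expression. This is a minimal sketch that assumes the line layout is exactly as pasted above; the function name connection_states and the pattern are illustrative and are not part of Contrail.

```python
import re

# One connection_infos entry, in the layout seen in the nodemgr log above.
# Assumption: fields always appear in the order type, name, server_addrs,
# status, description.
CONN_RE = re.compile(
    r"type = (?P<type>\S+) name = (?P<name>.*?)\s*"
    r"server_addrs = \[(?P<addrs>[^\]]*)\] "
    r"status = (?P<status>\S+) description = (?P<desc>.*?) >>"
)


def connection_states(line):
    """Return one dict per connection_infos entry found in a log line."""
    states = []
    for m in CONN_RE.finditer(line):
        # "[ , ]" in the log means no server address is known yet.
        addrs = [a.strip() for a in m.group("addrs").split(",") if a.strip()]
        states.append({
            "type": m.group("type"),
            "servers": addrs,
            "status": m.group("status"),
            "description": m.group("desc"),
        })
    return states


# The 12:47:44 line from the log above.
SAMPLE_LINE = (
    "12/09/2017 12:47:44 AM [contrail-vrouter-nodemgr]: SANDESH: "
    "[DROP: WrongClientSMState] NodeStatusUVE: data = << name = "
    "contrailvm-5a10s25 process_status = [ << module_id = "
    "contrail-vrouter-nodemgr instance_id = 0 state = Non-Functional "
    "connection_infos = [ << type = Collector name = server_addrs = "
    "[ 10.87.36.11:8086, ] status = Initializing description = Idle to "
    "Connect on EvIdleHoldTimerExpired >>, ] description = Collector "
    "connection down >>, ] >>"
)

if __name__ == "__main__":
    for state in connection_states(SAMPLE_LINE):
        print(state["type"], state["status"], state["servers"])
```

Running it over the whole log file, line by line, gives the sequence of collector states (Down, then Initializing) that the description refers to.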

Below is the service-state
######

root@5a10s31:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a4c15356058c contrail-test-newton:4.1.0.0-8 "/bin/bash" 17 hours ago Up 17 hours contrail_test_A
ec6761e94d40 10.87.36.15:5100/ubuntu16newton54-contrail-analytics:54 "/lib/systemd/syst..." 20 hours ago Up 20 hours analytics
f1e4e1ca4e87 10.87.36.15:5100/ubuntu16newton54-contrail-analyticsdb:54 "/lib/systemd/syst..." 20 hours ago Up 20 hours analyticsdb
66090337912e 10.87.36.15:5100/ubuntu16newton54-contrail-controller:54 "/lib/systemd/syst..." 20 hours ago Up 20 hours controller
f952235bff5a 10.87.36.15:5100/ubuntu16newton54-horizon:54 "kolla_start" 21 hours ago Up 21 hours horizon
c2dea123159b 10.87.36.15:5100/ubuntu16newton54-heat-engine:54 "kolla_start" 21 hours ago Up 21 hours heat_engine
6b4a6fa9ed78 10.87.36.15:5100/ubuntu16newton54-heat-api-cfn:54 "kolla_start" 21 hours ago Up 21 hours heat_api_cfn
9d66bcfe097e 10.87.36.15:5100/ubuntu16newton54-heat-api:54 "kolla_start" 21 hours ago Up 21 hours heat_api
c0dcf9fde3b5 10.87.36.15:5100/ubuntu16newton54-neutron-metadata-agent:54 "kolla_start" 21 hours ago Up 21 hours neutron_metadata_agent
d7c019db4762 10.87.36.15:5100/ubuntu16newton54-neutron-dhcp-agent:54 "kolla_start" 21 hours ago Up 21 hours neutron_dhcp_agent
17de7937167e 10.87.36.15:5100/ubuntu16newton54-neutron-server:54 "kolla_start" 21 hours ago Up 21 hours neutron_server
b7f7131863de 10.87.36.15:5100/ubuntu16newton54-nova-novncproxy:54 "kolla_start" 21 hours ago Up 21 hours nova_novncproxy
9bad12f52b28 10.87.36.15:5100/ubuntu16newton54-nova-consoleauth:54 "kolla_start" 21 hours ago Up 21 hours nova_consoleauth
9eb4f84058f8 10.87.36.15:5100/ubuntu16newton54-nova-conductor:54 "kolla_start" 21 hours ago Up 21 hours nova_conductor
c5a6d659fe76 10.87.36.15:5100/ubuntu16newton54-nova-scheduler:54 "kolla_start" 21 hours ago Up 21 hours nova_scheduler
348aa96df153 10.87.36.15:5100/ubuntu16newton54-nova-api:54 "kolla_start" 21 hours ago Up 21 hours nova_api
182e4925c93a 10.87.36.15:5100/ubuntu16newton54-nova-placement-api:54 "kolla_start" 21 hours ago Up 21 hours placement_api
6bc156cad344 10.87.36.15:5100/ubuntu16newton54-nova-ssh:54 "kolla_start" 21 hours ago Up 21 hours nova_ssh
36563b494393 10.87.36.15:5100/ubuntu16newton54-glance-registry:54 "kolla_start" 21 hours ago Up 21 hours glance_registry
6392b45a8f81 10.87.36.15:5100/ubuntu16newton54-glance-api:54 "kolla_start" 21 hours ago Up 21 hours glance_api
7f56822dad30 10.87.36.15:5100/ubuntu16newton54-keystone:54 "kolla_start" 21 hours ago Up 21 hours keystone
00f63a394471 10.87.36.15:5100/ubuntu16newton54-rabbitmq:54 "kolla_start" 21 hours ago Up 21 hours rabbitmq
9c391d67b905 10.87.36.15:5100/ubuntu16newton54-mariadb:54 "kolla_start" 21 hours ago Up 21 hours mariadb
5147b6451094 10.87.36.15:5100/ubuntu16newton54-memcached:54 "kolla_start" 21 hours ago Up 21 hours memcached
c7fe0eb3461f 10.87.36.15:5100/ubuntu16newton54-keepalived:54 "kolla_start" 21 hours ago Up 21 hours keepalived
1ed715183c03 10.87.36.15:5100/ubuntu16newton54-haproxy:54 "kolla_start" 21 hours ago Up 21 hours haproxy
102dd8783448 10.87.36.15:5100/ubuntu16newton54-cron:54 "kolla_start" 21 hours ago Up 21 hours cron
c335ce722e0d 10.87.36.15:5100/ubuntu16newton54-kolla-toolbox:54 "kolla_start" 21 hours ago Up 21 hours kolla_toolbox
455a76759764 kolla/ubuntu-binary-fluentd:4.0.0 "kolla_start" 21 hours ago Restarting (0) 6 hours ago fluentd
root@5a10s31:~# docker exec -it analytics bash
root@5a10s31(analytics):/# contrail-status
== Contrail Analytics ==
contrail-collector: active
contrail-analytics-api: active
contrail-query-engine: active
contrail-alarm-gen: active
contrail-snmp-collector: active
contrail-topology: active
contrail-analytics-nodemgr: active
========Run time service failures=============
/var/crashes/core.contrail-collec.3340.5a10s31.1512691169
root@5a10s31(analytics):/# exit
exit
root@5a10s31:~# docker exec -it analyticsdb bash
root@5a10s31(analyticsdb):/# contrail-status
== Contrail Database ==
contrail-database: active

kafka: active
contrail-database-nodemgr: active
========Run time service failures=============
/var/crashes/core.contrail-collec.3340.5a10s31.1512691169
root@5a10s31(analyticsdb):/# exit
exit
root@5a10s31:~# docker exec -it controller bash
root@5a10s31(controller):/# contrail-status
== Contrail Control ==
contrail-control: active
contrail-named: active
contrail-dns: active
contrail-control-nodemgr: active
== Contrail Config ==
contrail-api: active
contrail-schema: backup
contrail-svc-monitor: backup
contrail-device-manager: backup
contrail-config-nodemgr: active
== Contrail Config Database==
contrail-database: active

== Contrail Web UI ==
contrail-webui: active
contrail-webui-middleware: active
== Contrail Support Services ==
zookeeper: active
rabbitmq-server: inactive (disabled on boot)
========Run time service failures=============
/var/crashes/core.contrail-collec.3340.5a10s31.1512691169
root@5a10s31(controller):/# exit
exit
root@5a10s31:~# ssh root@10.87.36.19
root@10.87.36.19's password:
Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-62-generic x86_64)

 * Documentation: https://help.ubuntu.com
 * Management: https://landscape.canonical.com
 * Support: https://ubuntu.com/advantage

  System information as of Fri Dec 8 12:48:10 PST 2017

  System load: 0.52 Processes: 120
  Usage of /: 4.4% of 54.72GB Users logged in: 1
  Memory usage: 6% IP address for vhost0: 10.87.36.19
  Swap usage: 0%

  Graph this data and manage this system at:
    https://landscape.canonical.com/

128 packages can be updated.
56 updates are security updates.

Last login: Fri Dec 8 12:39:27 2017 from 10.87.36.10
root@contrailvm-5a10s27:~# contrail-status
== Contrail vRouter ==
contrail-vrouter-agent: active
contrail-vrouter-nodemgr: initializing (NTP state unsynchronized.)
root@contrailvm-5a10s27:~#
root@contrailvm-5a10s27:~#

root@5a10s31:/var/crashes# gdb vizd core.contrail-collec.3340.5a10s31.1512691169
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from vizd...done.

warning: core file may not match specified executable file.
[New LWP 3359]
[New LWP 3364]
[New LWP 3349]
[New LWP 3845]
[New LWP 3366]
[New LWP 3354]
[New LWP 3347]
[New LWP 3348]
[New LWP 3350]
[New LWP 3351]
[New LWP 3352]
[New LWP 3353]
[New LWP 3355]
[New LWP 3357]
[New LWP 3362]
[New LWP 3846]
[New LWP 3363]
[New LWP 3356]
[New LWP 3365]
[New LWP 3847]
[New LWP 3340]
[New LWP 3358]

warning: Could not load shared library symbols for 21 libraries, e.g. /usr/lib/x86_64-linux-gnu/libcassandra.so.2.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-collector'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f63b4f4d428 in sigandset (dest=0xd0c, left=0xd1f, right=0x6) at sigandset.c:33
33 sigandset.c: No such file or directory.
[Current thread is 1 (Thread 0x7f63a5648700 (LWP 3359))]
(gdb) bt
#0 0x00007f63b4f4d428 in sigandset (dest=0xd0c, left=0xd1f, right=0x6) at sigandset.c:33
#1 0x0000000000000020 in ?? ()
#2 0x0000000000000000 in ?? ()
(gdb)

Revision history for this message
Sarath (nsarath) wrote :

nsarath@ubuntu-build02:/auto/cores/1737308$ ls -l
total 261172
-rwxrwxrwx 1 nsarath test 8069120 Dec 9 01:14 analytics-logs.tar
-rwxrwxrwx 1 nsarath test 231849984 Dec 9 01:11 core.contrail-collec.3340.5a10s31.1512691169
-rwxrwxrwx 1 nsarath test 26460160 Dec 9 01:12 vrouter-log.tar
nsarath@ubuntu-build02:/auto/cores/1737308$

Zhiqiang Cui (zcui) wrote :

An assertion failed in contrail-collector:

 cass::cql::CqlIfImpl::IsTableStatic(const string&): Assertion `impl::GetCassTableClusteringKeyCount(cci_, session_.get(), keyspace_, table, &ck_count)' failed.

Zhiqiang Cui (zcui) wrote :

From the log, the Cassandra connection was lost first. That led contrail-collector to hit the assertion and generate a core dump; it recovered afterwards.

Zhiqiang Cui (zcui) wrote :

A new bug has been opened for the crash. The root cause of contrail-vrouter-nodemgr staying in "initializing" is that NTP cannot sync, so the server manager team needs to take this bug.
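
The reason is visible directly in the contrail-status output from the description ("initializing (NTP state unsynchronized.)"). As a rough sketch, that output can be scanned for services that are not healthy along with the reason printed in parentheses. The function name not_active and the choice to treat "backup" as healthy are assumptions made for this illustration, and the parsing relies only on the line layout shown in this report.

```python
import re

# "service-name: state" with an optional "(reason)" suffix, as printed by
# contrail-status in the description above.
STATUS_RE = re.compile(
    r"^(?P<service>[\w.-]+):\s+(?P<state>\w+)(?:\s+\((?P<reason>.*)\))?\s*$"
)


def not_active(status_text, ok_states=("active", "backup")):
    """Return (service, state, reason) for every service not in ok_states.

    Assumption: "backup" is a normal state for non-leader config services
    in an HA setup, so it is not reported.
    """
    problems = []
    for line in status_text.splitlines():
        m = STATUS_RE.match(line.strip())
        if m and m.group("state") not in ok_states:
            problems.append(
                (m.group("service"), m.group("state"), m.group("reason"))
            )
    return problems


# Output copied from the vRouter node in the description.
SAMPLE = """\
== Contrail vRouter ==
contrail-vrouter-agent: active
contrail-vrouter-nodemgr: initializing (NTP state unsynchronized.)
"""

if __name__ == "__main__":
    for service, state, reason in not_active(SAMPLE):
        print(f"{service}: {state} ({reason})")
```

Section headers such as "== Contrail vRouter ==" and the crash-file listing do not match the pattern and are skipped.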

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/38479
Submitter: kamlesh parmar (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.1

Review in progress for https://review.opencontrail.org/38480
Submitter: kamlesh parmar (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/38479
Committed: http://github.com/Juniper/contrail-ansible/commit/93b51fef1e80191530d9c721c40bf895b08d2e3b
Submitter: Zuul (<email address hidden>)
Branch: master

commit 93b51fef1e80191530d9c721c40bf895b08d2e3b
Author: Kamlesh Parmar <email address hidden>
Date: Tue Dec 19 16:04:11 2017 -0800

Wait for contrailVM to be reachable befoire attempting to run preconfig.

Change-Id: I6823f6aba2450cf4b960d26afcf6bb61087c3aa2
Closes-Bug: #1737308

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/38480
Committed: http://github.com/Juniper/contrail-ansible/commit/41f51489ff1dd38a6ed88dd398d3400288784449
Submitter: Zuul (<email address hidden>)
Branch: R4.1

commit 41f51489ff1dd38a6ed88dd398d3400288784449
Author: Kamlesh Parmar <email address hidden>
Date: Tue Dec 19 16:04:11 2017 -0800

Wait for contrailVM to be reachable befoire attempting to run preconfig.

Change-Id: I6823f6aba2450cf4b960d26afcf6bb61087c3aa2
Closes-Bug: #1737308
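
The diff of the commit is not shown in this report, so the following only illustrates the idea in the commit message (wait until the ContrailVM is reachable before running further configuration). It is a minimal sketch that polls a TCP port until a connection succeeds or a deadline passes. The function name wait_until_reachable, the default port 22 (SSH), and the timeout values are assumptions and are not taken from contrail-ansible.

```python
import socket
import time


def wait_until_reachable(host, port=22, timeout=300.0, interval=5.0):
    """Poll host:port until a TCP connection succeeds or timeout expires.

    Returns True once a connection is accepted, False if the deadline
    passes first. The defaults are illustrative, not from the real fix.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect is taken as "the VM is reachable".
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Address of the vRouter node from the description.
    if wait_until_reachable("10.87.36.19", timeout=60.0):
        print("ContrailVM reachable, safe to continue")
    else:
        print("ContrailVM still unreachable, aborting")
```

Running configuration steps such as NTP setup only after this check passes avoids them being silently skipped on a VM that has not finished booting.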
