Comment 0 for bug 1423116

Leontiy Istomin (listomin) wrote : primary controller has been marked as offline by fuel.

[root@fuel ~]# fuel --fuel-version
api: '1.0'
astute_sha: f7cda2171b0b677dfaeb59693d980a2d3ee4c3e0
auth_required: true
build_id: 2015-02-07_20-50-01
build_number: '76'
feature_groups:
- mirantis
fuellib_sha: 64f3ebe9fcbd18bf6c80a948e06061783a090347
fuelmain_sha: c799e3a6d88289e58db764a6be7910aab7da3149
nailgun_sha: 2ef819732a3ee7acf7b610e7d1c1a6da0434c1a0
ostf_sha: 3b57985d4d2155510894a1f6d03b478b201f7780
production: docker
release: 6.0.1
release_versions:
  2014.2-6.0.1:
    VERSION:
      api: '1.0'
      astute_sha: f7cda2171b0b677dfaeb59693d980a2d3ee4c3e0
      build_id: 2015-02-07_20-50-01
      build_number: '76'
      feature_groups:
      - mirantis
      fuellib_sha: 64f3ebe9fcbd18bf6c80a948e06061783a090347
      fuelmain_sha: c799e3a6d88289e58db764a6be7910aab7da3149
      nailgun_sha: 2ef819732a3ee7acf7b610e7d1c1a6da0434c1a0
      ostf_sha: 3b57985d4d2155510894a1f6d03b478b201f7780
      production: docker
      release: 6.0.1

Baremetal, Ubuntu, HA, Neutron-gre, Ceilometer, Ceph-all, Debug, 6.0.1_76
Controllers: 3, Computes: 96

Deployment completed successfully, but during the full Rally test run the primary controller node was marked as offline. The node is also unreachable via ssh.

[root@fuel ~]# ssh node-19
Warning: Permanently added 'node-19' (RSA) to the list of known hosts.
Write failed: Broken pipe

However, I still have one open ssh session to the node, which lets me run some commands.

Here is the output of the top command:
http://paste.openstack.org/show/176687/

root@node-19:~# free -m
             total       used       free     shared    buffers     cached
Mem:         32142      31768        373          0        211      11263
-/+ buffers/cache:      20292      11849
Swap:        15624         12      15612
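
As a sanity check on these numbers, the "-/+ buffers/cache" row can be recomputed from the "Mem:" row. This is only a sketch using the values pasted in this report; the variable names are made up for illustration. It suggests that roughly 20 GB is held by processes and about 11.8 GB is still reclaimable, so the node does not look out of memory at this point.

```shell
#!/bin/sh
# Values copied from the "Mem:" row of the free -m output in this report (MB).
used=31768
free=373
buffers=211
cached=11263

# Memory held by processes: "used" minus reclaimable buffers and page cache.
app_used=$((used - buffers - cached))
# Memory that could be made available: "free" plus buffers and page cache.
reclaimable=$((free + buffers + cached))

echo "used by processes: ${app_used} MB"
echo "free after reclaim: ${reclaimable} MB"
```

The results (20294 and 11847) differ from the reported 20292 and 11849 by 2 MB, presumably because free rounds each field separately.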

The "rabbitmqctl cluster_status" and "rabbitmqctl list_queues" commands just hang on this node.

From another controller node:
root@node-52:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-52' ...
[{nodes,[{disc,['rabbit@node-19','rabbit@node-52','rabbit@node-65']}]},
 {running_nodes,['rabbit@node-19','rabbit@node-65','rabbit@node-52']},
 {cluster_name,<<"<email address hidden>">>},
 {partitions,[]}]
...done.

root@node-52:~# rabbitmqctl list_queues | grep -v 0$
Listing queues ...
dhcp_agent.node-19 96
notifications.error 415
reply_0c7bc35f0e114b119b959160645ca04a 1
...done.
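
Note that "grep -v 0$" drops every line ending in 0, so a queue holding 10 or 100 messages would be hidden as well. A stricter filter could look like the sketch below. The function name nonzero_queues and the queues notifications.info and example_queue are made up for illustration, and the sketch assumes the default list_queues output with the message count in the last column.

```shell
#!/bin/sh
# Print only queues whose message count is a non-zero integer.
# The banner lines ("Listing queues ..." and "...done.") are skipped
# because their last field is not a number.
nonzero_queues() {
    awk '($NF ~ /^[0-9]+$/) && ($NF + 0 > 0)'
}

# Sample input: three queues from this report plus two hypothetical ones
# (notifications.info, example_queue) to show the difference from grep -v 0$.
nonzero_queues <<'EOF'
Listing queues ...
dhcp_agent.node-19 96
notifications.error 415
notifications.info 0
example_queue 10
reply_0c7bc35f0e114b119b959160645ca04a 1
...done.
EOF
```

On a live node the same function would be used as "rabbitmqctl list_queues | nonzero_queues". Here example_queue (10 messages) is kept, whereas "grep -v 0$" would have dropped it.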

root@node-19:~# dmesg | grep -i error
[ 9.798790] ACPI Error: [\_SB_.PRAD]
[ 10.883460] ACPI Error: Method parse/execution failed [\_GPE._L24] (Node ffff880853d9d3e8), AE_NOT_FOUND (20131115/psparse-536)
[ 16.284591] ioapic: probe of 0000:00:05.4 failed with error -22
[ 17.779631] ERST: Error Record Serialization Table (ERST) support is initialized.
[ 31.029678] EXT4-fs (sda3): re-mounted. Opts: errors=remount-ro

The crm status output is here:
http://paste.openstack.org/show/176723/

The last entry in the RabbitMQ log is:
=INFO REPORT==== 18-Feb-2015::10:11:18 ===
accepting AMQP connection <0.9669.490> (192.168.0.54:41674 -> 192.168.0.21:5673)

The snapshot will be attached as soon as possible.