Steps to reproduce:
1. Create cluster
2. Add 3 nodes with controller and ceph OSD roles
3. Add 1 node with ceph OSD role
4. Add 2 nodes with compute and ceph OSD roles
5. Deploy the cluster
6. Cold reboot all nodes in the cluster
Expected result:
The cluster is working and all pcs resources are up after the nodes start.
Actual result:
The cluster is broken. All resources are down.
Presumably the main problem is a shortage of resources (memory) on the VM nodes. When we added memory to the controller nodes, mysql enabled performance mode and took all the free memory again.
Such behavior doesn't appear on the baremetal lab with 64GB memory on controller nodes.
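To check the memory hypothesis on a controller, available memory can be read from /proc/meminfo. The Python sketch below is only an illustration: the names parse_meminfo, available_kb and report are hypothetical and not part of fuel-qa, and the fallback used when the kernel does not expose MemAvailable is a rough estimate, not an exact figure.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of {field name: integer value}.

    Most fields are reported in kB; the unit suffix is dropped.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[name.strip()] = int(parts[0])
    return info


def available_kb(info):
    """Return available memory in kB.

    Assumption: when MemAvailable is missing (older kernels), approximate it
    as MemFree + Buffers + Cached.
    """
    if "MemAvailable" in info:
        return info["MemAvailable"]
    return info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)


def report(path="/proc/meminfo"):
    """Print available and total memory for the local node."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    print("available: %d kB of %d kB" % (available_kb(info), info["MemTotal"]))
```

Running report() on a controller before and after the cold reboot would show whether free memory is exhausted once the services start.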
Because all resources are down, including the ntp one, the test [1] fails with an ntpdate error:
2016-04-20 03:50:42,371 - ERROR decorators.py:126 -- Traceback (most recent call last):
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.thread_3/fuelweb_test/helpers/decorators.py", line 120, in wrapper
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.thread_3/fuelweb_test/tests/tests_strength/test_restart.py", line 165, in ceph_ha_restart
    'slave-04']))
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.thread_3/fuelweb_test/models/fuel_web_client.py", line 1860, in cold_restart_nodes
    self.environment.sync_time()
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.thread_3/fuelweb_test/models/environment.py", line 137, in sync_time
    new_time = sync_time(self.d_env, nodes_names, skip_sync)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/devops/helpers/retry.py", line 27, in wrapper
    return func(*args, **kwargs)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/devops/helpers/ntp.py", line 45, in sync_time
    g_ntp.do_sync_time(g_ntp.other_ntps)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/devops/helpers/ntp.py", line 134, in do_sync_time
    .format(self.report_not_synchronized(ntps)))
TimeoutError: Time on nodes was not set with 'ntpdate':
[(u'slave-04', ['Wed Apr 20 03:46:22 UTC 2016\n'])]
This error can't be fixed in the test by adding a smart wait for the ntp resource. When we tried to reproduce the issue, the cluster was still broken even after one day of waiting.
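For reference, the kind of smart wait meant here is a poll with a timeout, roughly as in the Python sketch below. It is a generic illustration, not the fuel-qa or fuel-devops helper; wait_until and its parameters are hypothetical names, and the predicate that would check the ntp resource is left to the caller.

```python
import time


def wait_until(predicate, timeout=600, interval=10,
               clock=time.time, sleep=time.sleep):
    """Poll predicate() until it returns a truthy value or timeout expires.

    Returns True if the predicate succeeded, False on timeout.
    clock and sleep are injectable so the helper can be tested without delays.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Since the resources never come up in this case, any such wait simply runs until its timeout, so the fix has to be on the cluster side rather than in the test.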
[1] https://github.com/openstack/fuel-qa/blob/391b1219ebfa5f3d136054931076cbd03644c6eb/fuelweb_test/tests/tests_strength/test_restart.py#L90
Alexey, please attach a snapshot if possible.