[provision] Provisioning timed out after stop operation on Centos

Bug #1346924 reported by Tatyanka
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Invalid
Importance: High
Assigned to: Vladimir Sharshov

Bug Description

FAIL: Stop reset cluster in ha mode
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/proboscis/case.py", line 296, in testng_method_mistake_capture_func
    compatability.capture_type_error(s_func)
  File "/usr/lib/python2.7/dist-packages/proboscis/compatability/exceptions_2_6.py", line 27, in capture_type_error
    func()
  File "/usr/lib/python2.7/dist-packages/proboscis/case.py", line 350, in func
    func(test_case.state.get_state())
  File "/home/jenkins/workspace/5.0_fuelmain.system_test.centos.thread_3/fuelweb_test/helpers/decorators.py", line 49, in wrapper
    return func(*args, **kwagrs)
  File "/home/jenkins/workspace/5.0_fuelmain.system_test.centos.thread_3/fuelweb_test/tests/test_environment_action.py", line 252, in deploy_stop_reset_on_ha
    self.fuel_web.deploy_cluster_wait(cluster_id)
  File "/home/jenkins/workspace/5.0_fuelmain.system_test.centos.thread_3/fuelweb_test/models/fuel_web_client.py", line 357, in deploy_cluster_wait
    self.assert_task_success(task, interval=interval)
  File "/home/jenkins/workspace/5.0_fuelmain.system_test.centos.thread_3/fuelweb_test/__init__.py", line 48, in wrapped
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/5.0_fuelmain.system_test.centos.thread_3/fuelweb_test/models/fuel_web_client.py", line 214, in assert_task_success
    task['status'], 'ready', name=task["name"]
AssertionError: Task 'deploy' has incorrect status. error != ready
http://jenkins-product.srt.mirantis.net:8080/view/5.0_swarm/job/5.0_fuelmain.system_test.centos.thread_3/47/consoleFull

Error reason in the Astute log:
2014-07-22T05:03:04 debug: [400] Can't read file with logs: /var/log/remote/node-4.test.domain.local/install/anaconda.log
2014-07-22T05:03:04 debug: [400] Data received by DeploymentProxyReporter to report it up: {"nodes"=>[{"uid"=>"1", "progress"=>100, "status"=>"provisioned"}, {"uid"=>"2", "progress"=>100, "status"=>"provisioned"}, {"uid"=>"3", "progress"=>100, "status"=>"provisioned"}, {"uid"=>"4", "progress"=>0, "status"=>"provisioning"}, {"uid"=>"5", "progress"=>100, "status"=>"provisioned"}]}
2014-07-22T05:03:07 err: [400] Timeout of provisioning is exceeded. Nodes not booted: ["4"]
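To pick out the stuck node from a reporter line like the one above, a small sketch like the following can help. It assumes the payload is a Ruby hash literal whose only Ruby-specific syntax is `=>`, so swapping `=>` for `:` yields valid JSON; `stuck_nodes` is a hypothetical helper, not part of Astute or fuelweb_test.

```python
import json

# Payload copied from the DeploymentProxyReporter line above. Astute logs it
# as a Ruby hash literal; replacing "=>" with ":" turns it into JSON. This is
# an assumption that only holds when no string value itself contains "=>".
RAW = ('{"nodes"=>[{"uid"=>"1", "progress"=>100, "status"=>"provisioned"}, '
       '{"uid"=>"2", "progress"=>100, "status"=>"provisioned"}, '
       '{"uid"=>"3", "progress"=>100, "status"=>"provisioned"}, '
       '{"uid"=>"4", "progress"=>0, "status"=>"provisioning"}, '
       '{"uid"=>"5", "progress"=>100, "status"=>"provisioned"}]}')


def stuck_nodes(raw):
    """Return the uids of nodes that have not reached 'provisioned'."""
    data = json.loads(raw.replace("=>", ":"))
    return [node["uid"] for node in data["nodes"]
            if node["status"] != "provisioned"]


print(stuck_nodes(RAW))  # ['4']
```

The result matches the `Nodes not booted: ["4"]` line reported when the provisioning timeout fired.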

It seems network activation failed on this node (node-4). Output from syslog:
http://paste.openstack.org/show/87616/

Unfortunately, I cannot revert this environment; the revert fails with the following message:
http://paste.openstack.org/show/87618/

So I can only provide a snapshot (see the attachment).

Tags: system-tests
Tatyanka (tatyana-leontovich) wrote :
summary: - Provisioning failed after stop operation on Centos
+ Provisioning timed out after stop operation on Centos
Dmitry Ilyin (idv1985)
summary: - Provisioning timed out after stop operation on Centos
+ [provision] Provisioning timed out after stop operation on Centos
Vladimir Sharshov (vsharshov) wrote :

I suspect that node-4 hit an ext4 filesystem error when we erased it while processing the new deployment operation. Cobbler successfully rebooted node-4, but if the MBR survived, we get this error: the boot order for system tests is hd, then netboot, so the node boots from the disk instead of netbooting into the installer. I cannot say more without the real environment, which we could not restore.
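The boot-order reasoning above can be illustrated with a toy model. This is a hypothetical sketch, not Cobbler or firmware code: it assumes the `hd` entry boots only when a bootable MBR is still present, and that `netboot` always reaches the Cobbler PXE installer.

```python
def boot_target(boot_order, mbr_present):
    """Toy model of firmware walking the boot order (hypothetical).

    'hd' boots only if a bootable MBR is still on the disk;
    'netboot' is assumed to always reach the PXE installer.
    """
    for device in boot_order:
        if device == "hd" and mbr_present:
            return "old OS from disk"  # installer never starts
        if device == "netboot":
            return "PXE installer"
    return "no bootable device"


ORDER = ["hd", "netboot"]  # boot order used by the system tests
print(boot_target(ORDER, mbr_present=False))  # PXE installer
print(boot_target(ORDER, mbr_present=True))   # old OS from disk
```

With the MBR wiped, the disk entry falls through to netboot and provisioning proceeds; with the MBR intact, the node comes back up from disk and never reports as provisioned, which would end in the provisioning timeout.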

This bug reminds me of a useful patchset: https://review.openstack.org/#/c/108188/ (targeted at 5.1; backporting it to 5.0.x could be a big change).

Stop deployment:
2014-07-22T03:25:15 debug: [402] MCO final result: mco success nodes: [{"uid"=>"1"}, {"uid"=>"2"}, {"uid"=>"4"}], mco error nodes: [], mco inaccessible nodes: [], all mco nodes: [{"uid"=>"2"}, {"uid"=>"4"}, {"uid"=>"1"}]

Deployment afterwards:
2014-07-22T03:32:37 warning: [400] : Removing of nodes ["1", "2", "3", "4", "5"] finished with errors. Nodes [{"uid"=>"4", "error"=>"Node not answered by RPC."}] are inaccessible.

Cobbler:

2014-07-22T03:32:39 debug: [400] Cobbler syncing
2014-07-22T03:32:40 debug: [400] Trying to reboot node: node-1
2014-07-22T03:32:40 debug: [400] Trying to reboot node: node-2
2014-07-22T03:32:40 debug: [400] Trying to reboot node: node-3
2014-07-22T03:32:40 debug: [400] Trying to reboot node: node-4
2014-07-22T03:32:40 debug: [400] Trying to reboot node: node-5
2014-07-22T03:32:40 debug: [400] Cobbler syncing
2014-07-22T03:32:41 debug: [400] Waiting for reboot to be complete: nodes: ["node-1", "node-2", "node-3", "node-4", "node-5"]
2014-07-22T03:32:41 debug: [400] Reboot task status: node: node-1 status: [1405999960.464637, "Power management (reboot)", "running", []]
2014-07-22T03:32:42 debug: [400] Reboot task status: node: node-2 status: [1405999960.464637, "Power management (reboot)", "running", []]
2014-07-22T03:32:42 debug: [400] Reboot task status: node: node-3 status: [1405999960.464637, "Power management (reboot)", "running", []]
2014-07-22T03:32:42 debug: [400] Reboot task status: node: node-4 status: [1405999960.464637, "Power management (reboot)", "running", []]
2014-07-22T03:32:42 debug: [400] Reboot task status: node: node-5 status: [1405999960.464637, "Power management (reboot)", "running", []]
2014-07-22T03:32:47 debug: [400] Reboot task status: node: node-1 status: [1405999960.464637, "Power management (reboot)", "running", []]
2014-07-22T03:32:47 debug: [400] Reboot task status: node: node-2 status: [1405999960.464637, "Power management (reboot)", "running", []]
2014-07-22T03:32:47 debug: [400] Reboot task status: node: node-3 status: [1405999960.464637, "Power management (reboot)", "running", []]
2014-07-22T03:32:47 debug: [400] Reboot task status: node: node-4 status: [1405999960.464637, "Power management (reboot)", "running", []]
2014-07-22T03:32:47 debug: [400] Reboot task status: node: node-5 status: [1405999960.464637, "Power management (reboot)", "running", []]
2014-07-22T03:32:52 debug: [400] Reboot task status: node: node-1 status: [1405999960.464637, "Power management (reboot)", "complete", []]
2014-07-22T03:32:52 debug: [400] Successfully rebooted: node-1
2014-07-22T03:32:52 debug: [400] Reboot task status: node: node-2 status: [1405999960.464637, "Power management (reboot)", "complete", []]
2014-07-22T03:32:52 debug: [400] Successfully rebooted: node-2
2014-07-22T03:32:52 debug:...


Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Vladimir Sharshov (vsharshov)
Changed in fuel:
status: New → Incomplete
Changed in fuel:
milestone: 5.0.1 → 5.0.2
Changed in fuel:
status: Incomplete → Invalid
Oleksiy Molchanov (omolchanov) wrote :

This bug was incomplete for more than 4 weeks. We cannot investigate it further so we are setting the status to Invalid. If you think it is not correct, please feel free to provide requested information and reopen the bug, and we will look into it further.
