VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.1"
openstack_version: "2014.2-6.1"
api: "1.0"
build_number: "310"
build_id: "2015-04-13_22-54-31"
nailgun_sha: "d22c074dec091e5ddd8ea3003c37665058303cd5"
python-fuelclient_sha: "9208ff4a08dcb674ce2df132399a5aa3ddfac21c"
astute_sha: "d96a80b63198a578b2c159edbd76048819039eb0"
fuellib_sha: "8b80657e9ceed8d59c2dff1c11e1481c7e69380e"
ostf_sha: "c2a76a60ec4ebbd78e508216c2e12787bf25e423"
fuelmain_sha: "335d3ed09ed79bd37e1f7a90442c4831c8845582"
OS: Ubuntu, Nova network
Goal: destroy two controllers and check that the pacemaker status is correct
Pre-condition: an HA environment with 3 controllers and 2 computes was deployed; a snapshot was created and reverted
Scenario:
1. Destroy first controller
2. Check pacemaker status (see the command sketch after this list)
3. Run OSTF
4. Revert environment
5. Destroy second controller
6. Check pacemaker status
7. Run OSTF
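For reference, steps 1-3 map to roughly the following commands (a sketch only: the libvirt domain name and how OSTF is launched here are assumptions, not taken from this report):

# Sketch; "node-1" is a placeholder libvirt domain name.
# 1. Hard power-off the first controller at the hypervisor level.
virsh destroy node-1

# 2. On a surviving controller, check that pacemaker shows the remaining
#    nodes online and the resources re-elected.
crm status

# 3. Run OSTF health checks, e.g. from the "Health Check" tab in the
#    Fuel web UI against the same environment.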
Actual result:
The "create/update/delete image" test over Glance v1 failed with a 500 server error in the logs:
In Glance:
2015-04-14 04:07:47.848 32397 ERROR glance.image_cache [588c4f5b-decd-4f4a-a7f9-a659ee580014 e000a8d82b2446de91e8469509372579 34cb7c36ed4c4b93bab71228feca6628 - - -] Exception encountered while tee'ing image '8a5b4e28-289b-471f-a53c-b7ea4dbc2bfd' into cache: [Errno 13] Permission denied: '/var/lib/glance/image-cache/incomplete/8a5b4e28-289b-471f-a53c-b7ea4dbc2bfd'. Continuing with response.
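The Errno 13 suggests glance-api cannot write into its own image cache. A quick sanity check on the affected controller, assuming the stock packaging defaults (a glance user/group and the /var/lib/glance paths; both are assumptions, not confirmed by this report):

# Sketch only: paths and the expected glance:glance ownership are assumptions.
# Show owner, group, and mode of the cache directories.
ls -ld /var/lib/glance/image-cache /var/lib/glance/image-cache/incomplete

# Confirm which user the glance-api workers actually run as.
ps -o user= -C glance-api | sort -u

# If ownership turned out to be wrong, this would restore it (verify first):
# chown -R glance:glance /var/lib/glance/image-cache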
In Swift:
http://paste.openstack.org/show/203840/
I'm looking into this issue, and I'm not able to replicate the 500 error, but I am able to replicate some other issues with 401 errors and token problems. When you said you reverted the environment, do you mean that you restored the HA cluster so that all 3 servers were up and functional? I'm trying to get an exact set of steps to recreate the 500 from Glance. Is this the scenario where you saw the 500?
1) 3 nodes up (node-1 primary)
2) node-1 is killed (virsh destroy node-1)
3) node-2 takes over, verified via crm status
4) OSTF tests run OK
5) turn node-1 back on, wait for it to join the cluster
6) node-2 (now the primary) is killed (virsh destroy node-2)
7) node-3 takes over, verified via crm status
8) OSTF tests report 500 for glance v1 actions
What I'm seeing is 401 errors in OSTF after the failover occurs, until some timeout expires and either a new token is generated or a cached token somewhere is invalidated.
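One way to narrow down whether the 401s are token-related (a sketch only: the VIP, credentials, and ports below are placeholders, not from this report) is to request a fresh token from Keystone v2.0 right after the failover and replay a Glance v1 call with it:

# Sketch only: <controller-vip> and the credentials are placeholders.
# Request a new token from Keystone v2.0 after the failover.
TOKEN=$(curl -s http://<controller-vip>:5000/v2.0/tokens \
  -H 'Content-Type: application/json' \
  -d '{"auth": {"tenantName": "admin", "passwordCredentials": {"username": "admin", "password": "<password>"}}}' \
  | python -c 'import sys, json; print(json.load(sys.stdin)["access"]["token"]["id"])')

# Replay a Glance v1 image listing with the fresh token; a 200 here while
# OSTF still sees 401s would point at a stale cached token rather than Keystone.
curl -s -o /dev/null -w '%{http_code}\n' \
  -H "X-Auth-Token: $TOKEN" http://<controller-vip>:9292/v1/images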