meta-data service fails/unreliable in the production cluster
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Juniper Openstack | Incomplete | High | Unassigned | | |
Bug Description
Some of the CI job failures are due to meta-data service failures. keystone (which provides the meta-data service) was found stuck at close to 100% CPU on the openstack node in the production cluster,
e.g.
ubuntu-build03 /users/anantha> sshpass -p c0ntrail123 ssh -q root@10.84.26.14 top -n1 -b | \grep keystone
13497 keystone 20 0 211m 55m 5820 R 96 0.0 10:55.14 keystone-all
ubuntu-build03 /users/anantha> sshpass -p c0ntrail123 ssh -q root@10.84.26.14 top -n1 -b | \grep keystone
13497 keystone 20 0 211m 55m 5820 R 97 0.0 10:56.72 keystone-all
ubuntu-build03 /users/anantha> sshpass -p c0ntrail123 ssh -q root@10.84.26.14 top -n1 -b | \grep keystone
13497 keystone 20 0 211m 55m 5820 R 100 0.0 10:57.87 keystone-all
ubuntu-build03 /users/anantha>
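The repeated `top` samples above could be checked automatically rather than by eye. The following is a minimal sketch, assuming the default `top -b` column layout shown in the transcript (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND). The helper names `parse_top_line` and `is_pinned` and the 90% threshold are hypothetical and chosen only for illustration.

```python
def parse_top_line(line):
    """Parse one process row of `top -b` output into (pid, cpu_percent, command)."""
    fields = line.split()
    # Assumed default columns:
    # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    return int(fields[0]), float(fields[8]), fields[11]


def is_pinned(samples, threshold=90.0):
    """Return True if all samples belong to one PID and each is at or above threshold."""
    parsed = [parse_top_line(s) for s in samples]
    same_pid = len({pid for pid, _, _ in parsed}) == 1
    return same_pid and all(cpu >= threshold for _, cpu, _ in parsed)


# The three samples captured in this report.
samples = [
    "13497 keystone 20 0 211m 55m 5820 R 96 0.0 10:55.14 keystone-all",
    "13497 keystone 20 0 211m 55m 5820 R 97 0.0 10:56.72 keystone-all",
    "13497 keystone 20 0 211m 55m 5820 R 100 0.0 10:57.87 keystone-all",
]
print(is_pinned(samples))
```

A check like this only shows that the same process stays busy across samples; it does not explain why keystone is spinning.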
The cluster is running 1.20-63.
Affected jobs usually end up with the default host name ci-oc-slave, which is incorrect. Ideally, each job should get the correct host name as retrieved from the meta-data service. I have seen meta-data queries still failing after a minute or so.
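One possible mitigation on the CI slave side is to retry the host-name query with backoff instead of silently keeping the default name. The sketch below is not how the CI slaves are known to work. It assumes the slave queries the EC2-compatible endpoint `http://169.254.169.254/latest/meta-data/hostname`; the function names, attempt count, and delays are hypothetical.

```python
import time
import urllib.request

# Assumption: the slave reads its host name from the EC2-compatible endpoint.
METADATA_URL = "http://169.254.169.254/latest/meta-data/hostname"


def fetch_hostname(url=METADATA_URL, timeout=5):
    """Fetch the host name once; raises OSError (e.g. URLError) on network failure."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode().strip()


def wait_for_hostname(fetch=fetch_hostname, attempts=10, delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Retry `fetch` with exponential backoff; return the name, or None if all attempts fail."""
    for attempt in range(attempts):
        try:
            name = fetch()
            if name:
                return name
        except OSError:
            # Network-level failure or timeout: fall through and retry.
            pass
        if attempt < attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return None
```

A caller could treat a `None` result as a hard failure of the job setup, so that a slave never runs under the default ci-oc-slave name without the problem being reported. With the illustrative defaults the total wait is a little over three minutes, which covers the "a minute or so" failures observed here.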
tags: | added: ci |
information type: | Proprietary → Public |
tags: | added: openstack |
tags: | added: keystone |
Changed in juniperopenstack: | |
status: | New → Incomplete |