Neutron services losing connections to rabbitmq-server
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Autopilot Log Analyser | Fix Committed | High | Francis Ginther |
OpenStack Neutron API Charm | Incomplete | Undecided | Unassigned |
OpenStack Neutron Gateway Charm | Incomplete | Undecided | Unassigned |
OpenStack RabbitMQ Server Charm | Incomplete | Undecided | Unassigned |
Bug Description
This was found by an automated test of Landscape OpenStack Autopilot in our CI [1].
It is not obvious to me what the actual problem is, but a lot appears broken between rabbitmq-server and the neutron services. Several neutron services have trouble connecting and staying connected to rabbitmq. The rabbitmq services also report many connections closed due to missed heartbeats. I can't tell whether rabbitmq-server is dropping connections because it is overloaded, whether the network itself is having trouble, or whether something else is going on.
I've seen this twice now in our automated testing, and will make the logs available for both if needed. I'll attach the neutron* and rabbitmq-server logs directly to this bug for the CI run mentioned in [1].
Versions:
LDS: 17.01~bzr10932+
JUJU: 1:2.1.0-
MAAS: 2.1.3+bzr5573-
OPENSTACK_RELEASE: newton
OBJECT: ceph
BLOCK: iscsi
rabbitmq-server charm: cs:xenial/
neutron-api charm: cs:xenial/
neutron-gateway charm: cs:xenial/
Charm configuration:
rabbitmq-server
- min-cluster-size: 3
neutron-api
- neutron-
- flat-network-
- enable-l3ha: true
- enable-dvr: false
- l2-population: false
- region: (set to the region name)
neutron-gateway
- instance-mtu: 1454
- bridge-mappings: physnet1:br-data
- data-port: (set to list of connected nics)
The problem ultimately presents itself when Landscape repeatedly fails to create the initial neutron networks and router. After multiple attempts it gives up and fails the deployment:
[from landscape-
Feb 28 23:11:30 job-handler-1 ERR RetryingCall for '_create_
Feb 28 23:11:30 job-handler-1 ERR Failed to execute job: Missing alive/up 'neutron-
Examples of neutron and rabbitmq services having problems:
[from landscape-
Feb 28 23:00:50 clipper neutron-
Feb 28 23:00:51 clipper neutron-
Feb 28 23:00:58 clipper neutron-
Feb 28 23:00:59 clipper neutron-
Feb 28 23:01:09 clipper neutron-
Feb 28 23:01:10 clipper neutron-
Feb 28 23:01:10 clipper neutron-
[from <email address hidden>]
=ERROR REPORT==== 28-Feb-
closing AMQP connection <0.29983.1> (10.96.66.74:50418 -> 10.96.65.40:5672):
Missed heartbeats from client, timeout: 60s
=INFO REPORT==== 28-Feb-
accepting AMQP connection <0.30289.1> (10.96.65.44:46668 -> 10.96.65.40:5672)
=ERROR REPORT==== 28-Feb-
closing AMQP connection <0.30008.1> (10.96.66.74:50942 -> 10.96.65.40:5672):
Missed heartbeats from client, timeout: 60s
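To help judge whether the closed connections are concentrated on a few clients or spread across all of them, the rabbitmq log can be tallied per client address. The following is only a minimal sketch, assuming the log keeps the three-line layout shown in the excerpt above (report header, `closing AMQP connection` line, reason line). The function name `count_missed_heartbeats` and the `SAMPLE` constant are hypothetical and are not part of any charm or tool mentioned in this bug.

```python
import re
from collections import Counter

# Matches the connection line as it appears in the excerpt, e.g.
# closing AMQP connection <0.29983.1> (10.96.66.74:50418 -> 10.96.65.40:5672):
CLOSING_RE = re.compile(
    r"closing AMQP connection <[^>]+> "
    r"\((?P<client>[\d.]+):\d+ -> (?P<server>[\d.]+):\d+\)"
)
HEARTBEAT_MARKER = "Missed heartbeats from client"

# Log lines copied from the excerpt in this bug (timestamps are truncated there).
SAMPLE = """\
=ERROR REPORT==== 28-Feb-
closing AMQP connection <0.29983.1> (10.96.66.74:50418 -> 10.96.65.40:5672):
Missed heartbeats from client, timeout: 60s
=INFO REPORT==== 28-Feb-
accepting AMQP connection <0.30289.1> (10.96.65.44:46668 -> 10.96.65.40:5672)
=ERROR REPORT==== 28-Feb-
closing AMQP connection <0.30008.1> (10.96.66.74:50942 -> 10.96.65.40:5672):
Missed heartbeats from client, timeout: 60s
"""


def count_missed_heartbeats(lines):
    """Count connections closed for missed heartbeats, keyed by client IP.

    Assumes the reason line immediately follows the 'closing AMQP connection'
    line, as in the excerpt; closures with any other reason are ignored.
    """
    counts = Counter()
    pending_client = None  # client IP from the previous line, if it was a close
    for line in lines:
        match = CLOSING_RE.search(line)
        if match:
            pending_client = match.group("client")
            continue
        if HEARTBEAT_MARKER in line and pending_client is not None:
            counts[pending_client] += 1
        pending_client = None
    return counts
```

Calling `count_missed_heartbeats(SAMPLE.splitlines())` on the excerpt attributes both heartbeat closures to 10.96.66.74. Running the same function over the full attached logs would show whether one host dominates, which might point at that host or its network path rather than at rabbitmq-server load.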
[1] - https:/
Changed in landscape: | |
assignee: | nobody → Francis Ginther (fginther) |
milestone: | none → 17.02 |
Changed in autopilot-log-analyser: | |
status: | New → In Progress |
importance: | Undecided → High |
assignee: | nobody → Francis Ginther (fginther) |
Changed in landscape: | |
status: | New → Triaged |
Changed in autopilot-log-analyser: | |
status: | In Progress → Fix Committed |
Changed in landscape: | |
milestone: | 17.02 → 17.03 |
Changed in landscape: | |
status: | Triaged → Incomplete |
no longer affects: | landscape |
Here are the trimmed-down logs from https://ci.lscape.net/job/landscape-system-tests/5413/. The entire logs are available at:
https://private-fileshare.canonical.com/~fginther/landscape/lp-1669456/lst-5413/all-logs.tar.gz
Logs from another run can also be found at:
https://private-fileshare.canonical.com/~fginther/landscape/lp-1669456/lst-5149/all-logs.tar.gz