Comment 5 for bug 1694963

Edward Hope-Morley (hopem) wrote :

I'm re-opening this bug as it appears that the previous fix is not sufficient to resolve the problem.

When nova-compute is related to ceph, it requests resources such as keys and pools using the ceph broker API. When a request is answered, the unit that sent it gets a response that echoes the request id it sent. The problem is that these responses go to all nova-compute units: each time a new nova-compute unit relates to ceph and sends a request, its response is sent to all units. They currently have no way to know whether they have already received their own response, so they all blindly restart nova-compute (see code at [1]).

So e.g. a request looks like:

    ~$ juju run --unit nova-compute/0 'relation-get -r `relation-ids ceph` - nova-compute/0'
    broker_req: '{"api-version": 1, "request-id": "f1c63e45-4b5e-11e7-8a92-fa163e37c682",
    "ops": []}'
    private-address: 10.5.59.164

and a response looks like:

    ~$ juju run --unit nova-compute/0 'relation-get -r `relation-ids ceph` - ceph/2'
    auth: cephx
    broker-rsp-nova-compute-0: '{"request-id": "f1c63e45-4b5e-11e7-8a92-fa163e37c682",
      "exit-code": 0}'
    broker-rsp-nova-compute-1: '{"request-id": "457b8757-4b5f-11e7-a324-fa163e2db5b7",
      "exit-code": 0}'
    broker-rsp-nova-compute-2: '{"request-id": "31c08f4b-4b5f-11e7-9756-fa163e7e11e1",
      "exit-code": 0}'
    broker-rsp-nova-compute-3: '{"request-id": "cfc17751-4b63-11e7-976d-fa163e6cc737",
      "exit-code": 0}'
    broker-rsp-nova-compute-4: '{"request-id": "0dab0cb5-4b6b-11e7-8e4b-fa163e053776",
      "exit-code": 0}'
    broker-rsp-nova-compute-4: '{"request-id": "3ed7d4d9-4b7b-11e7-a2b9-fa163e70844e",
      "exit-code": 0}'
    broker_rsp: '{"request-id": "3ed7d4d9-4b7b-11e7-a2b9-fa163e70844e", "exit-code": 0}'
    ceph-public-address: 10.5.59.150
    key: AQAmvzdZ4ekkFhAAn1KrnC5qT9MA7HAl0Ymvnw==
    private-address: 10.5.59.150

In this case the response to the request from the new unit nova-compute-4 has been sent to all compute units, which results in all of them restarting the nova-compute service.
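
To illustrate the kind of check the units are missing, here is a minimal sketch in Python. It is not the charm's code: the function names and the `last_handled_id` argument are hypothetical. It assumes each unit can read its own `broker_req`, look up the per-unit `broker-rsp-<unit>` key shown above, and persist the request id it last acted on somewhere local to the unit.

```python
import json


def response_key(unit_name):
    """Return the per-unit response key, e.g. 'broker-rsp-nova-compute-4'.

    Hypothetical helper; the key format is taken from the output above.
    """
    return "broker-rsp-" + unit_name.replace("/", "-")


def should_restart(unit_name, sent_request, rel_data, last_handled_id):
    """Decide whether this unit needs to act on a broker response.

    unit_name       -- this unit's name, e.g. 'nova-compute/0'
    sent_request    -- JSON string this unit set as 'broker_req'
    rel_data        -- dict of the remote ceph unit's relation settings
    last_handled_id -- request id this unit already acted on, or None
                       (assumed to be persisted by the caller)
    """
    raw = rel_data.get(response_key(unit_name))
    if raw is None:
        return False  # no response addressed to this unit yet

    response = json.loads(raw)
    request_id = json.loads(sent_request)["request-id"]

    if response.get("request-id") != request_id:
        return False  # response belongs to an older request
    if response.get("exit-code") != 0:
        return False  # request failed, nothing new to pick up

    # Only act once per request id, however often the hook fires.
    return request_id != last_handled_id
```

With a check like this, a relation change triggered by nova-compute/4's response would return False on nova-compute/0 once that unit has recorded its own request id as handled, so it would skip the restart.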

[1] https://github.com/openstack/charm-nova-compute/blob/master/hooks/nova_compute_hooks.py#L402