Cinder-volume may fail to start properly during deployment

Bug #1968621 reported by DUFOUR Olivier
This bug affects 2 people
Affects: OpenStack Cinder Charm
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

This issue seems to happen specifically when deploying cinder-volume on separate units.
The Cinder topology is the following:
- Cinder API and Scheduler on LXD units
- Cinder Volume on bare-metal units (for access to multiple iSCSI backends)
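
For illustration, a minimal bundle excerpt for this kind of split topology could look like the sketch below (the charm channel, machine placements and option values are placeholders, not the exact bundle):

  applications:
    cinder:
      charm: cinder
      options:
        enabled-services: api,scheduler
      to:
      - lxd:0
    cinder-volume:
      charm: cinder
      options:
        enabled-services: volume
      to:
      - "0"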

When deploying a bundle, one or more cinder-volume units may end up in 'blocked' status, with a message complaining that the 'cinder-volume' process isn't running, which is exactly the issue.

In terms of versions:
- MAAS 3.1
- Juju 2.9.28
- Cinder charm from Charmhub, stable channel: revision 530

So far I've seen this happen from time to time on:
- Focal Wallaby and Focal Xena with a PowerStore iSCSI backend.
- Focal Ussuri with a Pure Storage iSCSI backend.

The workaround is simply to run 'sudo systemctl restart cinder-volume' on the affected unit, after which the deployment can finish properly.
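
For reference, the manual workaround on an affected unit looks like this (the unit name is just an example):

  juju ssh cinder-volume/0
  sudo systemctl status cinder-volume    # typically reports the service as failed
  sudo systemctl restart cinder-volume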

Looking at the logs, cinder-volume fails to find a working backend and terminates itself. This is normal behaviour at that point: the deployment is still ongoing and the local/subordinate charms may not have finished installing themselves.

I can see that the systemd unit is configured to restart the cinder-volume service if it fails to start, but for some reason it stops retrying at some point (see attached journalctl log).
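
A likely explanation is systemd's start rate limiting: by default (DefaultStartLimitBurst=5 within DefaultStartLimitIntervalSec=10s), systemd gives up after five failed starts in quick succession and leaves the unit in a failed state. As a local experiment (not something the charm ships), a drop-in like the following would keep systemd retrying, with a back-off between attempts:

  # /etc/systemd/system/cinder-volume.service.d/override.conf
  [Unit]
  # Disable the start rate limit so systemd never gives up
  StartLimitIntervalSec=0

  [Service]
  Restart=on-failure
  # Wait 30s between attempts instead of the ~100ms default
  RestartSec=30

followed by 'sudo systemctl daemon-reload'.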

The most interesting parts of both log files are between 17:01:00 and 17:07:23 (when I restarted the service manually through systemctl).

Revision history for this message
Vern Hart (vern) wrote:

I've seen this with my current deployment.

Using the "enabled-services" option, we've got scheduler,api deployed in control containers and volume deployed on the bare-metal nodes, because we are using Pure Storage iSCSI backends.

The cinder-volume units often seem to fail to start the cinder-volume service. I'm not sure why, but it looks like it may be related to the backend not being ready yet (at first).

We left the deployment alone for many hours (5pm until about 9am) and the services never restarted on their own.

Regardless of *why* it's failing at first, it succeeds once I manually start the cinder-volume service. Could the charm be more proactive by restarting services that should be running but aren't?
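
As a sketch of what "more proactive" could mean, something like this run periodically on each unit would cover it (running it from the charm's update-status hook is an assumption about where it would fit):

  #!/bin/sh
  # Restart cinder-volume if systemd reports it as not active.
  if ! systemctl is-active --quiet cinder-volume; then
      systemctl restart cinder-volume
  fi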

As suggested, this work-around gets us past this:

  juju run -a cinder-volume sudo systemctl restart cinder-volume

Revision history for this message
Nobuto Murata (nobuto) wrote:
tags: added: good-first-bug
Revision history for this message
Andre Ruiz (andre-ruiz) wrote (last edit):

I'm also seeing this, together with another issue that may be related:

cinder-volume/4 blocked executing 10 10.1.12.62 Services not running that should be: cinder-volume
  cinder-three-par/2 waiting idle 10.1.12.62 Charm configuration in progress

At the same time that cinder-volume complains about not running, the cinder-three-par charm does not configure itself. After the cinder-volume services are started by hand and I remove and re-add the relation between the 3PAR driver and cinder-volume, it happily finishes configuring very quickly.
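
Concretely, the remove/re-add step is something like the following (application names taken from the status output above; the exact relation endpoints may differ):

  juju remove-relation cinder-three-par cinder-volume
  juju add-relation cinder-three-par cinder-volume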

Unfortunately I can't test whether the 3PAR driver would configure fine on the first try if the cinder-volume services had not been stopped.
