[Upgrade Xenial -> Bionic] Some of the OSDs are in blocked state after upgrade due to "Non-pristine devices detected"

Bug #1933914 reported by Celia Wang
This bug affects 1 person
Affects: Ceph OSD Charm
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

I was trying to upgrade ceph-osd with:
juju set-series ceph-osd bionic
# Skipping dist-upgrade as already done prior
juju config ceph-osd source=distro

Then 6 of the 15 OSDs went into the "blocked" state, complaining about "Non-pristine devices detected, consult `list-disks`, `zap-disk` and `blacklist-*` actions."

I've checked that the OSDs are healthy. I then tried to use the "blacklist-add-disks" action to blacklist the reported non-pristine disks and to trigger the config-changed hook manually, but it doesn't help.
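
(Roughly the form those steps take; the unit name and device path below are only examples, and the action accepts a whitespace-separated osd-devices list:)

juju run-action ceph-osd/0 blacklist-add-disks osd-devices='/dev/disk/by-dname/bcache6'
juju run --unit ceph-osd/0 'hooks/config-changed'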

Detailed info:
1. juju output:
https://pastebin.canonical.com/p/TwWN2Vyzjs/

2. ceph mon status:
https://pastebin.canonical.com/p/4S4MZTyhkp/

3. list-disks output of ceph-osd/0, for example:
https://pastebin.canonical.com/p/7NJgGSC9Mb/

Xav Paice (xavpaice)
tags: added: ceph-upgrade openstack-upgrade
Revision history for this message
Xav Paice (xavpaice) wrote :

FYI, the content of 'osd-devices' is `/dev/disk/by-dname/bcache2 /dev/disk/by-dname/bcache3 /dev/disk/by-dname/bcache4 /dev/disk/by-dname/bcache5 /dev/disk/by-dname/bcache6 /dev/disk/by-dname/bcache7`.

Some logs from the unit that may be interesting:
2021-06-29 02:19:28 DEBUG juju.worker.uniter.remotestate watcher.go:427 got application change
2021-06-29 02:19:28 DEBUG juju.worker.uniter resolver.go:147 no operations in progress; waiting for changes
2021-06-29 02:20:00 DEBUG juju.worker.uniter.remotestate watcher.go:448 got config change: ok=true, hashes=[26a4c44f83521054789f1277431b1abc01b2cb26aa783a3dbd32246ec9e558ea]
2021-06-29 02:20:00 DEBUG juju.worker.uniter resolver.go:147 no operations in progress; waiting for changes
2021-06-29 02:20:00 DEBUG juju.worker.uniter.operation executor.go:59 running operation run config-changed hook
2021-06-29 02:20:00 DEBUG juju.machinelock machinelock.go:162 acquire machine lock for uniter (run config-changed hook)
2021-06-29 02:20:00 DEBUG juju.machinelock machinelock.go:172 machine lock acquired for uniter (run config-changed hook)
2021-06-29 02:20:00 DEBUG juju.worker.uniter.operation executor.go:90 preparing operation "run config-changed hook"
2021-06-29 02:20:00 DEBUG juju.worker.uniter.operation executor.go:90 executing operation "run config-changed hook"
2021-06-29 02:20:00 DEBUG juju.worker.uniter agent.go:20 [AGENT-STATUS] executing: running config-changed hook
2021-06-29 02:20:00 DEBUG juju.worker.uniter.runner runner.go:595 starting jujuc server {unix @/var/lib/juju/agents/unit-ceph-osd-0/agent.socket <nil>}
2021-06-29 02:20:01 DEBUG worker.uniter.jujuc server.go:204 running hook tool "juju-log"
2021-06-29 02:20:01 DEBUG juju-log Hardening function 'config_changed'
2021-06-29 02:20:01 DEBUG worker.uniter.jujuc server.go:204 running hook tool "config-get"
2021-06-29 02:20:01 DEBUG worker.uniter.jujuc server.go:204 running hook tool "juju-log"
2021-06-29 02:20:01 DEBUG juju-log No hardening applied to 'config_changed'
2021-06-29 02:20:01 DEBUG worker.uniter.jujuc server.go:204 running hook tool "juju-log"
2021-06-29 02:20:01 INFO juju-log old_version: luminous
2021-06-29 02:20:01 DEBUG worker.uniter.jujuc server.go:204 running hook tool "juju-log"
2021-06-29 02:20:01 INFO juju-log new_version: luminous
2021-06-29 02:20:01 DEBUG worker.uniter.jujuc server.go:204 running hook tool "juju-log"
2021-06-29 02:20:01 ERROR juju-log Invalid upgrade path from luminous to luminous. Valid paths are: ['firefly -> hammer', 'hammer -> jewel', 'jewel -> luminous', 'luminous -> mimic', 'mimic -> nautilus', 'nautilus -> octopus']
2021-06-29 02:20:01 DEBUG worker.uniter.jujuc server.go:204 running hook tool "juju-log"
2021-06-29 02:20:01 DEBUG juju-log Updating sysctl_file: /etc/sysctl.d/50-ceph-osd-charm.conf values: {'kernel.pid_max': 2097152, 'vm.max_map_count': 524288, 'kernel.threads-max': 2097152, 'vm.vfs_cache_pressure': 100, 'vm.swappiness': 1}

2021-06-29 02:20:02 DEBUG juju-log got journal devs: {'/dev/disk/by-dname/nvme0n1-part3'}
2021-06-29 02:20:02 DEBUG worker.uniter.jujuc server.go:204 running hook tool "juju-log"
2021-06-29 02:20:02 INFO juju-log Skipping osd devices previously processed by this uni...


Revision history for this message
Celia Wang (ziyiwang) wrote :

ceph-osd unit logs after running "blacklist-add-disks":
https://pastebin.canonical.com/p/QkM4nHSwdt/

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Hi

I'm a little confused about the bug report. Could you please include the complete juju status (so we can see the machines) and the logs from an affected unit? Also, the commands (and their order) that were used to perform the upgrade.

Thanks.

Changed in charm-ceph-osd:
status: New → Incomplete
Revision history for this message
Drew Freiberger (afreiberger) wrote :

The charm's osd-devices configuration is the following (the by-dname paths rely on udev rules to provide static bcache names, since bcache devices are renamed on every boot/reload of the bcache module):

/dev/disk/by-dname/bcache2 /dev/disk/by-dname/bcache3 /dev/disk/by-dname/bcache4 /dev/disk/by-dname/bcache5 /dev/disk/by-dname/bcache6 /dev/disk/by-dname/bcache7
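
(For context, the by-dname symlinks are provided by udev rules of roughly this shape; this is an illustrative rule with a placeholder UUID, not one copied from the affected hosts:)

SUBSYSTEM=="block", ACTION=="add|change", ENV{CACHED_UUID}=="<backing-device-uuid>", SYMLINK+="disk/by-dname/bcache2"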

I queried the unit state database and found that the osd-devices key had the following data:

["/dev/disk/by-dname/bcache7", "/dev/disk/by-dname/bcache2", "/dev/disk/by-dname/bcache3", "/dev/bcache4", "/dev/disk/by-dname/bcache4", "/dev/disk/by-dname/bcache5"]

On the host, osd-device /dev/disk/by-dname/bcache6 is a symlink to /dev/bcache4.

When config-changed was run with the 21.04 ceph-osd charm, it tried to configure /dev/disk/by-dname/bcache6 because that path did not exist in the osd-devices list in unitdata.kv (that is my hypothesis). Because this disk WAS in use and configured properly (as /dev/bcache4), the non-pristine error was incorrect.

I manually updated the state database with:
sqlite> update kv set data='["/dev/disk/by-dname/bcache7", "/dev/disk/by-dname/bcache2", "/dev/disk/by-dname/bcache3", "/dev/disk/by-dname/bcache6", "/dev/disk/by-dname/bcache4", "/dev/disk/by-dname/bcache5"]' where key='osd-devices';

After that, running hooks/config-changed cleared the issue for this node.

So, I believe it may be worth having the charm check whether a given osd-device path is a symlink to an already-configured osd-device.
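
A minimal sketch of such a check (illustrative only, not the charm's current code; the helper name is made up), canonicalising both the candidate path and the devices already recorded in unitdata.kv before comparing:

import os

def already_processed(candidate, processed):
    """True if `candidate` resolves to a device that was already configured.

    Canonicalising both sides means /dev/disk/by-dname/bcache6 and the
    /dev/bcache4 it currently points at compare equal.
    """
    return os.path.realpath(candidate) in {os.path.realpath(p) for p in processed}

# e.g. already_processed('/dev/disk/by-dname/bcache6',
#                        ['/dev/bcache4', '/dev/disk/by-dname/bcache5'])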

Changed in charm-ceph-osd:
status: Incomplete → Confirmed
Revision history for this message
Drew Freiberger (afreiberger) wrote :

It might be useful for list-disks to provide a list of known/configured osd-devices from the kv store for troubleshooting issues like this in the future.
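
Something along these lines could do it, assuming the charmhelpers unitdata API and the 'osd-devices' key shown in the sqlite queries in this bug (a sketch, not the action's current code):

from charmhelpers.core import unitdata

# Devices the charm believes it has already prepared; surfacing this from
# the list-disks action would make mismatches like the one above obvious.
kv = unitdata.kv()
print(kv.get('osd-devices', []))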

Revision history for this message
Drew Freiberger (afreiberger) wrote :

Here's an example:

root@ceph-osd-2:/var/lib/juju/agents/unit-ceph-osd-2/charm# ls -al /dev/disk/by-dname/bca*
lrwxrwxrwx 1 root root 13 Jun 24 21:54 /dev/disk/by-dname/bcache0 -> ../../bcache6
lrwxrwxrwx 1 root root 13 Jun 24 21:54 /dev/disk/by-dname/bcache1 -> ../../bcache7
lrwxrwxrwx 1 root root 13 Jun 24 21:54 /dev/disk/by-dname/bcache2 -> ../../bcache5
lrwxrwxrwx 1 root root 13 Jun 24 21:54 /dev/disk/by-dname/bcache3 -> ../../bcache3
lrwxrwxrwx 1 root root 13 Jun 24 21:54 /dev/disk/by-dname/bcache4 -> ../../bcache4
lrwxrwxrwx 1 root root 13 Jun 24 21:54 /dev/disk/by-dname/bcache5 -> ../../bcache2
lrwxrwxrwx 1 root root 13 Jun 24 21:54 /dev/disk/by-dname/bcache6 -> ../../bcache0
lrwxrwxrwx 1 root root 13 Jun 24 21:54 /dev/disk/by-dname/bcache7 -> ../../bcache1
root@ceph-osd-2:/var/lib/juju/agents/unit-ceph-osd-2/charm# sqlite3 .unit-state.db
SQLite version 3.22.0 2018-01-22 18:45:57
Enter ".help" for usage hints.
sqlite> select data from kv where key='osd-devices';
["/dev/disk/by-dname/bcache7", "/dev/bcache0", "/dev/bcache1", "/dev/bcache5", "/dev/bcache2", "/dev/bcache4"]

This one is even tougher. I'm guessing that when this was deployed we used /dev/bcacheX, and because those got renamed upon each boot, we created the udev rules for the /dev/disk/by-dname paths and then triggered this issue. As you can see here, some of those /dev/bcacheX entries don't map to the currently booted host's /dev/disk/by-dname/bcache[2-7] (which is what is defined in osd-devices), so I don't think you could even use that methodology to trace this.

I think that ultimately, storing /dev/bcacheX paths in osd-devices is futile, as they are renamed on each boot, and some other strategy will be needed to ensure the charm doesn't reconfigure a disk it already knows about, and that it knows the configured disks were configured by itself. Perhaps using the bcache UUID of the device would help.
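
As a rough sketch of that idea (illustrative only: it assumes bcache-tools is installed, that /sys/block/bcacheN/slaves lists the backing device, and that the device names are examples), a boot-stable superblock UUID behind a bcache device could be read like this:

import os
import subprocess

def bcache_backing_uuid(path):
    """Return the bcache superblock dev.uuid behind a bcache device path."""
    name = os.path.basename(os.path.realpath(path))          # e.g. 'bcache4'
    backing = os.listdir('/sys/block/%s/slaves' % name)[0]   # e.g. 'sdb'
    output = subprocess.check_output(
        ['bcache-super-show', '/dev/%s' % backing]).decode()
    for line in output.splitlines():
        if line.startswith('dev.uuid'):
            return line.split()[-1]

# Recording this UUID in unitdata.kv, rather than a /dev/bcacheX path, would
# survive the device renumbering across reboots.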

Revision history for this message
Moises Emilio Benzan Mora (moisesbenzan) wrote :
tags: added: cdo-qa