[Upgrade Xenial -> Bionic] Some of the OSDs are in blocked state after upgrade due to "Non-pristine devices detected"

Bug #1933914 reported by Celia Wang
This bug affects 1 person
Affects: Ceph OSD Charm
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

I was trying to upgrade ceph-osd with:
juju set-series ceph-osd bionic
# Skipping dist-upgrade as already done prior
juju config ceph-osd source=distro

Then 6 of the 15 OSDs went into the "blocked" state, complaining about "Non-pristine devices detected, consult `list-disks`, `zap-disk` and `blacklist-*` actions."

I've checked that the OSDs are healthy. I then tried to use the "blacklist-add-disks" action to blacklist the reported non-pristine disks and to trigger the config-changed hook manually, but it doesn't help.
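
(Roughly the form those steps take; the unit name and device path below are only examples, and the action accepts a whitespace-separated osd-devices list:)

juju run-action ceph-osd/0 blacklist-add-disks osd-devices='/dev/disk/by-dname/bcache6'
juju run --unit ceph-osd/0 'hooks/config-changed'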

Detailed info:
1. juju output:
https://pastebin.canonical.com/p/TwWN2Vyzjs/

2. ceph mon status:
https://pastebin.canonical.com/p/4S4MZTyhkp/

3. list-disks output of ceph-osd/0, for example:
https://pastebin.canonical.com/p/7NJgGSC9Mb/

Xav Paice (xavpaice)
tags: added: ceph-upgrade openstack-upgrade
Revision history for this message
Xav Paice (xavpaice) wrote :

FYI, the content of 'osd-devices' is `/dev/disk/by-dname/bcache2 /dev/disk/by-dname/bcache3 /dev/disk/by-dname/bcache4 /dev/disk/by-dname/bcache5 /dev/disk/by-dname/bcache6 /dev/disk/by-dname/bcache7`.

Some logs from the unit that may be interesting:
2021-06-29 02:19:28 DEBUG juju.worker.uniter.remotestate watcher.go:427 got application change
2021-06-29 02:19:28 DEBUG juju.worker.uniter resolver.go:147 no operations in progress; waiting for changes
2021-06-29 02:20:00 DEBUG juju.worker.uniter.remotestate watcher.go:448 got config change: ok=true, hashes=[26a4c44f83521054789f1277431b1abc01b2cb26aa783a3dbd32246ec9e558ea]
2021-06-29 02:20:00 DEBUG juju.worker.uniter resolver.go:147 no operations in progress; waiting for changes
2021-06-29 02:20:00 DEBUG juju.worker.uniter.operation executor.go:59 running operation run config-changed hook
2021-06-29 02:20:00 DEBUG juju.machinelock machinelock.go:162 acquire machine lock for uniter (run config-changed hook)
2021-06-29 02:20:00 DEBUG juju.machinelock machinelock.go:172 machine lock acquired for uniter (run config-changed hook)
2021-06-29 02:20:00 DEBUG juju.worker.uniter.operation executor.go:90 preparing operation "run config-changed hook"
2021-06-29 02:20:00 DEBUG juju.worker.uniter.operation executor.go:90 executing operation "run config-changed hook"
2021-06-29 02:20:00 DEBUG juju.worker.uniter agent.go:20 [AGENT-STATUS] executing: running config-changed hook
2021-06-29 02:20:00 DEBUG juju.worker.uniter.runner runner.go:595 starting jujuc server {unix @/var/lib/juju/agents/unit-ceph-osd-0/agent.socket <nil>}
2021-06-29 02:20:01 DEBUG worker.uniter.jujuc server.go:204 running hook tool "juju-log"
2021-06-29 02:20:01 DEBUG juju-log Hardening function 'config_changed'
2021-06-29 02:20:01 DEBUG worker.uniter.jujuc server.go:204 running hook tool "config-get"
2021-06-29 02:20:01 DEBUG worker.uniter.jujuc server.go:204 running hook tool "juju-log"
2021-06-29 02:20:01 DEBUG juju-log No hardening applied to 'config_changed'
2021-06-29 02:20:01 DEBUG worker.uniter.jujuc server.go:204 running hook tool "juju-log"
2021-06-29 02:20:01 INFO juju-log old_version: luminous
2021-06-29 02:20:01 DEBUG worker.uniter.jujuc server.go:204 running hook tool "juju-log"
2021-06-29 02:20:01 INFO juju-log new_version: luminous
2021-06-29 02:20:01 DEBUG worker.uniter.jujuc server.go:204 running hook tool "juju-log"
2021-06-29 02:20:01 ERROR juju-log Invalid upgrade path from luminous to luminous. Valid paths are: ['firefly -> hammer', 'hammer -> jewel', 'jewel -> luminous', 'luminous -> mimic', 'mimic -> nautilus', 'nautilus -> octopus']
2021-06-29 02:20:01 DEBUG worker.uniter.jujuc server.go:204 running hook tool "juju-log"
2021-06-29 02:20:01 DEBUG juju-log Updating sysctl_file: /etc/sysctl.d/50-ceph-osd-charm.conf values: {'kernel.pid_max': 2097152, 'vm.max_map_count': 524288, 'kernel.threads-max': 2097152, 'vm.vfs_cache_pressure': 100, 'vm.swappiness': 1}

2021-06-29 02:20:02 DEBUG juju-log got journal devs: {'/dev/disk/by-dname/nvme0n1-part3'}
2021-06-29 02:20:02 DEBUG worker.uniter.jujuc server.go:204 running hook tool "juju-log"
2021-06-29 02:20:02 INFO juju-log Skipping osd devices previously processed by this uni...


Revision history for this message
Celia Wang (ziyiwang) wrote :

ceph-osd unit logs after running "blacklist-add-disks":
https://pastebin.canonical.com/p/QkM4nHSwdt/

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Hi

I'm a little confused about the bug report. Could you please include the complete juju status (so we can see the machines) and the logs from an affected unit? Also, the commands (and their order) that were used to perform the upgrade.

Thanks.

Changed in charm-ceph-osd:
status: New → Incomplete
Revision history for this message
Drew Freiberger (afreiberger) wrote :

The charm's osd-devices configuration is the following (the by-dname paths rely on udev rules to provide static bcache names, since bcache devices are renamed on every boot/reload of the bcache module):

/dev/disk/by-dname/bcache2 /dev/disk/by-dname/bcache3 /dev/disk/by-dname/bcache4 /dev/disk/by-dname/bcache5 /dev/disk/by-dname/bcache6 /dev/disk/by-dname/bcache7
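
(For context, the by-dname symlinks are provided by udev rules of roughly this shape; this is an illustrative rule with a placeholder UUID, not one copied from the affected hosts:)

SUBSYSTEM=="block", ACTION=="add|change", ENV{CACHED_UUID}=="<backing-device-uuid>", SYMLINK+="disk/by-dname/bcache2"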

I queried the unit state database and found that the osd-devices key had the following data:

["/dev/disk/by-dname/bcache7", "/dev/disk/by-dname/bcache2", "/dev/disk/by-dname/bcache3", "/dev/bcache4", "/dev/disk/by-dname/bcache4", "/dev/disk/by-dname/bcache5"]

On the host, osd-device /dev/disk/by-dname/bcache6 is a symlink to /dev/bcache4.

When config-changed was run with the 21.04 ceph-osd charm, it tried to configure /dev/disk/by-dname/bcache6 because that path did not exist in the osd-devices list in unitdata.kv (that is my hypothesis). Because this disk WAS in use and configured properly (as /dev/bcache4), the non-pristine error was incorrect.

I manually updated the state database with:
sqlite> update kv set data='["/dev/disk/by-dname/bcache7", "/dev/disk/by-dname/bcache2", "/dev/disk/by-dname/bcache3", "/dev/disk/by-dname/bcache6", "/dev/disk/by-dname/bcache4", "/dev/disk/by-dname/bcache5"]' where key='osd-devices';

After that, running hooks/config-changed cleared the issue for this node.

So, I believe it may be worth having the charm check whether a given osd-device path is a symlink to an already-configured osd-device.
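
A minimal sketch of such a check (illustrative only, not the charm's current code; the helper name is made up), canonicalising both the candidate path and the devices already recorded in unitdata.kv before comparing:

import os

def already_processed(candidate, processed):
    """True if `candidate` resolves to a device that was already configured.

    Canonicalising both sides means /dev/disk/by-dname/bcache6 and the
    /dev/bcache4 it currently points at compare equal.
    """
    return os.path.realpath(candidate) in {os.path.realpath(p) for p in processed}

# e.g. already_processed('/dev/disk/by-dname/bcache6',
#                        ['/dev/bcache4', '/dev/disk/by-dname/bcache5'])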

Changed in charm-ceph-osd:
status: Incomplete → Confirmed
Revision history for this message
Drew Freiberger (afreiberger) wrote :

It might be useful for list-disks to provide a list of known/configured osd-devices from the kv store for troubleshooting issues like this in the future.
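
Something along these lines could do it, assuming the charmhelpers unitdata API and the 'osd-devices' key shown in the sqlite queries in this bug (a sketch, not the action's current code):

from charmhelpers.core import unitdata

# Devices the charm believes it has already prepared; surfacing this from
# the list-disks action would make mismatches like the one above obvious.
kv = unitdata.kv()
print(kv.get('osd-devices', []))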

Revision history for this message
Drew Freiberger (afreiberger) wrote :

Here's an example:

root@ceph-osd-2:/var/lib/juju/agents/unit-ceph-osd-2/charm# ls -al /dev/disk/by-dname/bca*
lrwxrwxrwx 1 root root 13 Jun 24 21:54 /dev/disk/by-dname/bcache0 -> ../../bcache6
lrwxrwxrwx 1 root root 13 Jun 24 21:54 /dev/disk/by-dname/bcache1 -> ../../bcache7
lrwxrwxrwx 1 root root 13 Jun 24 21:54 /dev/disk/by-dname/bcache2 -> ../../bcache5
lrwxrwxrwx 1 root root 13 Jun 24 21:54 /dev/disk/by-dname/bcache3 -> ../../bcache3
lrwxrwxrwx 1 root root 13 Jun 24 21:54 /dev/disk/by-dname/bcache4 -> ../../bcache4
lrwxrwxrwx 1 root root 13 Jun 24 21:54 /dev/disk/by-dname/bcache5 -> ../../bcache2
lrwxrwxrwx 1 root root 13 Jun 24 21:54 /dev/disk/by-dname/bcache6 -> ../../bcache0
lrwxrwxrwx 1 root root 13 Jun 24 21:54 /dev/disk/by-dname/bcache7 -> ../../bcache1
root@ceph-osd-2:/var/lib/juju/agents/unit-ceph-osd-2/charm# sqlite3 .unit-state.db
SQLite version 3.22.0 2018-01-22 18:45:57
Enter ".help" for usage hints.
sqlite> select data from kv where key='osd-devices';
["/dev/disk/by-dname/bcache7", "/dev/bcache0", "/dev/bcache1", "/dev/bcache5", "/dev/bcache2", "/dev/bcache4"]

This one is even tougher. I'm guessing that when this was deployed we used /dev/bcacheX, and because those got renamed upon each boot, we created the udev rules for the /dev/disk/by-dname paths and then triggered this issue. As you can see here, some of those /dev/bcacheX entries don't map to the currently booted host's /dev/disk/by-dname/bcache[2-7] (which is what is defined in osd-devices), so I don't think you could even use that methodology to trace this.

I think that ultimately, storing /dev/bcacheX paths in osd-devices is futile, as they are renamed on each boot, and some other strategy will be needed to ensure the charm doesn't reconfigure a disk it already knows about, and that it knows the configured disks were configured by itself. Perhaps using the bcache UUID of the device would help.
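
As a rough sketch of that idea (illustrative only: it assumes bcache-tools is installed, that /sys/block/bcacheN/slaves lists the backing device, and that the device names are examples), a boot-stable superblock UUID behind a bcache device could be read like this:

import os
import subprocess

def bcache_backing_uuid(path):
    """Return the bcache superblock dev.uuid behind a bcache device path."""
    name = os.path.basename(os.path.realpath(path))          # e.g. 'bcache4'
    backing = os.listdir('/sys/block/%s/slaves' % name)[0]   # e.g. 'sdb'
    output = subprocess.check_output(
        ['bcache-super-show', '/dev/%s' % backing]).decode()
    for line in output.splitlines():
        if line.startswith('dev.uuid'):
            return line.split()[-1]

# Recording this UUID in unitdata.kv, rather than a /dev/bcacheX path, would
# survive the device renumbering across reboots.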

Revision history for this message
Moises Emilio Benzan Mora (moisesbenzan) wrote :
tags: added: cdo-qa