Partition created but device doesn't exist (yet?)

Bug #1557158 reported by Andreas Hasenack
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
ceph (Ubuntu)
Triaged
Low
Unassigned
ceph-osd (Juju Charms Collection)
Invalid
Undecided
Unassigned

Bug Description

cs:~openstack-charmers-next/trusty/ceph-osd-81

I wonder if there is a race between partitioning a disk and expecting the corresponding devices to be created. Case in point:
unit-ceph-osd-4[27573]: 2016-03-14 11:36:10 INFO unit.ceph-osd/4.juju-log server.go:268 mon:72: osdize cmd: ['ceph-disk', 'prepare', '--fs-type', u'xfs', '--zap-disk', u'/dev/sdb']
unit-ceph-osd-4[27573]: 2016-03-14 11:36:14 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 Creating new GPT entries.
unit-ceph-osd-4[27573]: 2016-03-14 11:36:14 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 GPT data structures destroyed! You may now partition the disk using fdisk or
unit-ceph-osd-4[27573]: 2016-03-14 11:36:14 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 other utilities.
unit-ceph-osd-4[27573]: 2016-03-14 11:36:16 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 Creating new GPT entries.
unit-ceph-osd-4[27573]: 2016-03-14 11:36:16 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 The operation has completed successfully.
unit-ceph-osd-4[27573]: 2016-03-14 11:36:18 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 The operation has completed successfully.
unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 The operation has completed successfully.
unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 mkfs.xfs: cannot open /dev/sdb1: No such device or address
unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 ceph-disk: Error: Command '['/sbin/mkfs', '-t', 'xfs', '-f', '-i', 'size=2048', '--', '/dev/sdb1']' returned non-zero exit status 1
unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 ERROR unit.ceph-osd/4.juju-log server.go:268 mon:72: Unable to initialize device: /dev/sdb
unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 Traceback (most recent call last):
unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 File "/var/lib/juju/agents/unit-ceph-osd-4/charm/hooks/mon-relation-changed", line 312, in <module>
unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 hooks.execute(sys.argv)
unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 File "/var/lib/juju/agents/unit-ceph-osd-4/charm/hooks/charmhelpers/core/hookenv.py", line 717, in execute
unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 self._hooks[hook_name]()
unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 File "/var/lib/juju/agents/unit-ceph-osd-4/charm/hooks/mon-relation-changed", line 254, in mon_relation
unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 prepare_disks_and_activate()
unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 File "/var/lib/juju/agents/unit-ceph-osd-4/charm/hooks/mon-relation-changed", line 185, in prepare_disks_and_activate
unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 config('osd-encrypt'))
unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 File "/var/lib/juju/agents/unit-ceph-osd-4/charm/hooks/ceph.py", line 435, in osdize
unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 reformat_osd, ignore_errors, encrypt)
unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 File "/var/lib/juju/agents/unit-ceph-osd-4/charm/hooks/ceph.py", line 490, in osdize_dev
unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 raise e
unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 subprocess.CalledProcessError: Command '['ceph-disk', 'prepare', '--fs-type', u'xfs', '--zap-disk', u'/dev/sdb']' returned non-zero exit status 1

/dev/sdb is a normal rotary disk on this node. Partitioning went fine at :14 and :16, but at timestamp :20 mkfs complained that /dev/sdb1 does not exist. Should we wait there perhaps?

This is from https://ci.lscape.net/job/landscape-system-tests/1366/

Tags: landscape
Revision history for this message
Andreas Hasenack (ahasenack) wrote :
Revision history for this message
Chris Holcombe (xfactor973) wrote :

Well this is interesting. I'll comb through the code and see if I can figure out what is going on. If possible could you run "sgdisk -p /dev/sdb" to print out the partition table. I'm curious what it looks like.

Revision history for this message
Andreas Hasenack (ahasenack) wrote : Re: [Bug 1557158] Re: Partition created but device doesn't exist (yet?)

That environment isn't available anymore, it was a CI run. I have other
logs, like syslog, or anything under /var/log in any unit really, if that
helps.
On Mar 14, 2016 6:00 PM, "Chris Holcombe" <email address hidden>
wrote:

> Well this is interesting. I'll comb through the code and see if I can
> figure out what is going on. If possible could you run "sgdisk -p
> /dev/sdb" to print out the partition table. I'm curious what it looks
> like.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1557158
>
> Title:
> Partition created but device doesn't exist (yet?)
>
> To manage notifications about this bug go to:
>
> https://bugs.launchpad.net/charms/+source/ceph-osd/+bug/1557158/+subscriptions
>

Revision history for this message
Francis Ginther (fginther) wrote :

I think I hit another case of this with cs:trusty/ceph-osd-120.

This is also from a CI run: https://ci.lscape.net/job/landscape-system-tests/1441/

The logs will persist for about 20 days.

[log snippet]
2016-03-18 11:46:32 INFO mon-relation-changed Creating new GPT entries.
2016-03-18 11:46:32 INFO mon-relation-changed The operation has completed successfully.
2016-03-18 11:46:34 INFO mon-relation-changed The operation has completed successfully.
2016-03-18 11:46:36 INFO mon-relation-changed The operation has completed successfully.
2016-03-18 11:46:37 INFO mon-relation-changed mkfs.xfs: cannot open /dev/sdb1: No such device or address
2016-03-18 11:46:37 INFO mon-relation-changed ceph-disk: Error: Command '['/sbin/mkfs', '-t', 'xfs', '-f', '-i', 'size=2048', '--', '/dev/sdb1']' returned non-zero exit status 1
2016-03-18 11:46:37 ERROR juju-log mon:72: Unable to initialize device: /dev/sdb
2016-03-18 11:46:37 INFO mon-relation-changed Traceback (most recent call last):
2016-03-18 11:46:37 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-osd-2/charm/hooks/mon-relation-changed", line 260, in <module>
2016-03-18 11:46:37 INFO mon-relation-changed hooks.execute(sys.argv)
2016-03-18 11:46:37 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-osd-2/charm/hooks/charmhelpers/core/hookenv.py", line 717, in execute
2016-03-18 11:46:37 INFO mon-relation-changed self._hooks[hook_name]()
2016-03-18 11:46:37 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-osd-2/charm/hooks/mon-relation-changed", line 201, in mon_relation
2016-03-18 11:46:37 INFO mon-relation-changed config('ignore-device-errors'))
2016-03-18 11:46:37 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-osd-2/charm/hooks/ceph.py", line 401, in osdize
2016-03-18 11:46:37 INFO mon-relation-changed osdize_dev(dev, osd_format, osd_journal, reformat_osd, ignore_errors)
2016-03-18 11:46:37 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-osd-2/charm/hooks/ceph.py", line 450, in osdize_dev
2016-03-18 11:46:37 INFO mon-relation-changed raise e
2016-03-18 11:46:37 INFO mon-relation-changed subprocess.CalledProcessError: Command '['ceph-disk', 'prepare', '--fs-type', u'xfs', '--zap-disk', u'/dev/sdb']' returned non-zero exit status 1
2016-03-18 11:46:37 ERROR juju.worker.uniter.operation runhook.go:107 hook "mon-relation-changed" failed: exit status 1

Revision history for this message
Francis Ginther (fginther) wrote :
description: updated
Revision history for this message
Chris Holcombe (xfactor973) wrote :

Yeah it shouldn't be hard to put in a poll to wait a little bit of time
for the partition to show up. We can also do a rescan of the scsi bus
to see if that prompts it to show up. It's odd that the gpt command is
returning before it is complete and settled.

On 03/21/2016 11:21 AM, Francis Ginther wrote:
> unit-ceph-osd-4[27573]: 2016-03-14 11:36:10 INFO unit.ceph-osd/4.juju-log server.go:268 mon:72: osdize cmd: ['ceph-disk', 'prepare', '--fs-type', u'xfs', '--zap-disk', u'/dev/sdb']
> unit-ceph-osd-4[27573]: 2016-03-14 11:36:14 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 Creating new GPT entries.
> unit-ceph-osd-4[27573]: 2016-03-14 11:36:14 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 GPT data structures destroyed! You may now partition the disk using fdisk or
> unit-ceph-osd-4[27573]: 2016-03-14 11:36:14 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 other utilities.
> unit-ceph-osd-4[27573]: 2016-03-14 11:36:16 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 Creating new GPT entries.
> unit-ceph-osd-4[27573]: 2016-03-14 11:36:16 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 The operation has completed successfully.
> unit-ceph-osd-4[27573]: 2016-03-14 11:36:18 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 The operation has completed successfully.
> unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 The operation has completed successfully.
> unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 mkfs.xfs: cannot open /dev/sdb1: No such device or address
> unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 INFO unit.ceph-osd/4.mon-relation-changed logger.go:40 ceph-disk: Error: Command '['/sbin/mkfs', '-t', 'xfs', '-f', '-i', 'size=2048', '--', '/dev/sdb1']' returned non-zero exit status 1
> unit-ceph-osd-4[27573]: 2016-03-14 11:36:20 ERROR unit.ceph-osd/4.juju-log server.go:268 mon:72: Unable to initialize device: /dev/sdb

Revision history for this message
Andreas Hasenack (ahasenack) wrote :
Download full text (4.1 KiB)

Happened again, this time with cs:trusty/ceph-osd-233:

unit-ceph-osd-0[9443]: 2016-05-01 22:09:21 INFO unit.ceph-osd/0.juju-log server.go:268 mon:44: osdize cmd: ['ceph-disk', 'prepare', '--fs-type', u'xfs', '--zap-disk', u'/dev/sdb']
unit-ceph-osd-0[9443]: 2016-05-01 22:09:22 INFO unit.ceph-osd/0.mon-relation-changed logger.go:40 Creating new GPT entries.
unit-ceph-osd-0[9443]: 2016-05-01 22:09:22 INFO unit.ceph-osd/0.mon-relation-changed logger.go:40 GPT data structures destroyed! You may now partition the disk using fdisk or
unit-ceph-osd-0[9443]: 2016-05-01 22:09:22 INFO unit.ceph-osd/0.mon-relation-changed logger.go:40 other utilities.
unit-ceph-osd-0[9443]: 2016-05-01 22:09:24 INFO unit.ceph-osd/0.mon-relation-changed logger.go:40 Creating new GPT entries.
unit-ceph-osd-0[9443]: 2016-05-01 22:09:24 INFO unit.ceph-osd/0.mon-relation-changed logger.go:40 The operation has completed successfully.
unit-ceph-osd-0[9443]: 2016-05-01 22:09:25 INFO unit.ceph-osd/0.mon-relation-changed logger.go:40 The operation has completed successfully.
unit-ceph-osd-0[9443]: 2016-05-01 22:09:26 INFO unit.ceph-osd/0.mon-relation-changed logger.go:40 The operation has completed successfully.
unit-ceph-osd-0[9443]: 2016-05-01 22:09:27 INFO unit.ceph-osd/0.mon-relation-changed logger.go:40 mkfs.xfs: cannot open /dev/sdb1: No such device or address
unit-ceph-osd-0[9443]: 2016-05-01 22:09:27 INFO unit.ceph-osd/0.mon-relation-changed logger.go:40 ceph-disk: Error: Command '['/sbin/mkfs', '-t', 'xfs', '-f', '-i', 'size=2048', '--', '/dev/sdb1']' returned non-zero exit status 1
unit-ceph-osd-0[9443]: 2016-05-01 22:09:27 ERROR unit.ceph-osd/0.juju-log server.go:268 mon:44: Unable to initialize device: /dev/sdb
unit-ceph-osd-0[9443]: 2016-05-01 22:09:27 INFO unit.ceph-osd/0.mon-relation-changed logger.go:40 Traceback (most recent call last):
unit-ceph-osd-0[9443]: 2016-05-01 22:09:27 INFO unit.ceph-osd/0.mon-relation-changed logger.go:40 File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/mon-relation-changed", line 545, in <module>
unit-ceph-osd-0[9443]: 2016-05-01 22:09:27 INFO unit.ceph-osd/0.mon-relation-changed logger.go:40 hooks.execute(sys.argv)
unit-ceph-osd-0[9443]: 2016-05-01 22:09:27 INFO unit.ceph-osd/0.mon-relation-changed logger.go:40 File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/charmhelpers/core/hookenv.py", line 717, in execute
unit-ceph-osd-0[9443]: 2016-05-01 22:09:27 INFO unit.ceph-osd/0.mon-relation-changed logger.go:40 self._hooks[hook_name]()
unit-ceph-osd-0[9443]: 2016-05-01 22:09:27 INFO unit.ceph-osd/0.mon-relation-changed logger.go:40 File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/mon-relation-changed", line 475, in mon_relation
unit-ceph-osd-0[9443]: 2016-05-01 22:09:27 INFO unit.ceph-osd/0.mon-relation-changed logger.go:40 prepare_disks_and_activate()
unit-ceph-osd-0[9443]: 2016-05-01 22:09:27 INFO unit.ceph-osd/0.mon-relation-changed logger.go:40 File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/mon-relation-changed", line 398, in prepare_disks_and_activate
unit-ceph-osd-0[9443]: 2016-05-01 22:09:27 INFO unit.ceph-osd/0.mon-relation-changed logger.go:40 config('osd-encrypt'))
unit-ceph-osd...

Read more...

tags: removed: kanban-cross-team
Revision history for this message
James Page (james-page) wrote :

"udevadm settle" might do the trick to ensure devices are present before use.

Revision history for this message
James Page (james-page) wrote :

That said this error is coming from within ceph-disk, so we need to fix there if possible.

Revision history for this message
James Page (james-page) wrote :

Switching to Ubuntu package for bug tracking (not a charm specific issue).

Changed in ceph-osd (Juju Charms Collection):
status: New → Invalid
Changed in ceph (Ubuntu):
importance: Undecided → Low
James Page (james-page)
Changed in ceph (Ubuntu):
status: New → Triaged
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.