[library] ERR: ceph-deploy osd activate node-11:/dev/sdc4 returned 1 instead of one of [0]

Bug #1322230 reported by Anastasia Palkina
This bug affects 2 people
Affects             Status       Importance  Assigned to        Milestone
Fuel for OpenStack  In Progress  High        Oleksiy Molchanov
6.0.x               In Progress  High        Oleksiy Molchanov

Bug Description

"build_id": "2014-05-22_01-10-31",
"mirantis": "yes",
"build_number": "216",
"ostf_sha": "5c479f04c35127576d35526650ec83b104f9a33d",
"nailgun_sha": "31fcb161ff8d6bfb861b041467440752c0e9c537",
"production": "docker",
"api": "1.0",
"fuelmain_sha": "dda9cff27999e74b80e8e6a9e665e7e9677ab994",
"astute_sha": "9a0d86918724c1153b5f70bdae008dea8572fd3e",
"release": "5.0",
"fuellib_sha": "872b6a7a968b619a493ad46c504910020ea2edae"

1. Create new environment (CentOS, HA mode)
2. Choose Ceph for Glance
3. Choose Rados in network settings
4. Add 1 node with role controller+cinder+ceph
5. Start deployment. It was successful
6. There is an error in puppet.log:

2014-05-22 14:06:15 ERR (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) change from notrun to 0 failed: ceph-deploy osd activate node-11:/dev/sdc4 returned 1 instead of one of [0]
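
When scanning puppet.log for this failure across several nodes, it can help to pull the node/device pairs and the exit code out of the message. The following is only a sketch: the regular expression and the `parse_activate_failure` helper are hypothetical, written against the message text quoted in this report, and are not part of Puppet or ceph-deploy.

```python
import re

# Pattern for the Puppet Exec failure message quoted in this report.
# Hypothetical helper: the regex is derived from that text only.
FAILURE_RE = re.compile(
    r"failed: ceph-deploy osd activate (?P<targets>.+?) "
    r"returned (?P<code>\d+) instead of one of \[(?P<expected>[\d, ]+)\]"
)


def parse_activate_failure(line):
    """Return (targets, exit_code, expected_codes), or None if no match."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    # Each target has the form "node:device", e.g. "node-11:/dev/sdc4".
    targets = [tuple(t.split(":", 1)) for t in match.group("targets").split()]
    expected = [int(c) for c in match.group("expected").split(",")]
    return targets, int(match.group("code")), expected


line = (
    "(/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) "
    "change from notrun to 0 failed: ceph-deploy osd activate "
    "node-11:/dev/sdc4 returned 1 instead of one of [0]"
)
print(parse_activate_failure(line))
```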

Tags: ceph ha
Anastasia Palkina (apalkina) wrote :
Dmitry Borodaenko (angdraug) wrote :

node-11.domain.tld/commands/blkid_o_list.txt indicates that, despite the ceph-deploy complaints, sdc4 was actually activated successfully:

/dev/sdb4 xfs /var/lib/ceph/osd/ceph-0 68646967-4c87-433c-b1b8-3806c2f6544a
/dev/sdc4 xfs /var/lib/ceph/osd/ceph-1 0fef960a-aecc-4223-998c-e09fc0de3f91

and osd-1 is happily running:

root 23106 0.4 1.8 554516 35376 ? Ssl 14:07 0:11 /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf

/var/log/ceph-osd.1.log also doesn't show any errors.
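
The same check can be scripted when comparing several diagnostic snapshots. This is a minimal sketch, assuming the listing has the four whitespace-separated fields shown in the excerpt above (device, filesystem, mount point, UUID); `mounted_osds` is a hypothetical helper name, and lines with any other shape are simply skipped.

```python
OSD_MOUNT_PREFIX = "/var/lib/ceph/osd/"


def mounted_osds(listing):
    """Map device -> OSD mount point for lines shaped like the excerpt.

    Assumes four whitespace-separated fields per line (device, filesystem,
    mount point, UUID), as quoted in this report.
    """
    result = {}
    for raw in listing.splitlines():
        fields = raw.split()
        if len(fields) != 4:
            continue  # header, blank, or differently shaped line
        device, _fs, mount_point, _uuid = fields
        if mount_point.startswith(OSD_MOUNT_PREFIX):
            result[device] = mount_point
    return result


excerpt = """\
/dev/sdb4 xfs /var/lib/ceph/osd/ceph-0 68646967-4c87-433c-b1b8-3806c2f6544a
/dev/sdc4 xfs /var/lib/ceph/osd/ceph-1 0fef960a-aecc-4223-998c-e09fc0de3f91
"""
osds = mounted_osds(excerpt)
print("/dev/sdc4" in osds, osds.get("/dev/sdc4"))
```

A device named in the failed `ceph-deploy osd activate` command that still appears in this mapping was mounted as an OSD despite the non-zero exit code.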

Changed in fuel:
status: New → Confirmed
Dmitry Borodaenko (angdraug) wrote :

Looks like some sort of ceph-deploy bug; we'll need to revisit this once we upgrade Ceph to 0.80 and ceph-deploy to 1.5.

Bogdan Dobrelya (bogdando) wrote :

Looks like a duplicate of the numerous open 'ceph-deploy osd activate' issues...

tags: added: ceph ha
Bogdan Dobrelya (bogdando) wrote :
Dmitry Ilyin (idv1985)
summary: - ERR: ceph-deploy osd activate node-11:/dev/sdc4 returned 1 instead of
- one of [0]
+ [library] ERR: ceph-deploy osd activate node-11:/dev/sdc4 returned 1
+ instead of one of [0]
Anastasia Palkina (apalkina) wrote :

Reproduced on ISO #366
"build_id": "2014-07-28_02-01-14",
"ostf_sha": "8c328521b1444f22c50463b9432193e20ed33813",
"build_number": "366",
"auth_required": true,
"api": "1.0",
"nailgun_sha": "83cc9ed44ebc8dd97248483b6d414ebbc4cff3c0",
"production": "docker",
"fuelmain_sha": "9adfbf5a52cedbdd16ec1a74f6c44c5b3419b87c",
"astute_sha": "aa5aed61035a8dc4035ab1619a8bb540a7430a95",
"feature_groups": ["mirantis"],
"release": "5.1",
"fuellib_sha": "d1c7f67b3cf51978d3178c8666ea398f2477dcb5"

1. Create new environment (Ubuntu, HA mode)
2. Choose VLAN segmentation
3. Choose both Ceph
4. Add 3 controllers+ceph, compute
5. Start deployment
6. There are errors in puppet.log on primary controller (node-1):

2014-07-28 13:53:06 ERR (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) change from notrun to 0 failed: ceph-deploy osd activate node-1:/dev/sdb2 node-1:/dev/sdc2 returned 1 instead of one of [0]

Anastasia Palkina (apalkina) wrote :
Changed in fuel:
importance: Medium → High
Ryan Moe (rmoe)
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Ryan Moe (rmoe)
Ryan Moe (rmoe) wrote :

This failure is the same as in https://bugs.launchpad.net/fuel/+bug/1335880 (OSError exception related to ceph-disk cleaning up its tmp directory). Is there something about your environment that could cause this? We haven't been able to reproduce this particular issue.

Ryan Moe (rmoe)
Changed in fuel:
status: Confirmed → Incomplete
Dmitry Borodaenko (angdraug) wrote :
Changed in fuel:
status: Incomplete → New
Changed in fuel:
status: New → Confirmed
Ryan Moe (rmoe)
Changed in fuel:
assignee: Ryan Moe (rmoe) → Fuel Library Team (fuel-library)
Dmitry Borodaenko (angdraug) wrote :

We're still unable to reproduce this locally, and the swarm environment where this happened got deleted before we could log in.

Changed in fuel:
status: Confirmed → Incomplete
Anastasia Palkina (apalkina) wrote :

Reproduced on ISO #3 for 5.1

"build_id": "2014-09-11_01-04-40", "ostf_sha": "1de6ed1c0b72f6687ffb4bebc2c939b135a88e34", "build_number": "3", "auth_required": true, "api": "1.0", "nailgun_sha": "720e83bca37561fbc0452ad4e99f1f8cfe8e40cf", "production": "docker", "fuelmain_sha": "d899675a5a393625f8166b29099d26f45d527035", "astute_sha": "b622d9b36dbdd1e03b282b9ee5b7435ba649e711", "feature_groups": ["experimental"], "release": "5.1", "release_versions": {"2014.1.1-5.1": {"VERSION": {"build_id": "2014-09-11_01-04-40", "ostf_sha": "1de6ed1c0b72f6687ffb4bebc2c939b135a88e34", "build_number": "3", "api": "1.0", "nailgun_sha": "720e83bca37561fbc0452ad4e99f1f8cfe8e40cf", "production": "docker", "fuelmain_sha": "d899675a5a393625f8166b29099d26f45d527035", "astute_sha": "b622d9b36dbdd1e03b282b9ee5b7435ba649e711", "feature_groups": ["experimental"], "release": "5.1", "fuellib_sha": "6fc7ac9041894aa76b2e18d385149166e34f7b23"}}}, "fuellib_sha": "6fc7ac9041894aa76b2e18d385149166e34f7b23"

1. Create new environment (CentOS, HA mode)
2. Choose Ceph for Glance
3. Choose Rados in network settings
4. Add 1 node with role controller+cinder+ceph
5. Start deployment. It was successful
6. There is an error in puppet.log (node-14):

2014-09-11 16:02:59 ERR (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) change from notrun to 0 failed: ceph-deploy osd activate node-14:/dev/sdb4 node-14:/dev/sdc4 returned 1 instead of one of [0]

Changed in fuel:
status: Incomplete → Confirmed
Anastasia Palkina (apalkina) wrote :
Ryan Moe (rmoe)
Changed in fuel:
importance: High → Medium
Ryan Moe (rmoe) wrote :

Are you using VirtualBox when you run into this? We've never seen it with KVM.

When this issue occurs, ceph-disk attempts to lazily unmount its temporary OSD mount point. It then immediately attempts to remove that tmp directory, but the device is still busy. By the time I can get onto the environment, nothing is accessing the device. Failing to remove an empty temporary directory shouldn't cause our deployment to fail. I'll attach a patch that will ignore this exception.
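
To illustrate the idea of tolerating the cleanup failure, here is a minimal sketch. It is not the attached patch and not ceph-disk's actual code: `remove_tmp_mount_dir` is a hypothetical helper, and treating only `EBUSY` as ignorable is an assumption based on the "device is still busy" symptom described above.

```python
import errno
import os
import tempfile


def remove_tmp_mount_dir(path):
    """Remove an empty temporary directory, tolerating a busy mount point.

    Returns True if the directory was removed, False if removal was skipped
    because the path was still reported as busy (EBUSY). Any other OSError
    is re-raised so real problems are not hidden.
    """
    try:
        os.rmdir(path)
    except OSError as exc:
        if exc.errno != errno.EBUSY:
            raise
        return False
    return True


tmp = tempfile.mkdtemp()
print(remove_tmp_mount_dir(tmp))
```

Restricting the tolerated error to `EBUSY` keeps the change narrow: a leftover empty directory is harmless, while other failures still surface.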

Ryan Moe (rmoe) wrote :
Anastasia Palkina (apalkina) wrote :

I'm using VBox for testing.

Changed in fuel:
milestone: 5.1 → 5.1.1
Anastasia Palkina (apalkina) wrote :

Reproduced on ISO #4 for 6.0

1. Create new environment (CentOS, HA mode)
2. Choose GRE neutron
3. Choose Ceph for volumes
4. Choose Sahara, Ceilometer
5. Add 2 controller+ceph, 1 controller, 1 compute, 3 mongo
6. Start deployment. It failed with an error on controller+ceph (node-16):

2014-11-25 18:03:22 ERR (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) change from notrun to 0 failed: ceph-deploy osd activate node-16:/dev/sdb4 node-16:/dev/sdc4 returned 1 instead of one of [0]

I'm using VBox

Logs are here: https://drive.google.com/a/mirantis.com/file/d/0B6SjzarTGFxaMDVIMnJQQkx0ams/view?usp=sharing

Anastasia Palkina (apalkina) wrote :

"build_id": "2014-11-24_22-41-00", "ostf_sha": "a35f516f1606b0d03d51ff63bfe3fbe23de4b622", "build_number": "4", "auth_required": true, "api": "1.0", "nailgun_sha": "603a8d438dc7a3cf6286eb9f16deb8137f47d703", "production": "docker", "fuelmain_sha": "45b21f7bdb061b59b80f8d126d9a6f6e50505a0d", "astute_sha": "c15623d05ccdf7ac10873e7a90df954de8726280", "feature_groups": ["mirantis"], "release": "6.0", "release_versions": {"2014.2-6.0": {"VERSION": {"build_id": "2014-11-24_22-41-00", "ostf_sha": "a35f516f1606b0d03d51ff63bfe3fbe23de4b622", "build_number": "4", "api": "1.0", "nailgun_sha": "603a8d438dc7a3cf6286eb9f16deb8137f47d703", "production": "docker", "fuelmain_sha": "45b21f7bdb061b59b80f8d126d9a6f6e50505a0d", "astute_sha": "c15623d05ccdf7ac10873e7a90df954de8726280", "feature_groups": ["mirantis"], "release": "6.0", "fuellib_sha": "893883f7fa8ffc5dde975b6806e538a11969a15b"}}}, "fuellib_sha": "893883f7fa8ffc5dde975b6806e538a11969a15b"

Bogdan Dobrelya (bogdando) wrote :
Oleksiy Molchanov (omolchanov) wrote :

Bogdan, I will check asap.

Oleksiy Molchanov (omolchanov) wrote :

No, this bug is not related to the patch https://review.openstack.org/#/c/135337/

It makes sense to apply the patch from Ryan. Also, I am lowering the importance to Medium, as it doesn't break the deployment.

Anastasia Palkina (apalkina) wrote :

It breaks deployment!
Deployment has failed!

Move bug back to High.

Oleksiy Molchanov (omolchanov) wrote :

Anastasia, please try this patch https://review.openstack.org/#/c/135337/.

Also please disregard my comment #20.

Anastasia Palkina (apalkina) wrote :

Tested patch https://review.openstack.org/#/c/135337/ on ISO #7 for 6.0.

This patch fixes the issue with ceph activate.

Anastasia Palkina (apalkina) wrote :

"build_id": "2014-11-25_22-41-00", "ostf_sha": "a35f516f1606b0d03d51ff63bfe3fbe23de4b622", "build_number": "7", "auth_required": true, "api": "1.0", "nailgun_sha": "cbe7b96943d43397dc608a2f6c9dc1af14dd9a48", "production": "docker", "fuelmain_sha": "7db74b9f80180bf3936db1edc4aebfae310d024a", "astute_sha": "c15623d05ccdf7ac10873e7a90df954de8726280", "feature_groups": ["mirantis"], "release": "6.0", "release_versions": {"2014.2-6.0": {"VERSION": {"build_id": "2014-11-25_22-41-00", "ostf_sha": "a35f516f1606b0d03d51ff63bfe3fbe23de4b622", "build_number": "7", "api": "1.0", "nailgun_sha": "cbe7b96943d43397dc608a2f6c9dc1af14dd9a48", "production": "docker", "fuelmain_sha": "7db74b9f80180bf3936db1edc4aebfae310d024a", "astute_sha": "c15623d05ccdf7ac10873e7a90df954de8726280", "feature_groups": ["mirantis"], "release": "6.0", "fuellib_sha": "8c7eec6225184e0391569b2b5371196ab3e3fa19"}}}, "fuellib_sha": "8c7eec6225184e0391569b2b5371196ab3e3fa19"
