deploy is failed. A lot of ceph OSDs are in down state.

Bug #1462451 reported by Leontiy Istomin
Affects              Status        Importance  Assigned to      Milestone
Fuel for OpenStack   Fix Released  High        Stanislav Makar
6.1.x                In Progress   High        MOS Maintenance
7.0.x                Fix Released  High        Stanislav Makar

Bug Description

Deployment failed with the following error: http://paste.openstack.org/show/266476/
from astute: http://paste.openstack.org/show/266504/
ceph health_warn: http://paste.openstack.org/show/266512/

[root@node-31 ~]# ceph osd tree | grep up | wc -l
60
[root@node-31 ~]# ceph osd tree | grep down | wc -l
142
ceph osd tree | grep down: http://paste.openstack.org/show/266526/
ceph osd tree | grep up: http://paste.openstack.org/show/266527/

configuration:
Baremetal,Centos,IBP,HA,Neutron-gre,Ceph-all,Nova-debug,Nova-quotas,6.1_497
Controllers:3 Computes:200
applied the following fixes:
https://review.openstack.org/#/c/188555/
https://review.openstack.org/#/c/188171/
https://review.openstack.org/#/c/187801/

api: '1.0'
astute_sha: cbae24e9904be2ff8d1d49c0c48d1bdc33574228
auth_required: true
build_id: 2015-06-02_16-28-25
build_number: '497'
feature_groups:
- mirantis
fuel-library_sha: d757cd41e4f8273d36ef85b8207e554e5422c5b0
fuel-ostf_sha: f899e16c4ce9a60f94e7128ecde1324ea41d09d4
fuelmain_sha: bcc909ffc5dd5156ba54cae348b6a07c1b607b24
nailgun_sha: 3830bdcb28ec050eed399fe782cc3dd5fbf31bde
openstack_version: 2014.2.2-6.1
production: docker
python-fuelclient_sha: 4fc55db0265bbf39c369df398b9dc7d6469ba13b
release: '6.1'

Diagnostic Snapshot: http://mos-scale-share.mirantis.com/fuel-snapshot-2015-06-05_17-41-52.tar.xz

Changed in fuel:
milestone: none → 6.1
assignee: nobody → Fuel Library Team (fuel-library)
Dmitry Ilyin (idv1985)
Changed in fuel:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Ryan Moe (rmoe) wrote :

Could you please provide a diagnostic snapshot?

Revision history for this message
Dmitry Ilyin (idv1985) wrote :

First, about one third of the Ceph nodes were down. After "service ceph restart" they went up.

Next, this script dumps inactive PGs and finds a lot of them:
> There are PGs which are not in active state!
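For reference, a minimal sketch of this kind of check using standard Ceph CLI commands (the script mentioned above is not attached, so the exact commands here are an assumption):

# Count OSDs reported down and restart the Ceph daemons on an affected node
ceph osd tree | grep -c down
service ceph restart          # as mentioned above; brings the local OSDs back up

# List placement groups that are stuck in a non-active state
ceph pg dump_stuck inactive
ceph health detail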

Revision history for this message
Ryan Moe (rmoe) wrote :
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

> configuration: Baremetal

Please attach the output of the following commands from the affected nodes:

cat /proc/cpuinfo

sudo lspci -vvv

description: updated
summary: - deploy is failed. A lot of ceph OSDs is in down state.
+ deploy is failed. A lot of ceph OSDs are in down state.
Revision history for this message
Leontiy Istomin (listomin) wrote :

lspci and cpuinfo output from the two types of nodes in the env are attached.

Revision history for this message
Leontiy Istomin (listomin) wrote :

Reproduced with the 511 ISO and the following configuration:
Baremetal,Centos,IBP,HA,Neutron-gre,Ceph-all,Nova-debug,Nova-quotas,6.1_511
Controllers:3 Computes:200
Diagnostic Snapshot: http://mos-scale-share.mirantis.com/fuel-snapshot-2015-06-07_22-52-31.tar.xz

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Dmytro Iurchenko (diurchenko)
Revision history for this message
Dmytro Iurchenko (diurchenko) wrote :

An attempt to reproduce the bug with a significantly lower number of placement groups is in progress.

Revision history for this message
Stanislav Makar (smakar) wrote :

We have successfully deployed with a decreased pg_num (128 per pool).
Then we increased it manually -- everything works (a sketch of the manual increase is shown after this comment).

So, in our case the Ceph cluster cannot handle PG creation while new OSDs are being added, due to the large pg_num and the large number of OSDs :). To investigate why, we need this cluster and hence more time.

For now we have two quick options:
1. Mykola Golub says that our calculation formula for pg_num is incorrect, and he already has a blueprint (https://blueprints.launchpad.net/fuel/+spec/ceph-osd-pool-default-pg-num).
  We would like to try deploying with Mykola's formula and see whether it fixes the problem.
2. We also have the option of postponing Ceph pool creation until the deployment of the Ceph OSDs is finished (a post-deployment task). This prevents the Ceph cluster from having to handle PG creation while OSDs are still being added.
We would like to try it too.

After trying both we will make a decision and prepare the patch.
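For illustration, a minimal sketch of the manual pg_num increase mentioned above, using standard Ceph commands (the pool name 'volumes' is an example, not necessarily one of the pools in this environment); note that pg_num can only be increased, never decreased, after pool creation:

# Check the current PG count of a pool
ceph osd pool get volumes pg_num

# Raise pg_num, then pgp_num, to the new target value
ceph osd pool set volumes pg_num 512
ceph osd pool set volumes pgp_num 512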

Revision history for this message
Mykola Golub (mgolub) wrote :

The cluster hung with many placement groups stuck in the creating state, with the following errors in the logs:

2015-06-07 22:57:48.238453 7fd1124fe700 0 log [WRN] : slow request 2571.770231 seconds old, received at 2015-06-07 22:14:56.468107: osd_pg_create(pg0.1c,.... pg6.1ed0,5; ) v2 currently wait for new map

The problem was not reproduced when we hardcoded osd_pool_default_pg_num and osd_pool_default_pgp_num to 128 instead of allowing Fuel to calculate them based on the number of OSDs (8192 for a cluster of this size).

Although the root cause of the hang has not been found (it might be a limit or timeout we hit in Ceph or the OS when a large number of placement groups are being created), there are some improvements to Fuel that should help when deploying large clusters:

1) The formula for calculating the PG number should be changed to give values roughly 10 times lower than it currently does for large clusters (see the sizing sketch after this comment). Apart from this issue, the overestimated number of PGs causes other problems, and the PG number is impossible to decrease after pool creation:

  https://blueprints.launchpad.net/fuel/+spec/ceph-osd-pool-default-pg-num

2) Pools are created after the controller nodes are deployed, but before the OSDs are deployed. As a result, if the pools have a large pg_num, a huge number of PGs sit in the creating state while OSD nodes start to be added: a large number of PGs are created on the nodes deployed first, and then, as each new OSD appears, PGs have to be moved. This process is not optimal. It is much less stressful for the cluster to create the PGs after all OSDs are deployed and in the 'in' and 'up' states, so that no rebalancing is needed and the "early" OSDs are not overloaded with placement groups.
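As a rough illustration of point 1), the widely used rule of thumb for sizing PGs is sketched below; this is an assumption about the general approach, not necessarily the exact formula in the blueprint:

# Rule of thumb: total_pgs ~= (num_osds * 100) / replica_count,
# rounded up to a power of two and then divided among the pools.
OSDS=200
REPLICAS=3
echo $(( OSDS * 100 / REPLICAS ))   # ~6666 PGs for the whole cluster
# Spread over 7 pools this is on the order of 1024 PGs per pool, roughly ten
# times lower than the 8192 per pool calculated for this cluster.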

Revision history for this message
Dmytro Iurchenko (diurchenko) wrote :

Stanislav Makar is going to try the second approach (creating pools after the OSD nodes are added; a rough sketch of what that involves is shown below).
If it does not work out, the PG-number calculation formula will be changed as Mykola Golub suggested.
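A rough sketch of what the postponed pool creation amounts to once all OSDs are deployed (the actual implementation is a fuel-library task; the pool names and PG counts below are examples only):

# Verify that every OSD is both 'up' and 'in' before creating pools
ceph osd stat        # e.g. "200 osds: 200 up, 200 in"

# Create the pools only now, so no PGs need to be rebalanced onto late-arriving OSDs
ceph osd pool create volumes 1024
ceph osd pool create images 1024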

Revision history for this message
Leontiy Istomin (listomin) wrote :

Reproduced with Ubuntu
Baremetal,Ubuntu,IBP,HA,Neutron-gre,Ceph-all,Nova-debug,Nova-quotas,6.1_521
Controllers:3 Computes:200

root@node-37:~# ceph -s
    cluster 6292c17b-39a9-45c9-9a01-6161eb72f816
     health HEALTH_WARN 234 pgs peering; 32872 pgs stuck inactive; 32872 pgs stuck unclean
     monmap e3: 3 mons at {node-37=192.168.0.40:6789/0,node-42=192.168.0.45:6789/0,node-56=192.168.0.59:6789/0}, election epoch 6, quorum 0,1,2 node-37,node-42,node-56
     osdmap e68: 200 osds: 200 up, 200 in
      pgmap v461: 32960 pgs, 7 pools, 0 bytes data, 0 objects
            407 GB used, 181 TB / 181 TB avail
                  33 inactive
               32605 creating
                  66 peering
                 168 creating+peering
                  88 active+clean

Diagnostic Snapshot: http://mos-scale-share.mirantis.com/fuel-snapshot-2015-06-10_16-39-02.tar.xz

Changed in fuel:
assignee: Dmytro Iurchenko (diurchenko) → Stanislav Makar (smakar)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/6.1)

Fix proposed to branch: stable/6.1
Review: https://review.openstack.org/190953

Revision history for this message
Mykola Golub (mgolub) wrote :

Moving pool creation to a later stage (after the OSDs are added) is the right change.

Still, I think decreasing the Ceph default value for pool placement groups is also important. I filed a separate bug for that change:

https://bugs.launchpad.net/fuel/+bug/1464656

tags: added: 6.1rc2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/190189
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=0b0d8d8b1182c97276a32d0fb80d2f382ed79a78
Submitter: Jenkins
Branch: master

commit 0b0d8d8b1182c97276a32d0fb80d2f382ed79a78
Author: Stanislav Makar <email address hidden>
Date: Wed Jun 10 13:15:16 2015 +0000

    Fix the problem with ceph deployment on scale lab

    Postpone ceph pool creation to post deploy:
    * Add task for ceph pool creation and put it in post deploy
    * Change ceph/compute.pp and move to post deploy
    * Remove from ceph/manifests/init.pp the pool creation code

    Closes-bug: #1462451
    Change-Id: Iee72e5f8e59c3ced0ba0d7f971380e5932cbb0fc

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/6.1)

Reviewed: https://review.openstack.org/190953
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=43b25e4b200c5b994cde81439454d6e2e908a88f
Submitter: Jenkins
Branch: stable/6.1

commit 43b25e4b200c5b994cde81439454d6e2e908a88f
Author: Stanislav Makar <email address hidden>
Date: Wed Jun 10 13:15:16 2015 +0000

    Fix the problem with ceph deployment on scale lab

    Postpone ceph pool creation to post deploy:
    * Add task for ceph pool creation and put it in post deploy
    * Change ceph/compute.pp and move to post deploy
    * Remove from ceph/manifests/init.pp the pool creation code

    Closes-bug: #1462451
    Change-Id: Iee72e5f8e59c3ced0ba0d7f971380e5932cbb0fc

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

This should not have been merged to stable/6.1; the change is too disruptive after hard code freeze. Please revert it and merge a fix for bug #1464656 instead (change the pg_num calculation).

Updated the 7.0.x status to Fix Committed to reflect the fact that this was merged to the master branch, not just stable/6.1.

Revision history for this message
Stanislav Makar (smakar) wrote :

@dborodaenko, before trying this fix we tried 1024 PGs per pool and it did not help; the result was the same.
Then we tried this patch, and it fixed the problem even with the overly high pg_num.

The QA folks can confirm it.

Revision history for this message
Dan Hata (dhata) wrote :

For Eugene Bogdanov

Clear steps to reproduce and expected vs. actual result:
Deployment of Ceph nodes through Fuel with more than 200 drives will fail.

Rough estimate of the probability of a user facing the issue:
This works fine with 50 drives but fails with 200 drives. We have not tested numbers in between. We do know that it is 100% reproducible.

What is the real user-facing impact/severity, and is there a workaround available?
IMPACT: The user will experience a failed Ceph deployment.
WORKAROUND: The user can manually configure the placement groups and the deployment to get around this (see the sketch below).

Can we deliver the fix later and apply it easily on a running env?
Yes. We are first experimenting with deploying a minimal number of placement groups by dividing the number by 10. We are also exploring a more complex change that will delay the creation of placement groups; that fix will take more testing and thought.
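A hedged sketch of that manual workaround, using standard Ceph settings and commands (the pool name is an example; confirm paths and values for a given environment):

# In /etc/ceph/ceph.conf, before any pools are created:
#   [global]
#   osd pool default pg num  = 128
#   osd pool default pgp num = 128

# A pool that was already created with too many PGs cannot be shrunk; it has to be
# recreated (this destroys the pool's data, so only do it on a failed deployment):
ceph osd pool delete volumes volumes --yes-i-really-really-mean-it
ceph osd pool create volumes 128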

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

The patch https://review.openstack.org/192919/ reverts https://review.openstack.org/190953, hence returning the original issue to the Confirmed state for 6.1.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/6.1)

Fix proposed to branch: stable/6.1
Review: https://review.openstack.org/193076

Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Targeted to 6.1-updates because 6.1 GA was rolled out yesterday. The fix should be accompanied by errata information and go through the full patching process.

tags: added: 6.1-mu-1
Revision history for this message
Stanislav Makar (smakar) wrote :

The new patches are here:
https://review.openstack.org/#/q/I05b53042e24da8cb1693049bd95e682c8903c812,n,z
They are waiting to be tested on the scale lab.

Changed in fuel:
assignee: Leontiy Istomin (listomin) → Sergii Golovatiuk (sgolovatiuk)
Revision history for this message
Viktoria Efimova (vefimova) wrote :

TESTED: Deployment with the 200-Ceph-node configuration passed successfully with the patch applied.

Revision history for this message
Sergii Golovatiuk (sgolovatiuk) wrote :

Viktoria, last time we ran into issues in the operation phase: live migration didn't work. That was the reason the patch was rejected from 6.1. Could you test functionality like live migration and ephemeral storage to ensure Ceph works as expected? Thanks a lot.

Revision history for this message
Leontiy Istomin (listomin) wrote :

We ran the boot_and_migrate_server and boot_server_from_volume_and_live_migrate Rally scenarios with the fix. These tests passed successfully. The Rally report and logs are attached.
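For reference, a hedged sketch of how such Rally scenarios are typically launched (the task file name is an example; the actual scenario configs used on the scale lab are not attached):

# Run a Rally task file containing the boot_and_migrate_server scenario
rally task start boot-and-migrate-server.json
# Generate the HTML report mentioned above
rally task report --out rally-report.html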

Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Stanislav Makar (smakar)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Stanislav Makar (<email address hidden>) on branch: master
Review: https://review.openstack.org/198735
Reason: this patch is included in
https://review.openstack.org/#/c/195468/2

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/195468
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=bf5ec482cfdf6ec412a5c7685113936e750f582a
Submitter: Jenkins
Branch: master

commit bf5ec482cfdf6ec412a5c7685113936e750f582a
Author: Stanislav Makar <email address hidden>
Date: Wed Jun 10 13:15:16 2015 +0000

    Fix the problem with ceph deployment on scale lab

    Postpone ceph pool creation to post deploy:
    * Add task for ceph pool creation and put it in post deploy
    * Change ceph/compute.pp and move to post deploy
    * Remove from ceph/manifests/init.pp the pool creation code
    * Add NOOP tests

    Change-Id: I05b53042e24da8cb1693049bd95e682c8903c812
    Closes-bug: #1462451

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/6.1)

Change abandoned by Fuel DevOps Robot (<email address hidden>) on branch: stable/6.1
Review: https://review.openstack.org/193076
Reason: This review is > 4 weeks without comment and currently blocked by a core reviewer with a -2. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and contacting the reviewer with the -2 on this review to ensure you address their concerns.

tags: added: on-verification
Revision history for this message
Alexander Arzhanov (aarzhanov) wrote :

Verified on ISO#286:

api: '1.0'
astute_sha: 8283dc2932c24caab852ae9de15f94605cc350c6
auth_required: true
build_id: '286'
build_number: '286'
feature_groups:
- mirantis
fuel-agent_sha: 082a47bf014002e515001be05f99040437281a2d
fuel-library_sha: ff63a0bbc93a3a0fb78215c2fd0c77add8dfe589
fuel-nailgun-agent_sha: d7027952870a35db8dc52f185bb1158cdd3d1ebd
fuel-ostf_sha: 1f08e6e71021179b9881a824d9c999957fcc7045
fuelmain_sha: 9ab01caf960013dc882825dc9b0e11ccf0b81cb0
nailgun_sha: 5c33995a2e6d9b1b8cdddfa2630689da5084506f
openstack_version: 2015.1.0-7.0
production: docker
python-fuelclient_sha: 1ce8ecd8beb640f2f62f73435f4e18d1469979ac
release: '7.0'
release_versions:
  2015.1.0-7.0:
    VERSION:
      api: '1.0'
      astute_sha: 8283dc2932c24caab852ae9de15f94605cc350c6
      build_id: '286'
      build_number: '286'
      feature_groups:
      - mirantis
      fuel-agent_sha: 082a47bf014002e515001be05f99040437281a2d
      fuel-library_sha: ff63a0bbc93a3a0fb78215c2fd0c77add8dfe589
      fuel-nailgun-agent_sha: d7027952870a35db8dc52f185bb1158cdd3d1ebd
      fuel-ostf_sha: 1f08e6e71021179b9881a824d9c999957fcc7045
      fuelmain_sha: 9ab01caf960013dc882825dc9b0e11ccf0b81cb0
      nailgun_sha: 5c33995a2e6d9b1b8cdddfa2630689da5084506f
      openstack_version: 2015.1.0-7.0
      production: docker
      python-fuelclient_sha: 1ce8ecd8beb640f2f62f73435f4e18d1469979ac
      release: '7.0'

#########################################
id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
---|--------|------------------|---------|------------|-------------------|----------------------|---------------|--------|---------
4 | ready | Untitled (23:2b) | 1 | 10.109.0.7 | 64:98:a9:2b:23:2b | ceph-osd, compute | | True | 1
5 | ready | Untitled (55:72) | 1 | 10.109.0.5 | 64:c6:67:35:55:72 | ceph-osd, compute | | True | 1
2 | ready | Untitled (47:7a) | 1 | 10.109.0.6 | 64:73:7e:a0:47:7a | ceph-osd, controller | | True | 1
3 | ready | Untitled (5b:b0) | 1 | 10.109.0.3 | 64:6b:8a:5b:5b:b0 | ceph-osd, controller | | True | 1
1 | ready | Untitled (41:ce) | 1 | 10.109.0.4 | 64:d3:e7:8d:41:ce | ceph-osd, controller | | True | 1
#########################################

#########################################
root@node-1:~# ceph osd tree
# id weight type name up/down reweight
-1 0.3499 root default
-2 0.04999 host node-3
0 0.04999 osd.0 up 1
-3 0.04999 host node-2
1 0.04999 osd.1 up 1
-4 0.04999 host node-1
2 0.04999 osd.2 up 1
-5 0.09998 host node-5
3 0.04999 osd.3 up 1
5 0.04999 osd.5 up 1
-6 0.09998 host node-4
4 0.04999 osd.4 up 1
6 0.04999 osd.6 up 1
#########################################

#########################################
root@node-1:~# ceph -s
    cluster 6778a6a6-09f6-4e31-a7cd-33e80ea8a806
     health HEALTH_OK
     monmap e3: 3 mons at {node-1=10.109.2.4:6789/0,node-2=10.109.2.6:6789/0,node-3=10.109.2.5:6789/0}, election epoch 4, quorum 0,1,2 node-1,node-3,node-2
     osdmap e39: 7 osds: 7 up, 7 in...


tags: removed: on-verification
Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Reassigned to the maintenance team to include it in the 6.1 maintenance updates. Since verification of the fix requires the scale lab, we will be able to work on it only after the 7.0 release (the scale lab is 100% booked with 7.0 testing now).

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Tony Breeds (<email address hidden>) on branch: stable/6.1
Review: https://review.openstack.org/193076
Reason: This branch (stable/6.1) is at End Of Life
