CephFS Native driver gets blacklisted during startup with Ceph Octopus

Bug #1914453 reported by Stig Telfer

Affects: OpenStack Shared File Systems Service (Manila)
Status: Triaged
Importance: Undecided
Assigned to: Unassigned

Bug Description

We are using the following:
- CephFSNative driver
- Ussuri Manila with Nautilus client libraries
- Ceph Octopus 15.2.8 server

What I see is that the manila-share service gets blacklisted by the Ceph MDS on startup, which renders it inoperable, e.g. for share creation.

Here's an example timeline:

The Manila share driver starts and evicts previous clients as part of startup:

2021-02-03 15:15:17.186 20 DEBUG ceph_volume_client [-] mds_command: 7138235, ['session', 'evict', 'auth_name=...'] _evict /usr/lib/python3.6/site-packages/ceph_volume_client.py:166
2021-02-03 15:15:18.243 20 DEBUG ceph_volume_client [-] mds_command: complete 0 _evict /usr/lib/python3.6/site-packages/ceph_volume_client.py:174
2021-02-03 15:15:18.244 20 INFO ceph_volume_client [req-2481ee83-1105-4cef-8e55-a9cc340219b3 - - - - -] evict: joined all
2021-02-03 15:15:18.244 20 DEBUG ceph_volume_client [req-2481ee83-1105-4cef-8e55-a9cc340219b3 - - - - -] Premount eviction of manila completes _connect /usr/lib/python3.6/site-packages/ceph_volume_client.py:491
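
For reference, the premount eviction boils down to asking the MDS to evict any lingering sessions for the driver's own auth ID. A minimal sketch of the equivalent operation via the ceph CLI, assuming the auth ID is "manila" (ceph_volume_client's internal mds_command plumbing is not reproduced here):

# Rough equivalent of the driver's premount eviction, issued through the
# `ceph` CLI rather than ceph_volume_client's internal mds_command calls.
# AUTH_NAME is an assumption; substitute the auth ID manila-share uses.
import subprocess

AUTH_NAME = "manila"

def evict_stale_sessions(rank=0):
    # Ask the given MDS rank to evict sessions matching the auth_name
    # filter. Since Luminous, this eviction also blacklists the client's
    # address in the OSD map unless mds_session_blacklist_on_evict is
    # false, which is exactly the behaviour seen below.
    subprocess.run(
        ["ceph", "tell", f"mds.{rank}",
         "session", "evict", f"auth_name={AUTH_NAME}"],
        check=True,
    )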

The Ceph MDS sees this and performs the eviction, but also blacklists the client:

2021-02-03T15:15:52.636+0000 7f9203f49700 1 mds.xxx asok_command: session evict {filters=[auth_name=...],prefix=session evict} (starting...)
2021-02-03T15:15:52.636+0000 7f9203f49700 1 mds.0.287 Evicting (and blacklisting) client session 7207716 (p.q.r.s:0/2920427123)
2021-02-03T15:15:52.636+0000 7f9203f49700 0 log_channel(cluster) log [INF] : Evicting (and blacklisting) client session 7207716 (p.q.r.s:0/2920427123)
2021-02-03T15:15:53.476+0000 7f9204f4b700 0 --2- [v2:a.b.c.d:6800/1138805783,v1:a.b.c.d:6801/1138805783] >> p.q.r.s:0/2920427123 conn(0x555791941800 0x5557917a6800 crc :-1 s=SESSION_ACCEPTING pgs=6 cs=0 l=0 rev1=1 rx=0 tx=0).handle_reconnect no existing connection exists, reseting client

The manila-share Ceph client logs also record this:

2021-02-03 15:15:53.475 7f641cff9700 0 client.7207716 ms_handle_remote_reset on v2:a.b.c.d:6800/1138805783
2021-02-03 15:15:53.476 7f641cff9700 -1 client.7207716 I was blacklisted at osd epoch 12942

A subsequent attempt at share creation then fails in manila-share:

2021-02-03 15:17:56.314 20 DEBUG manila.share.drivers.cephfs.driver [req-495c5115-5a0a-4465-bc6d-0fdb1caaac92 b2d76137b21489d3fbe0125f36cd8a92ddca0018ef8f265c8f3f9fdc6efcb191 25f96bca327c4136ab28f251203d71a3 - - -] create_share xxxx name=081a69e2-e80b-454c-880b-789ca6f70851 size=10 share_group_id=None create_share /usr/lib/python3.6/site-packages/manila/share/drivers/cephfs/driver.py:262
2021-02-03 15:17:56.324 20 INFO ceph_volume_client [req-495c5115-5a0a-4465-bc6d-0fdb1caaac92 b2d76137b21489d3fbe0125f36cd8a92ddca0018ef8f265c8f3f9fdc6efcb191 25f96bca327c4136ab28f251203d71a3 - - -] create_volume: /volumes/_nogroup/081a69e2-e80b-454c-880b-789ca6f70851
2021-02-03 15:17:56.324 20 ERROR manila.share.manager [req-495c5115-5a0a-4465-bc6d-0fdb1caaac92 b2d76137b21489d3fbe0125f36cd8a92ddca0018ef8f265c8f3f9fdc6efcb191 25f96bca327c4136ab28f251203d71a3 - - -] Share instance 081a69e2-e80b-454c-880b-789ca6f70851 failed on creation.: cephfs.OSError: error in stat: /volumes/_nogroup/081a69e2-e80b-454c-880b-789ca6f70851: Cannot send after transport endpoint shutdown [Errno 108]
2021-02-03 15:17:56.325 20 WARNING manila.share.manager [req-495c5115-5a0a-4465-bc6d-0fdb1caaac92 b2d76137b21489d3fbe0125f36cd8a92ddca0018ef8f265c8f3f9fdc6efcb191 25f96bca327c4136ab28f251203d71a3 - - -] Share instance information in exception can not be written to db because it contains {} and it is not a dictionary.: cephfs.OSError: error in stat: /volumes/_nogroup/081a69e2-e80b-454c-880b-789ca6f70851: Cannot send after transport endpoint shutdown [Errno 108]
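
The ESHUTDOWN (errno 108) is the classic symptom of a blacklisted libcephfs client: the existing mount is dead, and every metadata operation fails the same way. A minimal sketch of this failure mode using the cephfs Python binding, where the conf path, auth ID and share path are illustrative assumptions:

# Once the client's address is blacklisted in the OSD map, any operation
# on the existing mount (here a stat) fails with ESHUTDOWN. The conffile,
# auth_id and path below are illustrative assumptions.
import errno
import cephfs

fs = cephfs.LibCephFS(conffile="/etc/ceph/ceph.conf", auth_id="manila")
fs.mount()
try:
    fs.stat("/volumes/_nogroup/081a69e2-e80b-454c-880b-789ca6f70851")
except cephfs.OSError as e:
    if e.errno == errno.ESHUTDOWN:
        # The session was evicted and blacklisted; this mount cannot
        # recover and must be torn down and re-created.
        fs.shutdown()
        raise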

The manila-share servers now appear in the Ceph OSD blacklist:

# ceph osd blacklist ls
p.q.r.s:0/2920427123 2021-02-03T16:15:52.637798+0000
[...]

To work around this issue, we need to set a non-default configuration option in our Ceph cluster:

# ceph config set global mds_session_blacklist_on_evict false

(as described in https://docs.ceph.com/en/octopus/cephfs/eviction/#advanced-configuring-blacklisting)
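
Note that disabling blacklist-on-evict only affects future evictions; addresses already on the blacklist stay there until they expire. A hedged cleanup sketch that removes them explicitly, assuming the ceph CLI is available with admin credentials:

# Remove existing blacklist entries so manila-share can reconnect
# immediately instead of waiting for each entry to expire.
import subprocess

def clear_blacklist():
    out = subprocess.check_output(["ceph", "osd", "blacklist", "ls"],
                                  universal_newlines=True)
    for line in out.splitlines():
        fields = line.split()
        # Entries look like "p.q.r.s:0/2920427123 <expiry timestamp>";
        # skip any summary line that is not an address.
        if fields and "/" in fields[0]:
            subprocess.run(["ceph", "osd", "blacklist", "rm", fields[0]],
                           check=True)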

With that change, things appear to be working again. Is there a way to achieve this without requiring configuration changes to the Ceph cluster, or is it something that will need to be added to the documentation for the CephFSNative driver?

Vida Haririan (vhariria)
Changed in manila:
status: New → Triaged