Cinder scheduler's capacity filter is not working as expected when netapp_lun_space_reservation=enabled

Bug #1859188 reported by Naresh Kumar Gunjalli
This bug affects 2 people
Affects: Cinder
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

The backend vserver has iSCSI enabled, and every volume on this vserver is treated as a pool for provisioning new iSCSI LUNs (Cinder volumes).
Consider an iSCSI-enabled vserver with 5 volumes:
• Vol1 : Free space 500 GB
• Vol2 : Free space 800 GB
• Vol3 : Free space 400 GB
• Vol4 : Free space 100 GB
• Vol5 : Free space 110 GB

Suppose a request arrives to create a new 450 GB LUN, with netapp_lun_space_reservation at its default value of enabled.
The Cinder scheduler currently whitelists all five volumes for provisioning this 450 GB LUN.
Because netapp_lun_space_reservation is enabled, however, the scheduler should blacklist Vol3, Vol4 and Vol5, since none of them has enough free space to host a 450 GB LUN.
The scheduler checks only whether the volume_type extra specs enable thin provisioning; it should also check whether the LUNs themselves will actually be thin provisioned.
If thin provisioning is enabled in the volume_type extra specs, the scheduler assumes the LUNs will be thin provisioned as well and whitelists the backend volume as a pool.
max_over_subscription_ratio then comes into play: with thin provisioning, the provisioned capacity is allowed to reach 20x the total physical capacity of the ONTAP volume.
This causes Vol3, Vol4 and Vol5 to be whitelisted, and when the LUN is then created with space reservation enabled, the creation fails with a “No space left on device” error.

This is the capacity filter code, which only checks whether thin provisioning is enabled in the volume_type extra specs and ignores the netapp_lun_space_reservation setting on the backend:
https://github.com/openstack/cinder/blob/master/cinder/scheduler/filters/capacity_filter.py#L114
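Why the smaller volumes slip through can be illustrated with a simplified model of the thin-provisioning branch of the linked capacity filter. This is a sketch under stated assumptions, not the actual Cinder implementation: the helpers `passes_thin_check` and `thin_passing_pools` are hypothetical, reserved_percentage is ignored, max_over_subscription_ratio is taken as 20, and each volume is assumed to be empty (total capacity equal to free space, nothing provisioned yet).

```python
from typing import Dict, List

# Free space per ONTAP volume (pool), in GB, from the example above.
POOLS: Dict[str, float] = {"Vol1": 500, "Vol2": 800, "Vol3": 400, "Vol4": 100, "Vol5": 110}


def passes_thin_check(free_gb: float, total_gb: float, provisioned_gb: float,
                      requested_gb: float, max_ratio: float = 20.0) -> bool:
    """Hypothetical, simplified model of the capacity filter's thin branch."""
    # Step 1: the projected over-subscription must stay within max_ratio.
    provisioned_ratio = (provisioned_gb + requested_gb) / total_gb
    if provisioned_ratio > max_ratio:
        return False
    # Step 2: free space is scaled by max_ratio before it is compared with the
    # request, so a pool can pass with far less physical space than requested.
    adjusted_free_virtual = free_gb * max_ratio
    return adjusted_free_virtual >= requested_gb


def thin_passing_pools(pools: Dict[str, float], requested_gb: float) -> List[str]:
    # Assumption: each volume is empty, so total == free and provisioned == 0.
    return [name for name, free in pools.items()
            if passes_thin_check(free, free, 0, requested_gb)]


print(thin_passing_pools(POOLS, 450))
# ['Vol1', 'Vol2', 'Vol3', 'Vol4', 'Vol5']
```

Vol4, for instance, has only 100 GB free, but 100 × 20 = 2000 ≥ 450, so it passes the filter; the LUN creation then fails on ONTAP because a space-reserved LUN needs the full 450 GB of real space.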

Information about netapp_lun_space_reservation from our wiki:
https://netapp-openstack-dev.github.io/openstack-docs/rocky/cinder/configuration/cinder_config_files/unified_driver_ontap/section_cinder-conf-iscsi.html
Option: netapp_lun_space_reservation
Type: Optional
Default value: enabled
Description: This option specifies whether space will be reserved when creating Cinder volumes on NetApp backends using the iSCSI or FC storage protocols. If this option is set to enabled, LUNs created during volume creation or volume cloning workflows will always be thick provisioned. If this option is set to disabled, LUNs created during volume creation or volume cloning workflows will always be thin provisioned. Note that this option does not affect the implementation of Cinder snapshots, where the LUN clone that represents the snapshot will always be thin provisioned. Valid options are enabled and disabled.
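Given these semantics, one possible fix is to let the scheduler take the thin path only when the volume type requests thin provisioning and the backend will actually create thin LUNs. The sketch below is hypothetical: the function names are invented, the thin branch is reduced to the free × max_over_subscription_ratio comparison, and the 20x ratio and free-space figures come from the example above.

```python
from typing import Dict, List

# Free space per ONTAP volume (pool), in GB, from the example above.
POOLS: Dict[str, float] = {"Vol1": 500, "Vol2": 800, "Vol3": 400, "Vol4": 100, "Vol5": 110}


def lun_will_be_thin(extra_specs_thin: bool, lun_space_reservation: str) -> bool:
    """Hypothetical helper: will the LUN created on the backend really be thin?"""
    if lun_space_reservation not in ("enabled", "disabled"):
        raise ValueError("netapp_lun_space_reservation must be 'enabled' or 'disabled'")
    # 'enabled' always yields thick LUNs and 'disabled' always yields thin LUNs,
    # so the thin path is used only when both sides agree on thin.
    return extra_specs_thin and lun_space_reservation == "disabled"


def passing_pools(pools: Dict[str, float], requested_gb: float, extra_specs_thin: bool,
                  lun_space_reservation: str, max_ratio: float = 20.0) -> List[str]:
    thin = lun_will_be_thin(extra_specs_thin, lun_space_reservation)
    result = []
    for name, free_gb in pools.items():
        # Thick LUNs need the full size physically; thin ones may over-subscribe.
        capacity = free_gb * max_ratio if thin else free_gb
        if capacity >= requested_gb:
            result.append(name)
    return result


print(passing_pools(POOLS, 450, True, "enabled"))   # ['Vol1', 'Vol2']
print(passing_pools(POOLS, 450, True, "disabled"))  # ['Vol1', 'Vol2', 'Vol3', 'Vol4', 'Vol5']
```

Falling back to the thick comparison whenever either side is thick is the conservative choice: it may reject a pool that could have worked, but it never admits one where a space-reserved LUN cannot fit.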

I think this is a major gap in scheduling, and it is causing most Cinder volume creations to fail with the following error:
2020-01-08 15:50:53.429 55 DEBUG cinder.volume.drivers.netapp.dataontap.client.api [req-6b39d422-b145-4f0e-a481-3216acd33588 e9d7c38c72744dfc840b40a1c8b8278f 4802b8cf975f447db9cc1cf4e30a25d2 - default default] ==> send_http_request: call {'self': <cinder.volume.drivers.netapp.dataontap.client.api.NaServer object at 0x7f9add25bad0>, 'na_element': <lun-create-by-size>
  <ostype>linux</ostype>
  <path>/vol/wbsdc_OSP_Prod_vol005/volume-03ab46f4-8461-49b6-9cfd-88098e9df34a</path>
  <space-reservation-enabled>true</space-reservation-enabled>
  <use-exact-size>true</use-exact-size>
  <size>214748364800</size>
</lun-create-by-size>
, 'enable_tunneling': True} trace_logging_wrapper /usr/lib/python2.7/site-packages/cinder/utils.py:914
2020-01-08 15:50:53.462 55 DEBUG cinder.volume.drivers.netapp.dataontap.client.api [req-6b39d422-b145-4f0e-a481-3216acd33588 e9d7c38c72744dfc840b40a1c8b8278f 4802b8cf975f447db9cc1cf4e30a25d2 - default default] <== send_http_request: return (32ms) <results xmlns="http://www.netapp.com/filer/admin" reason="No space left on device" status="failed" errno="28"/>
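The lun-create-by-size request in the log can be decoded to confirm what the driver asked ONTAP for: a space-reserved (thick) LUN of exactly 200 GiB on wbsdc_OSP_Prod_vol005. A small standard-library sketch, with the XML copied from the log above:

```python
import xml.etree.ElementTree as ET

# ZAPI request body copied from the DEBUG log above.
request = """<lun-create-by-size>
  <ostype>linux</ostype>
  <path>/vol/wbsdc_OSP_Prod_vol005/volume-03ab46f4-8461-49b6-9cfd-88098e9df34a</path>
  <space-reservation-enabled>true</space-reservation-enabled>
  <use-exact-size>true</use-exact-size>
  <size>214748364800</size>
</lun-create-by-size>"""

root = ET.fromstring(request)
size_bytes = int(root.findtext("size"))
size_gib = size_bytes / 1024 ** 3                  # bytes -> GiB
reserved = root.findtext("space-reservation-enabled") == "true"
volume = root.findtext("path").split("/")[2]       # "/vol/<volume>/<lun>"

print(volume, size_gib, reserved)  # wbsdc_OSP_Prod_vol005 200.0 True
```

Because space reservation is on, ONTAP must find the full 200 GiB in the volume at creation time, which is why the call fails with errno 28 instead of being over-subscribed.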

Tags: netapp