One issue I see with the proposed change is that it will cause problems when deploying additional OSDs. If you have a large Ceph cluster holding a lot of data and later add more OSDs via Fuel, the deployment will fail with this change. Each time an OSD is added, Ceph rebalances the cluster. With a lot of data this could easily take more than 30 minutes, and Fuel will mark the deployment as failed even though the OSDs deployed correctly and are working. While Ceph is rebalancing, the status will be HEALTH_WARN, but the cluster is still usable.
I think moving the cluster health check to a post-deployment task is a better approach. Also, rather than checking the health status, we should write data to the cluster and read it back to verify that it is working correctly. If we want to verify that each individual OSD has deployed correctly, we should check that the OSD we just deployed is marked up and in by Ceph.
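To make the suggestion concrete, here is a rough sketch of what the two checks could look like. It is not taken from the Fuel codebase: the function names, the `fuel-deploy-check` object name, and the pool argument are all made up for illustration. It assumes the `ceph` and `rados` command-line tools are available on the node, and that `ceph osd dump --format json` returns an `osds` list whose entries carry `osd`, `up`, and `in` fields.

```python
import json
import os
import subprocess
import tempfile


def osd_is_up_and_in(osd_dump, osd_id):
    """Return True if osd_id is marked both up and in in a parsed OSD dump."""
    for osd in osd_dump.get("osds", []):
        if osd.get("osd") == osd_id:
            return osd.get("up") == 1 and osd.get("in") == 1
    return False  # the cluster does not know this OSD at all


def fetch_osd_dump():
    """Run `ceph osd dump` and parse its JSON output (needs a reachable cluster)."""
    out = subprocess.check_output(["ceph", "osd", "dump", "--format", "json"])
    return json.loads(out)


def write_read_roundtrip(pool, obj_name="fuel-deploy-check"):
    """Write a small object with the rados CLI, read it back, and compare.

    obj_name is a hypothetical name chosen for this sketch.
    """
    payload = os.urandom(4096)
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "dst")
        with open(src, "wb") as f:
            f.write(payload)
        subprocess.check_call(["rados", "-p", pool, "put", obj_name, src])
        try:
            subprocess.check_call(["rados", "-p", pool, "get", obj_name, dst])
            with open(dst, "rb") as f:
                return f.read() == payload
        finally:
            # Best-effort cleanup of the test object; ignore the exit code.
            subprocess.call(["rados", "-p", pool, "rm", obj_name])
```

A post-deployment task could call `osd_is_up_and_in(fetch_osd_dump(), osd_id)` for each newly deployed OSD and `write_read_roundtrip(pool)` once per cluster. Neither check depends on the overall health status, so a HEALTH_WARN caused by rebalancing would not fail the deployment.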