@Andrew,
We can't address the "do not mark all nodes in error state" issue right now; it's a limitation on our side. In the post-deployment stage we have tasks that are critical for the cluster (such as enable_quorum) as well as non-critical ones (upload cirros or update host).
So if a post-deployment task fails, we mark the entire deployment as being in error state, because we can't tell whether the cluster is operational or not. I think we can go with @Maciej's fix for now, and keep in mind that the general solution should be addressed as a blueprint.
@Maciej,
It just came to mind: what do you think about also marking **offline** nodes as **error**, so the user will notice that the updates weren't applied there? It's ugly, but it will notify the cluster operator that redeployment is needed for these nodes.
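
To make the idea concrete, here is a rough sketch in Python. Everything in it is hypothetical (the `Node` class, its `online` and `status` fields, and `mark_offline_nodes_as_error`); it is not the project's actual code, just an illustration of the proposal under those assumptions.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical stand-in for a cluster node; not the real model.
    name: str
    online: bool
    status: str = "ready"


def mark_offline_nodes_as_error(nodes):
    """Flag every offline node as 'error' and return the flagged nodes."""
    flagged = []
    for node in nodes:
        if not node.online:
            # The node could not receive the updates, so surface it to the operator.
            node.status = "error"
            flagged.append(node)
    return flagged


nodes = [Node("node-1", online=True), Node("node-2", online=False)]
flagged = mark_offline_nodes_as_error(nodes)
print([n.name for n in flagged])
```

Online nodes keep their current status here, so only the nodes that missed the updates would show up as needing redeployment.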