Unable to defer "storage_detaching" or teardown-related events

Bug #2008112 reported by Mehdi B.
This bug affects 1 person
Affects: Canonical Juju
Status: Triaged
Importance: Medium
Assigned to: Unassigned

Bug Description

----
*Context:*

For databases, scaling down must be handled with care to avoid any risk of data loss.

Sometimes the removal of a unit must be postponed because it is not safe to remove it:
- at a particular moment (e.g. the primary copies of the data cannot be relocated anywhere else in the cluster)
- in a particular fashion (e.g. removing the majority of the nodes at once, or simply removing multiple units at once rather than one at a time in a rolling manner)

In these cases, the charm needs to defer the teardown of a unit until it can be handled gracefully, while keeping the unit fully functional, as if no termination event had been received.
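To make the two conditions above concrete, here is a minimal Python sketch of the kind of safety check a database charm might run before letting a unit go. Everything in it is hypothetical: the `Shard` model, the `removal_is_safe` helper and the `rolling` flag are illustrative names, not part of Juju or the ops library. Under the workaround described in this report, a charm would fail its "storage_detaching" hook for as long as such a check returns False.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    """Hypothetical model: a primary copy of some data and where it could live."""
    name: str
    primary_on: str            # unit currently holding the primary copy
    eligible_units: frozenset  # units that could host this primary


def removal_is_safe(all_units, departing, shards, rolling=True):
    """Return (safe, reason) for removing the `departing` units at once.

    Hypothetical check mirroring the conditions in the bug report:
    - with rolling=True, only one unit may leave at a time;
    - a strict majority of the current units must remain;
    - every primary hosted on a departing unit must be relocatable
      to a unit that stays.
    """
    all_units = set(all_units)
    departing = set(departing)
    remaining = all_units - departing

    # Rolling removal: refuse to take down several units in one go.
    if rolling and len(departing) > 1:
        return False, "rolling removal allows only one unit at a time"

    # Majority: the remaining units must be more than half of the cluster.
    if len(remaining) * 2 <= len(all_units):
        return False, "removal would leave no majority of units"

    # Relocation: primaries on departing units need a surviving target.
    for shard in shards:
        if shard.primary_on in departing and not (shard.eligible_units & remaining):
            return False, f"primary {shard.name} cannot be relocated"

    return True, "ok"


units = ["db/0", "db/1", "db/2"]
shards = [
    Shard("s1", "db/0", frozenset({"db/0"})),
    Shard("s2", "db/1", frozenset({"db/1", "db/2"})),
]
print(removal_is_safe(units, ["db/1"], shards))  # (True, 'ok')
print(removal_is_safe(units, ["db/0"], shards))  # (False, 'primary s1 cannot be relocated')
```

The majority check is deliberately strict: with four units, removing two is rejected because the two that remain are not a strict majority.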

----
*Current:*

It is currently not possible to defer the termination of a unit in the natural way (with event.defer()).

The only way to do so is a hack: put the unit in an error state during "storage_detaching", so that the unit does not unmount the storage and the hook keeps being retried until the cluster reaches a satisfactory state.

This comes at a price:
- the unit does NOT receive any subsequent events; it only keeps retrying the failed termination hook, which effectively leaves it "diminished" compared to the rest of the nodes.

This has broader impacts, such as:
- If the leader unit is the one that received the termination event, erroring in its "storage_detaching" hook effectively prevents leader re-election from happening. As a result, none of the hooks whose processing is assigned specifically to the leader unit will run, and the cluster will eventually end up in an unexpected and unstable state.

----

*Environment:*

- Juju: 2.9.38.1

----

Thank you

Juan M. Tirado (tiradojm) wrote:

There is an ongoing discussion about this issue. No decisions have been made so far about how to approach this problem.

Changed in juju:
importance: Undecided → Medium
status: New → Triaged
tags: added: hooks status storage