When testing hibernation / resume on AWS with 5.0 or 5.3 kernels on bionic (using acpid 1:2.0.28-1ubuntu1), we sometimes see failures on repeated attempts. The first hibernation attempt is always triggered, but a subsequent attempt may not be. When that happens, the agent never triggers the hibernation process and the instance is forced to shut down after a timeout period.
Two workarounds have been identified. The first is to restart acpid during the resume handler. The second is to use the latest upstream acpid (as of Feb 1, 2020). This second workaround indicates there may be a patch missing in the acpid in bionic (1:2.0.28-1ubuntu1) to work with the 5.0+ kernels.
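The first workaround could look roughly like the helper below, called from the resume handler. This is only a sketch: it assumes the handler can run Python as root and that acpid is managed by systemd under the unit name acpid.service. The names restart_acpid and ACPID_RESTART_CMD, and the injectable run parameter, are hypothetical and not part of the hibernation agent.

```python
import subprocess

# Assumption: acpid runs as the systemd unit acpid.service on the instance.
ACPID_RESTART_CMD = ["systemctl", "restart", "acpid.service"]


def restart_acpid(run=subprocess.run):
    """Restart acpid and return True if the restart command exited with 0.

    `run` defaults to subprocess.run; it is injectable so the helper can be
    exercised without root privileges or a running systemd.
    """
    result = run(ACPID_RESTART_CMD, check=False)
    return result.returncode == 0
```

A resume handler would call restart_acpid() once the system has resumed and log a warning if it returns False.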
To reproduce this problem:
1) Launch a c4, c5, m4, m5, r4, or r5 instance type with a 5.0 or 5.3 kernel on a bionic image with on-demand hibernation support enabled.
2) Hibernate and resume the instance, ensuring the system is fully resumed afterward and the swap file has been removed.
3) Hibernate the instance another time. The hibernate should be triggered immediately and the instance should become unresponsive as it saves state to disk.
4) Resume the instance. It should come back with the same processes running.
5) Repeat 3) - 4) as necessary.
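The hibernate / resume cycle in the steps above can be driven from a workstation with the AWS CLI. The following is a minimal sketch, assuming the aws command is installed and configured with credentials for the instance's account and region. The function names and the injectable run parameter are hypothetical. The script only drives the cycle: it cannot tell a real hibernation from the forced shutdown described above, so the check that the same processes are still running must still be done on the instance.

```python
import subprocess


def hibernate_cycle_commands(instance_id):
    """Return the AWS CLI commands for one hibernate/resume cycle."""
    ids = ["--instance-ids", instance_id]
    return [
        # Request a stop with hibernation instead of a plain stop.
        ["aws", "ec2", "stop-instances", "--hibernate", *ids],
        # Block until the instance reports the "stopped" state.
        ["aws", "ec2", "wait", "instance-stopped", *ids],
        # Resume the instance from its hibernated state.
        ["aws", "ec2", "start-instances", *ids],
        # Block until the instance reports the "running" state.
        ["aws", "ec2", "wait", "instance-running", *ids],
    ]


def run_cycles(instance_id, cycles, run=subprocess.run):
    """Run up to `cycles` hibernate/resume cycles.

    Returns the number of cycles that completed. Stops at the first
    command that exits with a non-zero status.
    """
    for completed in range(cycles):
        for cmd in hibernate_cycle_commands(instance_id):
            if run(cmd, check=False).returncode != 0:
                return completed
    return cycles
```

For example, run_cycles(instance_id, 5) with the ID of the test instance attempts five cycles and returns how many finished without a CLI error.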