Ubuntu
ec2-hibinit-agent package

[SRU] OOM errors with new kernels on resuming

Bug #1863242 reported by Balint Reczey on 2020-02-14

This bug affects 1 person

Affects	Status	Importance	Assigned to
ec2-hibinit-agent (Ubuntu)	Fix Released	Undecided	Unassigned
Xenial	Incomplete	Undecided	Unassigned
Bionic	Fix Released	Undecided	Unassigned
Eoan	Fix Released	Undecided	Unassigned

Bug Description

[Impact]

* When EC2 instances resume from hibernation, processes are sometimes killed by the OOM killer.

[Test Case]

* Set up an EC2 instance to allow hibernation as the stop instance action.
* Start the attached Python script in a screen session to reserve 85% of the memory:
python3 mem-waster-pct.py -p 85

* Log out, hibernate, then resume the instance.
* Verify that the Python script is still running after resuming.

[Regression Potential]

* The fix sets the memory overcommit policy to 'always overcommit' while the swap file is being removed. This helps cope with the shrinking swap space during swap removal. No side effects are expected, since processes trying to allocate an excessive amount of memory would fail under stricter policies, too.

The fix introduces a potential race condition with processes detecting the overcommit policy:

The policy in effect when hibernation took place is saved shortly after resuming and is restored after the swap file is removed. During this time window other processes see the policy as 'always overcommit', even though it may not have been set as such before hibernation and may be restored to a different policy once the swap file is removed. Hitting this race condition seems unlikely, and there seems to be no good way of avoiding it.
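The save/override/restore sequence described above can be sketched roughly as follows. This is only an illustration, not the code shipped in debian/hibinit-resume: the function name `remove_swap_with_overcommit` and its `overcommit_path` and `swapoff` parameters are hypothetical and exist so the logic can be exercised without root. The only real interfaces assumed are the `/proc/sys/vm/overcommit_memory` sysctl file (where `1` means 'always overcommit') and the `swapoff` command.

```python
import subprocess
from pathlib import Path

# Standard Linux sysctl file for the overcommit policy.
OVERCOMMIT_PATH = Path("/proc/sys/vm/overcommit_memory")
ALWAYS_OVERCOMMIT = "1"  # vm.overcommit_memory=1 means "always overcommit"


def remove_swap_with_overcommit(swap_file, overcommit_path=OVERCOMMIT_PATH, swapoff=None):
    """Disable swap_file while temporarily forcing 'always overcommit'.

    Hypothetical helper: saves the current policy, overrides it, runs
    swapoff, and restores the saved policy even if swapoff fails.
    Returns the policy value that was saved.
    """
    if swapoff is None:
        # Default action: call the real swapoff command (needs root).
        def swapoff(path):
            subprocess.run(["swapoff", str(path)], check=True)

    overcommit_path = Path(overcommit_path)
    saved = overcommit_path.read_text().strip()

    # Start of the race window: other processes now see policy 1.
    overcommit_path.write_text(ALWAYS_OVERCOMMIT + "\n")
    try:
        swapoff(swap_file)
    finally:
        # End of the race window: restore the pre-hibernation policy.
        overcommit_path.write_text(saved + "\n")
    return saved
```

Any process that reads the policy between the two writes sees 'always overcommit', which is the race window described above.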

Balint Reczey (rbalint) on 2020-02-14

Changed in ec2-hibinit-agent (Ubuntu):
assignee:	nobody → Balint Reczey (rbalint)

Balint Reczey (rbalint) wrote on 2020-02-25:

@fginther Could you please add reproduction steps for the SRU process?

Balint Reczey (rbalint) on 2020-03-11

Changed in ec2-hibinit-agent (Ubuntu):
status:	New → Incomplete
assignee:	Balint Reczey (rbalint) → nobody

Launchpad Janitor (janitor) wrote on 2020-03-12:

This bug was fixed in the package ec2-hibinit-agent - 1.0.0-0ubuntu8

---------------
ec2-hibinit-agent (1.0.0-0ubuntu8) focal; urgency=medium

  * debian/hibinit-resume: Add extra steps around swapoff to avoid OOM errors.
    Also work around xen-netfront not resuming properly.
    Thanks to Francis Ginther for the initial patch (LP: #1863242, #1864041)

-- Balint Reczey <email address hidden> Thu, 12 Mar 2020 14:05:06 +0100

Changed in ec2-hibinit-agent (Ubuntu):
status:	Incomplete → Fix Released

Francis Ginther (fginther) on 2020-03-13

tags:	added: id-5e459f823f8a2435d44842eb

Francis Ginther (fginther) wrote on 2020-03-16:

mem-waster-pct.py (3.7 KiB, text/x-python)

I'm attaching the script I've been using to consume and hold memory, mem-waster-pct.py. It can be invoked with:

nohup /usr/bin/python3 /root/mem-waster-pct.py -p 85

This will cause the script to consume memory by appending data to a list until 85% of memory is consumed by all processes. It will then hold that memory until the process is killed.
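For readers without access to the attachment, the behaviour described above can be approximated with a sketch like the one below. It is not the attached mem-waster-pct.py: the function names, the 16 MiB chunk size, and the use of `MemTotal` and `MemAvailable` from `/proc/meminfo` to estimate system-wide usage are all assumptions made for illustration.

```python
def used_memory_percent(meminfo_text):
    """Estimate system-wide memory use (percent) from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def hold_memory(target_percent, chunk_bytes=16 * 1024 * 1024, read_meminfo=None):
    """Append chunks to a list until usage reaches target_percent.

    Returns the list; the memory stays allocated for as long as the
    caller keeps a reference to it.
    """
    if read_meminfo is None:
        # Default: read the live values from the kernel.
        def read_meminfo():
            with open("/proc/meminfo") as f:
                return f.read()

    chunks = []
    while used_memory_percent(read_meminfo()) < target_percent:
        # Non-zero fill so the pages are actually written to.
        chunks.append(b"x" * chunk_bytes)
    return chunks
```

To mimic the test case, a caller would run `held = hold_memory(85)` and then sleep in a loop so the process keeps the memory until it is killed.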

Balint Reczey (rbalint) on 2020-03-23

description:	updated

Łukasz Zemczak (sil2100) wrote on 2020-03-23: Please test proposed package

Hello Balint, or anyone else affected,

Accepted ec2-hibinit-agent into eoan-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ec2-hibinit-agent/1.0.0-0ubuntu7.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-eoan to verification-done-eoan. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-eoan. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ec2-hibinit-agent (Ubuntu Eoan):
status:	New → Fix Committed
tags:	added: verification-needed verification-needed-eoan
Changed in ec2-hibinit-agent (Ubuntu Bionic):
status:	New → Fix Committed
tags:	added: verification-needed-bionic

Łukasz Zemczak (sil2100) wrote on 2020-03-23:

Hello Balint, or anyone else affected,

Accepted ec2-hibinit-agent into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ec2-hibinit-agent/1.0.0-0ubuntu4~18.04.4 in a few hours, and then in the -proposed repository.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Francis Ginther (fginther) wrote on 2020-04-01:

I've completed bionic testing with 500+ runs and no issues. Setting "verification-done-bionic".

tags:	added: verification-done-bionic
	removed: verification-needed-bionic

Launchpad Janitor (janitor) wrote on 2020-04-07:

This bug was fixed in the package ec2-hibinit-agent - 1.0.0-0ubuntu4~18.04.4

---------------
ec2-hibinit-agent (1.0.0-0ubuntu4~18.04.4) bionic; urgency=medium

-- Balint Reczey <email address hidden> Mon, 23 Mar 2020 13:03:38 +0100

Changed in ec2-hibinit-agent (Ubuntu Bionic):
status:	Fix Committed → Fix Released

Brian Murray (brian-murray) wrote on 2020-04-07: Update Released

The verification of the Stable Release Update for ec2-hibinit-agent has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Francis Ginther (fginther) wrote on 2020-07-02:

I've done additional testing of ec2-hibinit-agent on eoan with no observed OOM issues. Setting to `verification-done-eoan`.

tags:	added: verification-done-eoan
	removed: verification-needed-eoan

Launchpad Janitor (janitor) wrote on 2020-07-07:

This bug was fixed in the package ec2-hibinit-agent - 1.0.0-0ubuntu7.1

---------------
ec2-hibinit-agent (1.0.0-0ubuntu7.1) eoan; urgency=medium

-- Balint Reczey <email address hidden> Mon, 23 Mar 2020 13:03:38 +0100

Changed in ec2-hibinit-agent (Ubuntu Eoan):
status:	Fix Committed → Fix Released

Balint Reczey (rbalint) wrote on 2020-10-20:

@fginther Xenial does not need the backport, right?

Changed in ec2-hibinit-agent (Ubuntu Xenial):
status:	New → Incomplete
