xen:balloon errors in 14.04 beta

Bug #1304001 reported by Ben Howard
This bug affects 22 people
Affects         Status        Importance  Assigned to   Milestone
linux (Ubuntu)  Fix Released  High        Tim Gardner
Precise         Fix Released  Undecided   Unassigned
Trusty          Fix Released  Undecided   Stefan Bader
Utopic          Fix Released  High        Stefan Bader

Bug Description

SRU Justification:
[Impact]
The following errors may occur on HVM instances on EC2: xen:balloon: reserve_additional_memory: add_memory() failed: -17

[Test Case]
Boot Ubuntu Trusty 3.13 series HVM instances and check dmesg for this error message.

[Fix]
A minimal fix for this bug can be found here:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3dcf63677d4eb7fdfc13290c8558c301d2588fe8
This cancels ballooning when adding new memory fails, so the error message is printed once instead of repeatedly.

There should still be an effort to root-cause this issue and determine how to avoid the ballooning errors in the first place. I still think this patch should be applied to alleviate the symptoms until the root cause is discovered.

--

Xen balloon errors on HVM instances on EC2 (Xen 4.2.amazon):

ubuntu@ip-10-63-20-99:~$ uname -a
Linux ip-10-63-20-99 3.13.0-23-generic #45-Ubuntu SMP Fri Apr 4 06:58:38 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@ip-10-63-20-99:~$ dmesg |grep xen
[ 0.000000] xen:events: Xen HVM callback vector for event delivery is enabled
[ 0.494613] xen:balloon: Initialising balloon driver
[ 0.496046] xen_balloon: Initialising balloon driver
[ 0.500077] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 0.541047] Switched to clocksource xen
[ 0.562579] xen: --> pirq=16 -> irq=8 (gsi=8)
[ 0.562622] xen: --> pirq=17 -> irq=12 (gsi=12)
[ 0.562649] xen: --> pirq=18 -> irq=1 (gsi=1)
[ 0.562673] xen: --> pirq=19 -> irq=6 (gsi=6)
[ 0.562705] xen: --> pirq=20 -> irq=4 (gsi=4)
[ 0.920527] xen: --> pirq=21 -> irq=47 (gsi=47)
[ 0.920596] xen:grant_table: Grant tables using version 1 layout
[ 1.029661] xen_netfront: Initialising Xen virtual ethernet driver
[ 1.236083] xenbus_probe_frontend: Device with no driver: device/vfb/0
[ 2.516067] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 6.533941] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 14.560075] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 30.592064] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 62.688153] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 94.752164] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 126.816161] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 158.880084] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 190.944069] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 223.008141] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 255.072112] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 287.136190] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 319.200053] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 351.264164] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 383.328080] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 415.392077] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 447.456112] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 479.520128] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 511.584110] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 543.648181] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 575.712070] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 607.776178] xen:balloon: reserve_additional_memory: add_memory() failed: -17
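
The timestamps in the dmesg excerpt above hint at the retry pattern: the gap between failures doubles (about 2, 4, 8, 16 s) and then settles at about 32 s. The following is a minimal Python sketch, not part of the original report, that extracts the failure timestamps from dmesg-style text and prints the gaps. The sample lines are copied from the excerpt; the helper names `failure_times` and `gaps` are made up for illustration.

```python
import re

# First failures copied from the dmesg excerpt above, plus one unrelated line.
SAMPLE = """\
[ 0.494613] xen:balloon: Initialising balloon driver
[ 0.500077] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 2.516067] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 6.533941] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 14.560075] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 30.592064] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 62.688153] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 94.752164] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 126.816161] xen:balloon: reserve_additional_memory: add_memory() failed: -17
[ 158.880084] xen:balloon: reserve_additional_memory: add_memory() failed: -17
"""

# Matches only the balloon failure lines and captures the timestamp.
FAIL_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+xen:balloon: .*add_memory\(\) failed: -17"
)


def failure_times(log_text):
    """Return the timestamps (seconds since boot) of each balloon failure."""
    return [float(m.group(1)) for m in FAIL_RE.finditer(log_text)]


def gaps(times):
    """Return the gaps between consecutive failures, rounded to whole seconds."""
    return [round(later - earlier) for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    print(gaps(failure_times(SAMPLE)))
```

To run it against a live guest, the output of `dmesg` could be passed to `failure_times` in place of `SAMPLE`.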

Brad Figg (brad-figg)
affects: linux-meta (Ubuntu) → linux (Ubuntu)
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a kernel version where you were not having this particular problem? This will help determine if the problem you are seeing is the result of the introduction of a regression, and when this regression was introduced. If this is a regression, we can perform a kernel bisect to identify the commit that introduced the problem.

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key trusty
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1304001

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Andrew Lau (alau) wrote :

Munehisa Kamata (https://launchpad.net/~kamatam) took a look at this from our side and he believes that

"the following commit has introduced this. It is in Linux kernel 3.13 or later, and I've confirmed that kernel 3.14 still shows the same symptom.

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/xen/balloon.c?id=c275a57f5ec3056f732843b11659d892235faff7

Looking at the change, only HVM will exhibit this. Also, it seems that this can only happen on c3.large and m3.medium, which have < 4GB of memory. The following discussion may be related to this, but I'm not sure if it is.

http://lists.xen.org/archives/html/xen-devel/2014-01/msg01524.html"

Andrew Lau (alau) wrote :

I did my own testing with ubuntu-trusty-14.04-beta2-amd64-server-20140326 (ami-af8d9ac6) and
was able to reproduce this with both linux-image-3.13.0-19-generic (bundled) & linux-image-3.13.0-21-generic (latest)

This bug was brought to our attention when a customer was testing out 3.13.8-031308-generic from
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13.8-trusty/ on top of an Ubuntu 12.04 c3.large HVM instance using
ami-dfa98cb6

Andrew Lau (alau) wrote : apport information

AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Apr 10 00:29 seq
 crw-rw---- 1 root audio 116, 33 Apr 10 00:29 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.14.1-0ubuntu1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 14.04
Ec2AMI: ami-743f5744
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-west-2b
Ec2InstanceType: c3.large
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
IwConfig: Error: [Errno 2] No such file or directory
Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
MachineType: Xen HVM domU
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-23-generic root=UUID=ec3b35cf-1650-47ca-b169-53df22183b61 ro console=tty1 console=ttyS0
ProcVersionSignature: Ubuntu 3.13.0-23.45-generic 3.13.8
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-23-generic N/A
 linux-backports-modules-3.13.0-23-generic N/A
 linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory
Tags: trusty ec2-images
Uname: Linux 3.13.0-23-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm audio cdrom dialout dip floppy netdev plugdev sudo video
_MarkForUpload: True
dmi.bios.date: 01/24/2014
dmi.bios.vendor: Xen
dmi.bios.version: 4.2.amazon
dmi.chassis.type: 1
dmi.chassis.vendor: Xen
dmi.modalias: dmi:bvnXen:bvr4.2.amazon:bd01/24/2014:svnXen:pnHVMdomU:pvr4.2.amazon:cvnXen:ct1:cvr:
dmi.product.name: HVM domU
dmi.product.version: 4.2.amazon
dmi.sys.vendor: Xen

tags: added: apport-collected ec2-images
Andrew Lau (alau) wrote : BootDmesg.txt

apport information

Andrew Lau (alau) wrote : CurrentDmesg.txt

apport information

Andrew Lau (alau) wrote : Lspci.txt

apport information

Andrew Lau (alau) wrote : ProcCpuinfo.txt

apport information

Andrew Lau (alau) wrote : ProcInterrupts.txt

apport information

Andrew Lau (alau) wrote : ProcModules.txt

apport information

Andrew Lau (alau) wrote : UdevDb.txt

apport information

Andrew Lau (alau) wrote : UdevLog.txt

apport information

Andrew Lau (alau) wrote : WifiSyslog.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Have you tried a test kernel with commit c275a57f5ec3056f732843b11659d892235faff7 reverted?

Also, do you happen to know the last kernel version that did not exhibit this bug? We can perform a kernel bisect if we can identify the last good kernel version and first bad kernel.

Changed in linux (Ubuntu):
importance: Medium → High
tags: added: kernel-key
removed: kernel-da-key
Andrew Lau (alau) wrote :

Hi Joseph,

I just tested https://launchpad.net/ubuntu/+source/linux/3.13.0-24.46
with c275a57f5ec3056f732843b11659d892235faff7 reverted and I can confirm that
"xen:balloon: reserve_additional_memory: add_memory() failed: -17"
is no longer appearing in my boot logs on either our c3.large or c3.8xlarge instances.

Boris Ostrovsky (boris-ostrovsky) wrote :

Can you attach Xen config file and xen boot options?

Boris Ostrovsky (boris-ostrovsky) wrote :

And guest kernel config file please.

Stefan Bader (smb) wrote :

I would also rather find a proper fix than just revert that commit. Andrew, can you provide the info Boris was asking for? And just in general, is there anything else that looks or goes wrong in the guest? The error looks to be originating from add_memory, and EEXIST may be returned if register_memory_resource fails (though that should do a pr_debug) or if it is not a new node (with another additional printk). So it sounds likely that it is the former (register_memory_resource), and adding a
  dyndbg="file xen-balloon.c +p"
to the guests grub command line would activate the additional pr_debug lines.
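
For readers unfamiliar with kernel return codes: the -17 in the message is a negated errno value, and errno 17 is EEXIST, as mentioned above. Below is a small Python sketch for decoding such codes with the standard `errno` module. The helper name `describe_kernel_errno` is hypothetical, and the human-readable message text comes from the local C library, so it may vary by platform.

```python
import errno
import os


def describe_kernel_errno(ret):
    """Map a negative kernel-style return code (e.g. -17) to (name, message)."""
    code = -ret
    # errno.errorcode maps numeric codes to their symbolic names.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = describe_kernel_errno(-17)
    print(name, "-", message)
```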

Andrew Lau (alau) wrote :

Apologies for the hold up everyone. I'm trying to get somebody within AWS management to sign off on what we can and can't disclose regarding this.

Anthony Liguori (anthony-codemonkey) wrote :

Boris, this is EC2. Xen 4.2 with a default config.

Stefan Bader (smb) wrote :

I would think Boris is after memory-related special settings on the host and guest (if you can reveal those). It could be that you are hinting that, at least for the host, the config was not modified. I assume you are still using the xm/xend stack. The other useful information would be the amount of memory assigned to a guest type like the one we have the dmesg of.
Boris, do you think it would help you to get the dyndbg enabled output?

Boris Ostrovsky (boris-ostrovsky) wrote :

I would like to reproduce this first and for that I will need at least config file that is used for launching the guest ('memory' and 'maxmem' for sure but having the whole file would be better) as well as guest's kernel config file. Knowing what xen boot options are as well as if there is anything "interesting" in /etc/xen/{xl.conf|xend-config.sxp} may also be helpful.

But seeing debug output definitely won't hurt. In fact, add 'debug loglevel=8 memblock=debug' to boot line for good measure (I hope it won't be too much output).

Andrew Lau (alau) wrote :

Hi Stefan, Boris,

I booted my EC2 guest with the following kernel options:

kernel /boot/vmlinuz-3.13.0-24-generic root=LABEL=cloudimg-rootfs ro console=hvc0 dyndbg="file xen-balloon.c +p" debug loglevel=8 memblock=debug

as requested, but I'm not seeing any additional debug messages accompanying
"xen:balloon: reserve_additional_memory: add_memory() failed: -17"
in any files under /var/log

I'm not much of a kernel hacker, so what else do I need to do to collate the logs or dumps that you guys need?

Andrew Lau (alau) wrote :
Stefan Bader (smb) wrote :

Hi Andrew,

sorry about the dyndbg. It all looked like it would be tied to that but right now I am not sure it is. I need to check the code again. If all else fails we can do a special kernel. I have the same issue as Boris in that I am not able to cause the same problem on HVM guests locally. I am using an Ubuntu base install (Xen-4.2.2 + dom0 kernel 3.8). xend-config is set to allow ballooning.
Now the question is how memory for a guest is configured on the host. I can see the issue on a c3.large instance but we have no way of knowing anything outside the guest.

One note, too. There is one other memory related message in the guests dmesg:
[ 0.000000] NUMA: Warning: invalid memblk node 0 [mem 0x100000000-0x0fffffff]

I vaguely remember NUMA related problems at some point. But I cannot remember whether this was something that needed fixes in the hypervisor, the dom0 kernel or the guest kernel. Or combinations of that.

Boris Ostrovsky (boris-ostrovsky) wrote :

This warning implies that SRAT may be wrong. And I just noticed that this may not be vanilla 4.2:
 [ 0.000000] DMI: Xen HVM domU, BIOS 4.2.amazon 01/24/2014

Are we on a custom HVM BIOS? That's where memory is described to domain.

Although the NUMA issue may be somewhat orthogonal, it may still be worth looking at. Andrew, can you do this on the booted guest:

   acpidump > acpi.dat
   acpixtract -a acpi.dat
   iasl -d srat.dat

and post here srat.dsl

And again, seeing guest config file would be important.

Andrew Lau (alau) wrote :
Boris Ostrovsky (boris-ostrovsky) wrote :

[14Ah 0330 4] Proximity Domain : 00000000
[14Eh 0334 2] Reserved1 : 0000
[150h 0336 8] Base Address : 0000000100000000
[158h 0344 8] Address Length : FFFFFFFF10000000 <======= This looks wrong
[160h 0352 4] Reserved2 : 00000000
[164h 0356 4] Flags (decoded below) : 00000001
                                     Enabled : 1
                               Hot Pluggable : 0
                                Non-Volatile : 0
[168h 0360 8] Reserved3 : 0000000000000000

So we know what caused the warning.
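
As a worked check of why this length is wrong, the arithmetic below treats the base and length as 64-bit unsigned values. It is only a Python sketch of the wraparound, under the assumption that the end of the region is computed as base + length modulo 2^64. It does not explain how the table was generated.

```python
MASK64 = (1 << 64) - 1

# Values taken from the SRAT entry quoted above.
base = 0x0000000100000000
length = 0xFFFFFFFF10000000

# Exclusive end of the region, wrapped to 64 bits.
end = (base + length) & MASK64
# Last byte of the region (inclusive end).
last = (end - 1) & MASK64

# The same length reinterpreted as a signed 64-bit integer.
signed_length = length - (1 << 64) if length >> 63 else length

if __name__ == "__main__":
    print(f"end (exclusive):  {end:#x}")
    print(f"last byte:        {last:#x}")
    print(f"length as signed: {signed_length} bytes ({signed_length / 2**30} GiB)")
```

The wrapped last byte, 0x0fffffff, matches the range in the NUMA warning quoted earlier in the thread ([mem 0x100000000-0x0fffffff]). Read as a signed value the length is -3.75 GiB, which may be a clue but is not a diagnosis.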

Andrew Lau (alau) wrote :

Boris, do you still need our Xen configurations to proceed further then?

Boris Ostrovsky (boris-ostrovsky) wrote :

Yes, those would be helpful, although now I think understanding EC2's BIOS (seabios I assume?) may be more important. Neither I nor Stefan have been able to reproduce this problem and it seems that EC2 does something in BIOS (and possibly qemu) that may not be available in vanilla Xen, most importantly building ACPI tables for guests (which is where memory is described). I, for example, would like to know how SRAT is generated. I don't think this can be done by config file alone.

We may need to get Amazon folks involved.

tags: added: kernel-da-key
removed: kernel-key
James Huxtable (james-huxtable) wrote :

Hi,

Thought I'd mention that I'm having this problem too. The kernel logs are flooded with this error:
ubuntu kernel: [129626.648070] xen:balloon: reserve_additional_memory: add_memory() failed: -17

I'm running XEN on a dedicated i5 PC with Jaunty with up to date packages.
I haven't installed any packages which aren't included with Ubuntu. Seabios, QEMU and Xen are all default versions.

Linux ubuntu 3.13.0-27-generic #50-Ubuntu SMP Thu May 15 18:06:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Let me know if you need any info?

James Huxtable (james-huxtable) wrote :

Sorry I meant trusty not jaunty.

Distributor ID: Ubuntu
Description: Ubuntu 14.04 LTS
Release: 14.04
Codename: trusty

Boris Ostrovsky (boris-ostrovsky) wrote : Re: [Bug 1304001] Re: xen:balloon errors in 14.04 beta

Can you please post:

1. Guest configuration file
2. boot log from dom0 (please boot loglevel=8)
3. xl dmesg and xl info from dom0
4. boot log from guest (also loglevel=8)
5. dmidecode from guest (I am mostly interested in BIOS object)

-boris

James Huxtable (james-huxtable) wrote :

I did some further testing. Running Ubuntu LTS 14.04 as a domU doesn't seem to create the problem. The specific VM which is causing the error messages is running Linux 3.2.40. So I guess the bug could be in the guest's kernel. I can't really upgrade it as it's using a customised version that's required for the software it's running.

Anyway, I'll attach the logs as requested.

James Huxtable (james-huxtable) wrote :

xl_dmesg.txt

James Huxtable (james-huxtable) wrote :

xl_info.txt

James Huxtable (james-huxtable) wrote :

dmesg_dom0.txt

James Huxtable (james-huxtable) wrote :

domu.cfg.txt

James Huxtable (james-huxtable) wrote :

dmesg_domu.txt

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.13.0-24.46
Andrew Lau (alau)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: patch
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Utopic):
assignee: nobody → Tim Gardner (timg-tpi)
status: Confirmed → In Progress
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Utopic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Trusty):
assignee: nobody → Stefan Bader (smb)
status: New → In Progress
Boris Ostrovsky (boris-ostrovsky) wrote :

I am not sure the bug should be closed. David's fix made system behavior tolerable to users (because the error is now only reported once) but the problem is still there. If possible I'd suggest lowering the bug's priority but keeping it open.

Chris J Arges (arges)
description: updated
Tim Gardner (timg-tpi) wrote :

Cherry picked 3dcf63677d4eb7fdfc13290c8558c301d2588fe8 for Trusty which will mark this bug as fix released as soon as the kernel is promoted to -updates.

Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed
Galland (victor-lopez) wrote :

I had this problem on a PC with Xen and Linux VMs. The VMs are configured to boot with just 400 MB of RAM allotted to them and up to 8GB max RAM.
I was never able to get above some 4.3GB in any VM, and log showed the "reserve_additional_memory: add_memory() failed: -17" message at that point

The SOLUTION that fixes it was proposed at qubes-devel mailing list (Qubes is a Xen+Linux distro for ultra-secure PCs):
     https://groups.google.com/d/msg/qubes-devel/VRqkFj1IOtA/UgMgnwfxVSIJ

As proposed there: if I set initial RAM to 800MB, max RAM scales above 4.3GB (actually tested up to 11GB) without any "add_memory() failed" messages.

The explanation given is that the VM's kernel allocates its memory related structures at boot time and the memory available at that time limits the max memory it may scale up to. When that happens, it starts throwing the balloon error messages.

Anders Hall (a.hall) wrote :

Syslog shows:

kernel: [ 832.224074] xen:balloon: reserve_additional_memory: add_memory() failed: -17

On latest AMI as of today (https://cloud-images.ubuntu.com/locator/ec2/):

eu-west-1 trusty 14.04 LTS amd64 hvm:ebs 20140829 ami-acc41cdb hvm

Running on instance type (https://aws.amazon.com/ec2/instance-types):

c3.large 2 7 3.75 2 x 16 SSD

Stefan Bader (smb) wrote :

If it is only exactly one message like that, this would be what we expect from the quick aid. The hint about scaling up certain factors could be something to keep in mind. Might not be exactly the same as it seems to happen on guests with less than 4G, but still. Maybe I can take a bit of time and try to step through the various memory related dmesg entries we can see in the posted logs to figure whether and how things possibly change. I cannot remember that we heard any detail on that in AWS. Right now a black box to me in that respect.

Anders Hall (a.hall) wrote :

Some additional information. Even with the syslog message I ran 55 GB utilization all night and all processes finished successfully on AWS c3.large. The message also appears to be missing after reboot on c3.large (~60 GiB).

On a c3.large (3.75 GiB) the message is spamming syslog (many messages) even after reboot. However, the server seems to be working fine.

Anders Hall (a.hall) wrote :

Sorry for incorrect information. The message is gone from r3.2xlarge (~60 GiB) and present on c3.large (3.75 GiB).

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.16.0-16.22

---------------
linux (3.16.0-16.22) utopic; urgency=low

  [ Andy Whitcroft ]

  * Revert "SAUCE: x86/xen: Fix setup of 64bit kernel pagetables"
  * [Config] tools -- only build common tools when enabled
  * [Config] follow rename of DEB_BUILD_PROFILES

  [ Tim Gardner ]

  * [Debian] set do_*_tools after stage1 or bootstrap is determined
    - LP: #1370211
  * Release Tracking Bug
    - LP: #1370535

  [ Upstream Kernel Changes ]

  * x86/xen: don't copy bogus duplicate entries into kernel page tables
  * blk-merge: fix blk_recount_segments
    - LP: #1359146
  * igb: bring link up when PHY is powered up
    - LP: #1370018
  * igb: remove unnecessary break after goto
    - LP: #1370018
  * igb: remove unnecessary break after return
    - LP: #1370018
  * igb: Add message when malformed packets detected by hw
    - LP: #1370018
  * igb: bump igb version to 5.2.13
    - LP: #1370018
 -- Tim Gardner <email address hidden> Tue, 16 Sep 2014 10:19:04 -0600

Changed in linux (Ubuntu Utopic):
status: Fix Committed → Fix Released
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
Ben Howard (darkmuggle-deactivatedaccount) wrote :

I think that this is a verification-failed. Upgrading the kernel in -proposed does not fix the issue:

ubuntu@ip-10-11-165-53:~$ uname -a
Linux ip-10-11-165-53 3.13.0-37-generic #64-Ubuntu SMP Mon Sep 22 21:28:38 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

ubuntu@ip-10-11-165-53:~$ dpkg-query --show linux-image-virtual
linux-image-virtual 3.13.0.37.44

ubuntu@ip-10-11-165-53:~$ dmesg | grep xen
[ 0.000000] xen: PV spinlocks enabled
[ 0.000000] xen:events: Xen HVM callback vector for event delivery is enabled
[ 0.440123] xen:balloon: Initialising balloon driver
[ 0.444026] xen_balloon: Initialising balloon driver
[ 0.452023] xen:balloon: Cannot add additional memory (-17)
[ 0.492062] Switched to clocksource xen
[ 0.515406] xen: --> pirq=16 -> irq=8 (gsi=8)
[ 0.515446] xen: --> pirq=17 -> irq=12 (gsi=12)
[ 0.515470] xen: --> pirq=18 -> irq=1 (gsi=1)
[ 0.515497] xen: --> pirq=19 -> irq=6 (gsi=6)
[ 0.515530] xen: --> pirq=20 -> irq=4 (gsi=4)
[ 0.778201] xen: --> pirq=21 -> irq=28 (gsi=28)
[ 0.778253] xen:grant_table: Grant tables using version 1 layout
[ 0.917639] xen_netfront: Initialising Xen virtual ethernet driver
[ 1.006714] xenbus_probe_frontend: Device with no driver: device/vfb/0
[ 2.456083] xen:balloon: Cannot add additional memory (-17)
[ 6.460080] xen:balloon: Cannot add additional memory (-17)
[ 14.476110] xen:balloon: Cannot add additional memory (-17)
[ 30.492059] xen:balloon: Cannot add additional memory (-17)
[ 62.556156] xen:balloon: Cannot add additional memory (-17)
[ 94.620107] xen:balloon: Cannot add additional memory (-17)

tags: added: verification-failed
removed: verification-needed-trusty
Boris Ostrovsky (boris-ostrovsky) wrote :

Hmm, yes, the patch may not work if something (AWS, I guess) keeps requesting to balloon memory.

Can someone try the attached patch (on top of David's)? I only compile-tested it.

Boris Ostrovsky (boris-ostrovsky) wrote :

Actually, this (v2) is a slightly better one to try.

Brad Figg (brad-figg)
tags: added: verification-failed-trusty
Mathew Hodson (mhodson)
tags: removed: kernel-request-3.13.0-24.46
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-37.64

---------------
linux (3.13.0-37.64) trusty; urgency=low

  [ Joseph Salisbury ]

  * Release Tracking Bug
    - LP: #1372576

  [ dann frazier ]

  * [Config] CONFIG_HW_RANDOM_XGENE=m on arm64

  [ Edward Lin ]

  * SAUCE: Add use_native_backlight quirk for Dell Inspiron 5721/3521
    - LP: #1354253, #1354313

  [ Tim Gardner ]

  * SAUCE: Fix nfs oops stable regression
    - LP: #1348670
  * [Config] Add mpt3sas to d-i
    - LP: #1368907
  * [Config] CONFIG_X86_16BIT=y
    - LP: #1371601

  [ Timo Aaltonen ]

  * SAUCE: i915_bdw: Rebase to v3.15.8
    - LP: #1359213

  [ Upstream Kernel Changes ]

  * Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime
    option"
    - LP: #1371601
  * mmc: rtsx: add R1-no-CRC mmc command type handle
    - LP: #1365378
  * rpc_pipe: remove the clntXX dir if creating the pipe fails
    - LP: #1365869
  * sunrpc: add an "info" file for the dummy gssd pipe
    - LP: #1365869
  * rpc_pipe: fix cleanup of dummy gssd directory when notification fails
    - LP: #1365869
  * hwrng: xgene - add support for APM X-Gene SoC RNG support
    - LP: #1365593
  * Documentation: rng: Add X-Gene SoC RNG driver documentation
    - LP: #1365593
  * arm64: dts: add random number generator dts node to APM X-Gene
    platform.
    - LP: #1365593
  * xen/balloon: cancel ballooning if adding new memory failed
    - LP: #1304001
  * x86/xen: resume timer irqs early
    - LP: #1368724
  * xen/manage: Always freeze/thaw processes when suspend/resuming
    - LP: #1368724
  * scsi_transport_sas: move bsg destructor into sas_rphy_remove
    - LP: #1368991
  * drm/i915: Enable 5.4Ghz (HBR2) link rate for Displayport 1.2-capable
    devices
    - LP: #1369633
  * bnx2x: Fix link for KR with swapped polarity lane
    - LP: #1370716
  * drm: add DRM_CAPs for cursor size
    - LP: #1359213
  * drm/dp: Add AUX channel infrastructure
    - LP: #1359213
  * drm/dp: Add drm_dp_dpcd_read_link_status()
    - LP: #1359213
  * drm/dp: Add DisplayPort link helpers
    - LP: #1359213
  * drm/dp: Allow registering AUX channels as I2C busses
    - LP: #1359213
  * drm/dp: let drivers specify the name of the I2C-over-AUX adapter
    - LP: #1359213
  * drm/dp: make aux retries less chatty
    - LP: #1359213
  * Bluetooth: Enable Atheros 0cf3:311e for firmware upload
    - LP: #1371477
  * bnx2x: fix crash during TSO tunneling
    - LP: #1371601
  * inetpeer: get rid of ip_id_count
    - LP: #1371601
  * ip: make IP identifiers less predictable
    - LP: #1371601
  * tcp: Fix integer-overflows in TCP veno
    - LP: #1371601
  * tcp: Fix integer-overflow in TCP vegas
    - LP: #1371601
  * macvlan: Initialize vlan_features to turn on offload support.
    - LP: #1371601
  * net: Correctly set segment mac_len in skb_segment().
    - LP: #1371601
  * iovec: make sure the caller actually wants anything in
    memcpy_fromiovecend
    - LP: #1371601
  * batman-adv: Fix out-of-order fragmentation support
    - LP: #1371601
  * sctp: fix possible seqlock seadlock in sctp_packet_transmit()
    - LP: #1371601
  * sparc64: Fix argument sign extension for compat_sys_futex().
    - LP: #1371601
  ...

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
Dave Chiluk (chiluk)
tags: added: cts
Martijn Heemels (yggdrasil) wrote :

It doesn't appear fixed to me. Please reopen?

I just installed an m3.medium instance, updated to kernel 3.13.0-37.64 (which should contain the fix) and rebooted.

The errors appear as soon as booting is nearly complete.

From dmesg:
[ 6.631751] xen:balloon: Cannot add additional memory (-17)
[ 14.661002] xen:balloon: Cannot add additional memory (-17)
[ 30.688074] xen:balloon: Cannot add additional memory (-17)
etc...

From syslog:
2014-10-22T13:09:30.785177+00:00 www12 kernel: [ 1345.312075] xen:balloon: Cannot add additional memory (-17)
2014-10-22T13:10:02.849179+00:00 www12 kernel: [ 1377.376076] xen:balloon: Cannot add additional memory (-17)
2014-10-22T13:10:34.913176+00:00 www12 kernel: [ 1409.440077] xen:balloon: Cannot add additional memory (-17)

$ uname -a
Linux www12 3.13.0-37-generic #64-Ubuntu SMP Mon Sep 22 21:28:38 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Stefan Bader (smb) wrote :

This looks like it requires even more. I found the following patch, which supposedly was queued for 3.18 but which I cannot find in upstream git yet: "xen/balloon: Don't continue ballooning when BP_ECANCELED is encountered".

Commit 3dcf63677d4e ("xen/balloon: cancel ballooning if adding new
memory failed") makes reserve_additional_memory() return BP_ECANCELED
when an error is encountered. This error, however, is ignored by the
caller (balloon_process()) since it is overwritten by subsequent call
to update_schedule(). This results in continuous attempts to add more
memory, all of which are likely to fail again.

We should stop trying to schedule next iteration of ballooning when
the current one has failed.
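
The control flow described in this commit message can be illustrated with a toy Python model. It is a simplified sketch and is not taken from the kernel source: the function names echo the ones mentioned above, but the `honor_cancel` flag, the `simulate` helper and the round limit are invented for illustration, and the model assumes that adding memory fails on every attempt.

```python
from enum import Enum


class State(Enum):
    DONE = 0
    EAGAIN = 1
    ECANCELED = 2


def reserve_additional_memory():
    """Toy stand-in: adding memory always fails, so ballooning is cancelled."""
    return State.ECANCELED


def update_schedule(state, honor_cancel):
    """Decide whether another ballooning round should be scheduled.

    With honor_cancel=False the cancellation is overwritten by EAGAIN,
    which models the behaviour the commit message complains about.
    """
    if honor_cancel and state is State.ECANCELED:
        return State.ECANCELED
    if state is State.DONE:
        return State.DONE
    return State.EAGAIN


def simulate(honor_cancel, max_rounds=5):
    """Return how many times adding memory was attempted."""
    attempts = 0
    for _ in range(max_rounds):
        attempts += 1
        state = reserve_additional_memory()
        state = update_schedule(state, honor_cancel)
        if state is not State.EAGAIN:
            break  # no further round is scheduled
    return attempts


if __name__ == "__main__":
    print("cancellation ignored:", simulate(honor_cancel=False))
    print("cancellation honored:", simulate(honor_cancel=True))
```

In this model, ignoring the cancellation retries until the round limit is reached, while honoring it stops after a single failed attempt.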

Changed in linux (Ubuntu Utopic):
status: Fix Released → Triaged
Changed in linux (Ubuntu Trusty):
status: Fix Released → Triaged
Boris Ostrovsky (boris-ostrovsky) wrote :

I was waiting for this patch to be pulled into the mainline before posting a pointer/commitID here. It's not there yet.

But yes, you need that commit (on top of 3dcf63677d4e) to make the kernel shut up.

Boris Ostrovsky (boris-ostrovsky) wrote :

It is now in mainline, commit fd8b79511349.

Stefan Bader (smb) wrote :

Updated test kernels at http://people.canonical.com/~smb/lp1304001/ (again for normal installs based on cloud-images the linux-image-extra is not needed).

Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Trusty):
status: Triaged → Fix Committed
Changed in linux (Ubuntu Utopic):
assignee: Tim Gardner (timg-tpi) → Stefan Bader (smb)
status: Triaged → Fix Committed
Changed in linux (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

I have created a PPA located at:

https://launchpad.net/~inaddy/+archive/ubuntu/lp1304001

Containing both fixes to address this issue:

commit 3dcf63677d4eb7fdfc13290c8558c301d2588fe8 (already present in Trusty)
Author: David Vrabel <email address hidden>
Date: Mon Sep 1 18:52:44 2014 +0100
xen/balloon: cancel ballooning if adding new memory failed

commit fd8b79511349efd1f0decea920f61b93acb34a75 (cherry-picked from upstream)
Author: Boris Ostrovsky <email address hidden>
Date: Tue Oct 7 17:00:07 2014 -0400
xen/balloon: Don't continue ballooning when BP_ECANCELED is encountered

So users can test the fix and provide us feedback.

Thank you

Rafael Tinoco

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

FYI,

I've received information that this indeed fixes the described problem.

Thank you Stefan, Tim, Boris.

Revision history for this message
Martijn Heemels (yggdrasil) wrote :

I've also tested with the kernel from Rafael's PPA, and can confirm that the alerts are gone for me.

Thanks to everyone involved!

Revision history for this message
Andrew Lau (alau) wrote :

Is there a time frame for when the 2nd fix will make its way out to trusty/utopic? Thanks.

Revision history for this message
Stefan Bader (smb) wrote :

It looks like the patches missed the current cycle. But that cycle might move to updates this week (maybe today). They are queued for the next round, so around next week the status here should indicate that it went into proposed.

Revision history for this message
Mathew Hodson (mhodson) wrote :

linux-keystone in trusty-proposed is available at https://launchpad.net/ubuntu/+source/linux-keystone/3.13.0-18.28

tags: added: verification-needed-trusty
removed: verification-failed verification-failed-trusty
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-utopic' to 'verification-done-utopic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-utopic
Revision history for this message
Scott Emmons (lscotte) wrote :

Will this patch go out in 3.13.0-38 this week?

I'm assuming that the discussion around 'linux-keystone' is something different. Regardless, please do not drop this fix from 3.13.0-38 as we're expecting it this week per Stefan's comments above - and we have already verified it resolves the issue.

Revision history for this message
Scott Emmons (lscotte) wrote :

I tested with the latest 3.13 kernel in trusty-proposed:

  Version table:
     3.13.0.40.47 0
        500 http://archive.ubuntu.com/ubuntu/ trusty-proposed/main amd64 Packages

I can confirm that the "xen:balloon: Cannot add additional memory (-17)" message does not occur with this kernel in AWS with an m3 class instance and HVM virtualization.

This looks good to me - fix verified.

Stefan Bader (smb)
tags: added: verification-done-trusty
removed: verification-needed-trusty
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Hello

Is it possible for anyone to test the Utopic kernel so we can flag this as "verification-done" and push this to be released asap?

Thank you very much

Rafael Tinoco

Revision history for this message
Martijn Heemels (yggdrasil) wrote : Re: [Bug 1304001] Re: xen:balloon errors in 14.04 beta

Looking good. I'm no expert on testing pre-release packages so please
correct me if I've done something wrong.

Tested on Utopic today (ami-540ba523 on a m3.medium amd64 hvm). It came
with kernel 3.16.0-24.32 which exhibits the xen:balloon error messages.

Upgraded to 3.16.0.24.25, the latest available in the regular repos.
Reboot. Still the same error messages.

Enabled the utopic-proposed repo and installed kernel 3.16.0.25.26. Reboot.
The messages no longer appear!

Seems good to me!

Regards, Martijn Heemels


Revision history for this message
Luis Henriques (henrix) wrote :

Awesome, thanks a lot Martijn. I'm tagging this bug as verified.

tags: added: verification-done-utopic
removed: verification-needed-utopic
tags: added: verification-done
removed: verification-done-trusty verification-done-utopic
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-40.69

---------------
linux (3.13.0-40.69) trusty; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - re-used previous tracking bug

  [ Upstream Kernel Changes ]

  * regmap: fix kernel hang on regmap_bulk_write with zero val_count.

linux (3.13.0-40.68) trusty; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1388943
  * SAUCE: DEP8 test to run our regression tests
    - LP: #1385330
  * SAUCE: The very first thing we should do when testing is make sure we
    are testing the correct kernel
    - LP: #1385330

  [ dann frazier ]

  * [Config] Disable CONFIG_IPMI_SI_PROBE_DEFAULTS on armhf and arm64
    - LP: #1388952

  [ Duc Dang ]

  * SAUCE: (no-up) [PCIE] APM X-Gene: Remove debug messages in MSI
    interrupt handler path.
    - LP: #1382244
  * SAUCE: (no-up) PCI: X-Gene: Fix max payload size and phantom function
    configuration
    - LP: #1386261

  [ McAulay, Alistair ]

  * SAUCE: drm/i915: Rework GPU reset sequence to match driver load & thaw
    - LP: #1384469

  [ Timo Aaltonen ]

  * SAUCE: i915_bdw: Fix cherry-pick typo
    - LP: #1384469

  [ Upstream Kernel Changes ]

  * Revert "mac80211: disable uAPSD if all ACs are under ACM"
    - LP: #1381234
  * Revert "iwlwifi: dvm: don't enable CTS to self"
    - LP: #1381234
  * Revert "lzo: properly check for overruns"
    - LP: #1387886
  * drm/i915: provide interface for audio driver to query cdclk
    - LP: #1381168
  * regulatory: add NUL to alpha2
    - LP: #1381234
  * percpu: fix pcpu_alloc_pages() failure path
    - LP: #1381234
  * percpu: perform tlb flush after pcpu_map_pages() failure
    - LP: #1381234
  * cgroup: reject cgroup names with '\n'
    - LP: #1381234
  * vfs: add d_is_dir()
    - LP: #1381234
  * CIFS: Fix directory rename error
    - LP: #1381234
  * usb: phy: twl4030-usb: Fix lost interrupts after ID pin goes down
    - LP: #1381234
  * rtlwifi: rtl8192cu: Add new ID
    - LP: #1381234
  * CIFS: Fix wrong restart readdir for SMB1
    - LP: #1381234
  * CIFS: Fix wrong filename length for SMB2
    - LP: #1381234
  * ahci: Add Device IDs for Intel 9 Series PCH
    - LP: #1381234
  * ata_piix: Add Device IDs for Intel 9 Series PCH
    - LP: #1381234
  * USB: zte_ev: fix removed PIDs
    - LP: #1381234
  * USB: ftdi_sio: add support for NOVITUS Bono E thermal printer
    - LP: #1381234
  * USB: sierra: avoid CDC class functions on "68A3" devices
    - LP: #1381234
  * USB: sierra: add 1199:68AA device ID
    - LP: #1381234
  * iommu/arm-smmu: fix programming of SMMU_CBn_TCR for stage 1
    - LP: #1381234
  * iommu/arm-smmu: remove pgtable_page_{c,d}tor()
    - LP: #1381234
  * usb: gadget: fusb300_udc.h: Fix typo in include guard
    - LP: #1381234
  * usb: phy: tegra: Avoid use of sizeof(void)
    - LP: #1381234
  * arm64: use irq_set_affinity with force=false when migrating irqs
    - LP: #1381234
  * block: Fix dev_t minor allocation lifetime
    - LP: #1381234
  * usb: dwc3: core: fix order of PM runtime calls
    - LP: #1381234
  * usb: dwc3: core: fix ordering for PHY suspend
    - LP: #1381234
  * usb: dwc3: omap: fix ordering for runtime pm calls
    - LP: #1381234
  * ...

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.16.0-25.33

---------------
linux (3.16.0-25.33) utopic; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1389170

  [ dann frazier ]

  * [Config] Disable CONFIG_IPMI_SI_PROBE_DEFAULTS on armhf and arm64
    - LP: #1388952

  [ Duc Dang ]

  * SAUCE: (no-up) [PCIE] APM X-Gene: Remove debug messages in MSI
    interrupt handler path.
    - LP: #1382244
  * SAUCE: (no-up) PCI: X-Gene: Fix max payload size and phantom function
    configuration
    - LP: #1386261

  [ Tim Gardner ]

  * Revert "SAUCE: (no-up) PCI: Increase BAR size quirk for IBM ipr SAS
    Crocodile adapters"
    - LP: #1387813
  * [Config] CONFIG_SOUND_OSS_CORE_PRECLAIM=n
    - LP: #1385510
  * [Debian] install usbipd
    - LP: #898003
  * [Debian] Fix linux-doc dangling symlinks
    - LP: #661306

  [ Upstream Kernel Changes ]

  * Revert "macvlan: simplify the structure port"
    - LP: #1381490
  * Revert "net/macb: add pinctrl consumer support"
    - LP: #1381490
  * Revert "lzo: properly check for overruns"
    - LP: #1387813
  * Revert "ath9k_hw: reduce ANI firstep range for older chips"
    - LP: #1387813
  * ASoC: ssm2602: do not hardcode type to SSM2602
    - LP: #1379785
  * ASoC: core: fix possible ZERO_SIZE_PTR pointer dereferencing error.
    - LP: #1379785
  * perf: fix perf bug in fork()
    - LP: #1379785
  * mm: memcontrol: do not iterate uninitialized memcgs
    - LP: #1379785
  * mm: migrate: Close race between migration completion and mprotect
    - LP: #1379785
  * i2c: qup: Fix order of runtime pm initialization
    - LP: #1379785
  * i2c: rk3x: fix 0 length write transfers
    - LP: #1379785
  * ACPI / i915: Update the condition to ignore firmware backlight change
    request
    - LP: #1379785
  * cpufreq: integrator: fix integrator_cpufreq_remove return type
    - LP: #1379785
  * cpufreq: pcc-cpufreq: Fix wait_event() under spinlock
    - LP: #1379785
  * md/raid5: disable 'DISCARD' by default due to safety concerns.
    - LP: #1379785
  * drm/i915: Flush the PTEs after updating them before suspend
    - LP: #1379785
  * Fix problem recognizing symlinks
    - LP: #1379785
  * init/Kconfig: Fix HAVE_FUTEX_CMPXCHG to not break up the EXPERT menu
    - LP: #1379785
  * ring-buffer: Fix infinite spin in reading buffer
    - LP: #1379785
  * uas: Only complain about missing sg if all other checks succeed
    - LP: #1379785
  * uas: Log a warning when we cannot use uas because the hcd lacks streams
    - LP: #1379785
  * uas: Disable uas on ASM1051 devices
    - LP: #1379785
  * uas: Add missing le16_to_cpu calls to asm1051 / asm1053 usb-id check
    - LP: #1379785
  * x86, ia64: Move EFI_FB vga_default_device() initialization to
    pci_vga_fixup()
    - LP: #1379785
  * vgaarb: Don't default exclusively to first video device with mem+io
    - LP: #1379785
  * mm, thp: move invariant bug check out of loop in __split_huge_page_map
    - LP: #1379785
  * mm: numa: Do not mark PTEs pte_numa when splitting huge pages
    - LP: #1379785
  * media: vb2: fix VBI/poll regression
    - LP: #1379785
  * jiffies: Fix timeval conversion to jiffies
    - LP: #1379785
  * Linux 3.16.5
    - LP: #1379785
 ...

Changed in linux (Ubuntu Utopic):
status: Fix Committed → Fix Released
Revision history for this message
Mark Rose (markrose) wrote :

This bug still exists in Precise with 3.2.0-113.

Revision history for this message
Tim Gardner (timg-tpi) wrote :
Luis Henriques (henrix)
Changed in linux (Ubuntu Precise):
status: New → Fix Committed
Revision history for this message
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-precise' to 'verification-done-precise'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-done-trusty verification-done-utopic verification-needed
removed: verification-done
tags: added: verification-needed-trusty
removed: verification-needed
tags: added: verification-needed-precise
removed: verification-needed-trusty
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.2.0-116.158

---------------
linux (3.2.0-116.158) precise; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1640549

  * xen:balloon errors in 14.04 beta (LP: #1304001)
    - xen/balloon: cancel ballooning if adding new memory failed
    - xen/balloon: Don't continue ballooning when BP_ECANCELED is encountered

  * CVE-2016-7425
    - scsi: arcmsr: Buffer overflow in arcmsr_iop_message_xfer()

 -- Luis Henriques <email address hidden> Wed, 09 Nov 2016 17:09:27 +0000

Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released
Revision history for this message
Steve Langasek (vorlon) wrote : Update Released

The verification of the Stable Release Update for linux has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
