Well. After a long search, with lots of colourful herrings, I zeroed in on this: when trying to start a new instance, the instance would stay in pending for a while, then it would auto-terminate. After a series of false cues (my colourful herrings, at least one of them bright red), Serge noticed that the /var/log/libvirt/qemu/i* logs had a *LOT* of failed startups, with this message in them:
Failed to allocate memory: Cannot allocate memory
2011-04-07 12:34:43.451: shutting down
Then the instance would retry a few times.
Looking around, I finally noticed that when this happened -- about 99.9% of the time in my tests -- the walrus would, at about the same time, barf with messages like:
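For anyone trying to confirm the same symptom on their own NC, a rough way to count these failed startups per instance log might look like the sketch below. It is not part of the original investigation: the function names are made up for illustration, and it assumes the logs are readable by the user running it and live under the /var/log/libvirt/qemu/i* path mentioned above.

```python
import glob

# Message that qemu wrote on each failed startup (taken from the log excerpt above).
ALLOC_FAILURE = "Failed to allocate memory"


def count_alloc_failures(lines):
    """Count the lines that contain the qemu allocation failure message."""
    return sum(1 for line in lines if ALLOC_FAILURE in line)


def scan_qemu_logs(pattern="/var/log/libvirt/qemu/i*"):
    """Return {path: failure count} for every log file matching the glob pattern."""
    counts = {}
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with open(path, errors="replace") as handle:
            counts[path] = count_alloc_failures(handle)
    return counts
```

Calling scan_qemu_logs() on the node controller and printing the result should show which instance logs contain the failure and how often.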
/var/log/eucalyptus/cloud-debug.log:12:35:03 ERROR [WalrusImageManager:Thread-73] Unable to cache image. Unable to flush existing images.
/var/log/eucalyptus/cloud-error.log: at edu.ucsb.eucalyptus.cloud.ws.WalrusManager.listBucket(WalrusManager.java:1508)
/var/log/eucalyptus/cloud-error.log: at edu.ucsb.eucalyptus.cloud.ws.WalrusControl.ListBucket(WalrusControl.java:273)
/var/log/eucalyptus/cloud-error.log: at edu.ucsb.eucalyptus.cloud.ws.WalrusManager.listBucket(WalrusManager.java:1508)
/var/log/eucalyptus/cloud-error.log: at edu.ucsb.eucalyptus.cloud.ws.WalrusControl.ListBucket(WalrusControl.java:273)
/var/log/eucalyptus/cloud-error.log: at edu.ucsb.eucalyptus.cloud.ws.WalrusManager.listBucket(WalrusManager.java:1508)
/var/log/eucalyptus/cloud-error.log: at edu.ucsb.eucalyptus.cloud.ws.WalrusControl.ListBucket(WalrusControl.java:273)
/var/log/eucalyptus/cloud-error.log:20:17:58 ERROR [WalrusImageManager:Thread-53] Unable to cache image. Unable to flush existing images.
/var/log/eucalyptus/cloud-error.log: at edu.ucsb.eucalyptus.cloud.ws.WalrusManager.listBucket(WalrusManager.java:1508)
/var/log/eucalyptus/cloud-error.log: at edu.ucsb.eucalyptus.cloud.ws.WalrusControl.ListBucket(WalrusControl.java:273)
/var/log/eucalyptus/cloud-error.log:20:25:42 ERROR [WalrusImageManager:Thread-57] Unable to cache image. Unable to flush existing images.
/var/log/eucalyptus/cloud-error.log: at edu.ucsb.eucalyptus.cloud.ws.WalrusManager.listBucket(WalrusManager.java:1508)
/var/log/eucalyptus/cloud-error.log: at edu.ucsb.eucalyptus.cloud.ws.WalrusControl.ListBucket(WalrusControl.java:273)
/var/log/eucalyptus/cloud-error.log:20:26:55 ERROR [WalrusImageManager:Thread-61] Unable to cache image. Unable to flush existing images.
/var/log/eucalyptus/cloud-error.log:20:45:37 ERROR [WalrusImageManager:Thread-67] Unable to cache image. Unable to flush existing images.
/var/log/eucalyptus/cloud-error.log:21:15:37 ERROR [WalrusImageManager:Bukkit.16] Tired of waiting to cache image: natty-current-amd64-20110406201745/natty-server-uec-amd64-loader.manifest.xml giving up
/var/log/eucalyptus/cloud-error.log: at edu.ucsb.eucalyptus.cloud.ws.WalrusImageManager.getDecryptedImage(WalrusImageManager.java:1136)
/var/log/eucalyptus/cloud-error.log: at edu.ucsb.eucalyptus.cloud.ws.WalrusControl.GetDecryptedImage(WalrusControl.java:342)
/var/log/eucalyptus/cloud-error.log:08:50:25 ERROR [WalrusImageManager:Thread-71] Unable to cache image. Unable to flush existing images.
/var/log/eucalyptus/cloud-error.log:09:20:25 ERROR [WalrusImageManager:Bukkit.16] Tired of waiting to cache image: natty-current-i386-20110406202528/natty-server-uec-i386-loader.manifest.xml giving up
/var/log/eucalyptus/cloud-error.log: at edu.ucsb.eucalyptus.cloud.ws.WalrusImageManager.getDecryptedImage(WalrusImageManager.java:1136)
/var/log/eucalyptus/cloud-error.log: at edu.ucsb.eucalyptus.cloud.ws.WalrusControl.GetDecryptedImage(WalrusControl.java:342)
/var/log/eucalyptus/cloud-error.log:12:35:03 ERROR [WalrusImageManager:Thread-73] Unable to cache image. Unable to flush existing images.
/var/log/eucalyptus/cloud-output.log:12:34:31 INFO 736 WalrusImageManager | Unzipping image: natty-current-i386-20110406202528/natty-server-uec-i386-loader.manifest.xml
/var/log/eucalyptus/cloud-output.log:12:34:31 WARN 767 WalrusImageManager | java.io.EOFException: Unexpected end of ZLIB input stream
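To check how closely the walrus errors line up with the qemu failures, something like the following sketch could pull the timestamped ERROR lines out of grep-style output such as the above. The names (WalrusError, parse_walrus_errors, seconds_of_day) are hypothetical, and the regular expression only assumes the "file:HH:MM:SS ERROR [source] message" layout seen in these excerpts; stack-trace and INFO/WARN lines are skipped.

```python
import re
from collections import namedtuple

WalrusError = namedtuple("WalrusError", "file time source message")

# Matches grep-prefixed lines of the form:
#   /path/to/log:HH:MM:SS ERROR [source] message
ERROR_LINE = re.compile(
    r"^(?P<file>[^:]+):(?P<time>\d{2}:\d{2}:\d{2})\s+ERROR\s+"
    r"\[(?P<source>[^\]]+)\]\s+(?P<message>.*)$"
)


def parse_walrus_errors(lines):
    """Return a WalrusError for every timestamped ERROR line; ignore the rest."""
    errors = []
    for line in lines:
        match = ERROR_LINE.match(line.strip())
        if match:
            errors.append(WalrusError(**match.groupdict()))
    return errors


def seconds_of_day(hhmmss):
    """Convert an HH:MM:SS string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds
```

With the timestamps from this report, seconds_of_day("12:35:03") - seconds_of_day("12:34:43") gives 20, i.e. the walrus error was logged 20 seconds after the qemu shutdown line.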
This was after I decided my current install was hopelessly tainted, and reinstalled an all-in-one + 1 NC.
Logs are being uploaded (full set of logs from both the all-in-one and the NC, whole /var/log).
ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: eucalyptus-walrus 2.0.1+bzr1256-0ubuntu2
ProcVersionSignature: Ubuntu 2.6.38-7.39-server 2.6.38
Uname: Linux 2.6.38-7-server x86_64
.etc.eucalyptus.eucalyptus.cc.conf: CC_NAME="UEC-TEST1"
Architecture: amd64
Date: Thu Apr 7 12:36:46 2011
ProcEnviron:
LC_TIME=en_DK.utf8
LANG=en_US.UTF-8
SHELL=/bin/bash
SourcePackage: eucalyptus
UpgradeStatus: No upgrade log present (probably fresh install)
Thanks for the bug, Carlos.
I am not convinced this is a regression with Eucalyptus. I wasn't seeing this behaviour last week. Needs further investigation.