Ubuntu server freezes when running more than one CPU in VMware ESX 3

Bug #261937 reported by Mike Basinger
Affects          Status         Importance   Assigned to     Milestone
linux (Ubuntu)   Invalid        High         Unassigned
Hardy            Fix Released   High         Stefan Bader

Bug Description

In VMware ESX 3, if you have an Ubuntu 8.04.1 server as a guest OS with more than one CPU, the server will occasionally freeze.

More discussion here: http://ubuntuforums.org/showthread.php?p=5543982

James Troup (elmo) wrote :

We've also run into this with VMware ESX 3.5i and both the 8.04 and
8.04.1 (installer) kernels. Adding 'clocksource=acpi_pm' to the kernel
command line, as mentioned in the forums thread, fixed it for us. See also:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007020

Without the workaround we were seeing all sorts of bizarre and often
fatal problems. Tim thinks the kernels in -proposed may also fix this,
but I haven't been able to test that yet.
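
For anyone applying the workaround above, here is a minimal Python sketch (not part of the original report) of the two checks involved: reading which clocksource the guest is currently using, and producing a kernel command line that carries 'clocksource=acpi_pm' exactly once. The function names are hypothetical, and the sysfs path is assumed to be present on the guest kernel. The resulting string still has to be put into the boot loader configuration by hand.

```python
from pathlib import Path

# Usual sysfs location of the active clocksource (assumed to exist on the guest).
SYSFS_CLOCKSOURCE = Path(
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"
)


def read_clocksource(path=SYSFS_CLOCKSOURCE):
    """Return the active clocksource name, or None if the file cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def with_clocksource(cmdline, source="acpi_pm"):
    """Return a kernel command line that sets clocksource=<source> exactly once."""
    # Drop any existing clocksource= argument, then append the requested one.
    kept = [arg for arg in cmdline.split() if not arg.startswith("clocksource=")]
    kept.append(f"clocksource={source}")
    return " ".join(kept)
```

For example, `with_clocksource("root=/dev/sda1 ro quiet")` gives `"root=/dev/sda1 ro quiet clocksource=acpi_pm"`, and after a reboot `read_clocksource()` should report `acpi_pm` if the parameter took effect.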

Changed in linux-meta:
status: New → Confirmed
Steve Langasek (vorlon) wrote :

James, have you had a chance yet to test whether this still applies to the current hardy-updates kernel (which should include everything that was in hardy-proposed in October)?

Changed in linux:
importance: Undecided → High
Steve Langasek (vorlon)
Changed in linux:
milestone: none → ubuntu-8.04.3
status: New → Confirmed
Etienne Goyer (etienne-goyer-outlands) wrote :

I think a bug I reported recently, bug #316187, might be a duplicate of this one. I should be able to test this bug and #316187 this week.

Andy Whitcroft (apw)
Changed in linux:
importance: Undecided → High
status: Confirmed → Incomplete
Steve Langasek (vorlon) wrote :

Etienne,

Any update on this bug?

Etienne Goyer (etienne-goyer-outlands) wrote :

Steve,

The soft lockup error discussed in #316187 is, apparently, not really a bug (more like a spurious warning).

Personally, I have not been able to reproduce a solid freeze (oops, panic, or whatever), so I do not have a solid test case for this specific bug. Maybe the bug reporter (Mike Basinger) can narrow down what triggers the freeze for him, or at least explain the circumstances? If he does, I would be more than willing to chase it down and do more testing.

Eventually, I will be testing the problem discussed by James above (server installer freezing), but this will take a bit of time.

Stefan Bader (smb)
Changed in linux:
assignee: nobody → stefan-bader-canonical
Stefan Bader (smb) wrote :

The latest Hardy proposed kernel (2.6.24-24.50) contains a series of patches that are supposed to fix TSC-related issues when running under VMware. If you could test with this kernel and report back whether it helps with this problem, that would be great.
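
Before testing, it may help to confirm that the guest actually booted the newer kernel. The following is a rough Python sketch, not an official check: it compares a release string of the form reported by `uname -r` (for example '2.6.24-24-server') against the 2.6.24-24 ABI named above. The function names are hypothetical, and the upload number (the '.50') does not appear in the release string, so it is not checked here.

```python
import re


def parse_release(release):
    """Split a release string such as '2.6.24-24-server' into (version, ABI)."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, abi = (int(part) for part in match.groups())
    return (major, minor, patch), abi


def at_least(release, minimum="2.6.24-24"):
    """True if `release` is the same as or newer than `minimum`.

    The upstream version is compared first, then the ABI number.
    """
    return parse_release(release) >= parse_release(minimum)
```

On the guest itself this can be called as `at_least(platform.release())` after `import platform`. A True result only says the kernel is at or past that ABI, not that the freeze is fixed.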

Stefan Bader (smb) wrote :

There has been another issue with timekeeping and VMware when using the VMI clocksource (reported as bug #345553). I have uploaded kernels based on that input to http://people.ubuntu.com/~smb/bug261937/ (those require an update to the -24 kernel provided in proposed before). Could you please test and give feedback here? Thanks.

Stefan Bader (smb) wrote :

I think this should be fixed with the latest Hardy kernels (not the VMI case, though). James/Mike, could you confirm this, please?

Changed in linux (Ubuntu):
status: Incomplete → Invalid
Steve Langasek (vorlon) wrote :

Ronald, please test whether this problem is reproducible for you on the ESX instance at your disposal. We would like to have this closed out for Ubuntu 8.04.3 (due early next month).

Ronald McCollam (fader) wrote :

So far I have been unable to reproduce this issue. I've been testing on two machines, a 32- and 64-bit VM, each with 4 processors. I've left them running unattended, run large compiles on them to peg the processors, alternated between pegging the CPUs and leaving them idle, and nothing has yet managed to lock these up.

They're running 8.04.2 currently. Was anyone still seeing this issue on 8.04.2 or do I need to go back and try on e.g. 8.04.1?
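
For anyone who wants to repeat the "peg the CPUs, then leave them idle" pattern without a large compile, a minimal Python sketch follows. This is not the tooling used in the tests described above. The function names, worker count, and timings are all hypothetical, and it assumes one worker process per virtual CPU is enough to keep the guest busy.

```python
import multiprocessing
import time


def burn(seconds):
    """Spin in a tight loop for roughly `seconds` of wall-clock time."""
    deadline = time.monotonic() + seconds
    count = 0
    while time.monotonic() < deadline:
        count += 1
    return count


def alternate_load(workers, busy_seconds, idle_seconds, cycles):
    """Alternate between loading `workers` CPUs and leaving the machine idle."""
    for _ in range(cycles):
        # One process per worker, because threads alone would share one core.
        procs = [
            multiprocessing.Process(target=burn, args=(busy_seconds,))
            for _ in range(workers)
        ]
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()
        time.sleep(idle_seconds)
```

On a 4-CPU guest this could be started with something like `alternate_load(4, 300, 300, 12)`, called from under an `if __name__ == "__main__":` guard. If the guest freezes, the script simply stops making progress, so watching the console is still needed.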

Steve Langasek (vorlon) wrote :

Ronald,

The presumed fix for this was already present in 8.04.2, yes; I think you should be testing with the initial 8.04 kernel to try to reproduce the problem.

In the meantime I'm marking this bug as resolved since you say it's not reproducible - if someone can reproduce it the bug should be reopened.

Changed in linux (Ubuntu Hardy):
status: Incomplete → Fix Released
Steve Beattie (sbeattie) wrote :

I went ahead and converted the ESX instances Ronald was testing back to stock 8.04 installations, and the amd64 instance has reproduced the freeze behavior three times in a period of a couple of hours (the first time was on the first boot post-installation). I've updated the kernel to the latest in hardy-updates to verify that it's not a userspace-related issue, but given that the prior 8.04.2 installations showed no signs of the freezing behavior, I'm convinced this issue has been addressed.

Thanks for everyone's patience.

Etienne Goyer (etienne-goyer-outlands) wrote :

As James mentioned above, the problem can manifest itself during installation. I presume the kernel used for the 8.04.2 ISO is indeed one that includes the fix, but what about the netboot installer? The hardy netboot.tar.gz found in the archive is dated June 19th 2008. Will a new netboot installer, using a later kernel, be released?

Unless, of course, an updated netboot installer is available somewhere that I missed.

Steve Langasek (vorlon) wrote :

Etienne,

Right, sorry - it's something of a gap in our SRU publishing procedures that netboot images don't get copied to the release directory when debian-installer SRUs are published. I've copied the current hardy-updates version of the netboot images over now.
