When DMA is disabled system freeze on high memory usage
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Gentoo Linux | New | Undecided | Unassigned |
linux (Arch Linux) | New | Undecided | Unassigned |
linux (Ubuntu) | Incomplete | Low | Unassigned |
Bug Description
I run a batch MATLAB job server at my lab on Dapper 6.06 (chosen for the LTS). One of the users submitted a very memory-hungry job that reliably crashes the server. On closer inspection, the crash happens like this:
1. I run MATLAB with the given file (as an ordinary, unprivileged user)
2. RAM usage quickly fills up
3. Once the RAM meter hits 100%, the system freezes: all SSH connections hang, and although switching VTs directly on the machine still works, no new processes start, so you can't log in, or do anything if you are already logged in. (Sometimes typing doesn't work at all.)
Note that the swap - while 7 gigs of it are available - is never used. (The machine has 7 gigs of RAM as well)
I've tried the same on my Gutsy 32-bit box, and there was no system freeze: MATLAB simply reported that the system was out of memory. However, it did this once memory was 100% in use, and still swap didn't get used at all! (Though it is mounted correctly and shows up in "top" and "free".)
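One way to check whether swap really stays untouched while the job runs is to read the kernel's own counters rather than relying on the "top" display. A minimal sketch, assuming a Linux system that exposes `/proc/meminfo` with the usual `SwapTotal` and `SwapFree` fields; the helper names `parse_meminfo`, `swap_used_kb` and `read_swap_used_kb` are made up for illustration:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_used_kb(info):
    """Swap currently in use, in kB, from a parsed meminfo dict."""
    return info["SwapTotal"] - info["SwapFree"]


def read_swap_used_kb(path="/proc/meminfo"):
    """Read the live counters (assumes the file exists, i.e. Linux)."""
    with open(path) as f:
        return swap_used_kb(parse_meminfo(f.read()))
```

Polling `read_swap_used_kb()` about once a second from a separate shell while the job runs should show whether swap usage ever leaves zero before the freeze.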
So first things first: I'd like to eliminate the crash. I suppose I could switch the server to 32-bit, but I think that would cost performance, considering that it does a lot of heavy computation. In any case, there is no reason this should happen on a 64-bit machine. Why does it?
WORKAROUND: Enabling DMA in the BIOS
Changed in linux (Ubuntu): | |
importance: | Undecided → Low |
information type: | Public → Public Security |
information type: | Public Security → Private Security |
information type: | Private Security → Public |
Changed in linux (Ubuntu): | |
status: | Incomplete → Confirmed |
Changed in fedora: | |
importance: | Unknown → Medium |
status: | Unknown → Confirmed |
Changed in fedora: | |
status: | Confirmed → Won't Fix |
summary: |
- System freeze on high memory usage
+ System freeze on high memory usage when DMA is disabled |
no longer affects: | linux (Ubuntu) |
affects: | linux (Arch Linux) → linux (Ubuntu) |
Changed in linux (Ubuntu): | |
status: | New → Incomplete |
no longer affects: | linux (Ubuntu) |
affects: | fedora → ubuntu |
Changed in ubuntu: | |
importance: | Medium → Undecided |
status: | Won't Fix → New |
importance: | Undecided → Low |
status: | New → Incomplete |
affects: | ubuntu → linux (Ubuntu) |
tags: | added: patch |
summary: |
- System freeze on high memory usage when DMA is disabled
+ When DMA is disabled system freeze on high memory usage |
tags: | removed: patch |
tags: | added: patch |
tags: | added: cscc |
I can confirm this in general for every Linux distribution I've ever used. Any time I have a process that both uses 100% CPU and eats up memory, the system becomes unusable as soon as it starts using swap. At that point the hard drive starts thrashing and X slows to a crawl (the pointer updates maybe every 30 seconds). My only options are to 1) hope the program finishes and gives some memory back, 2) wait for swap to fill completely so the kernel will kill the program, or 3) reboot the computer. The last option is usually 5-10 minutes faster. I think this is a very meaningful bug report, and one that I'd love to see get some attention, although I have no real idea what the solution might be. The only workaround I've found is to disable swap completely (I'll bet swap just wasn't enabled on your 32-bit box?).
Of course it's expected that things will perform badly when the system is out of memory, but it's pretty ridiculous that as soon as RAM is full there aren't even enough resources for me to get to a console, log in, and kill the program myself. If one program is spending all of its time writing swap pages, there should still be plenty of CPU left over for me to operate the mouse, so it seems like something else is going on that causes the system to crawl.
So the question is: can we come up with a reasonable fix for this problem, or do we just accept that any runaway process can crash the machine? For the time being, I'm happy running swapless.
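One partial mitigation, separate from the DMA question, is to cap how much memory a single batch job may claim so that it fails on its own before the machine runs dry. A rough sketch, assuming a Linux host and Python's standard `resource` and `subprocess` modules; `run_capped` and the 1 GiB figure are illustrative only, and nobody in this report has tested this on the affected server:

```python
import resource
import subprocess
import sys


def run_capped(cmd, max_bytes):
    """Run cmd with its virtual address space limited to max_bytes."""

    def limit():
        # Runs in the child just before exec; the limit is inherited
        # by the job and by anything the job spawns.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(cmd, preexec_fn=limit)


if __name__ == "__main__":
    # Hypothetical runaway job: asks for 4 GiB under a 1 GiB cap.
    job = [sys.executable, "-c", "bytearray(2**32)"]
    result = run_capped(job, 2**30)
    print("exit status:", result.returncode)
```

Note that `RLIMIT_AS` limits virtual address space rather than resident memory, so a real job may need a cap well above its actual working set.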