I can reproduce this issue quite easily with my setup. I'm running two amd64 KVM guests on an amd64 host with 8GB of memory. The NFS server runs on the host, and the guests rely on it heavily. All systems are up to date; the kernel is 2.6.32-23.
The guests hang when they access NFS mounts heavily, and it seems that write operations are needed to trigger it. I first used NFSv3, then switched to NFSv4, but that didn't really help.
host export:
/srv/mmedia 172.16.0.0/16(rw,nohide,insecure,no_subtree_check,async)
guest fstab mount:
172.16.1.1:/mmedia /mmedia nfs4 _netdev,auto 0 0
I have had this issue since upgrading to Lucid and never saw anything like it with Karmic, where I had exactly the same setup.
dmesg logs from both the host and a guest are attached.
One way to reproduce this is to run a script on the guest that processes (copies) image files over NFS; it hangs after processing around 20-50 files. System load starts to climb after the script hangs; I have seen loads well over 200. Once this happens, all other processes accessing NFS mounts hang as well. The guest cannot be rebooted; it has to be hard reset.
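The script itself is not attached to this report. As a rough stand-in, the following Python sketch generates the same kind of workload: it copies every file from a source directory to a destination directory and forces each file to be written back before it is closed. The function name `copy_images` and both directory arguments are hypothetical, and this is only an assumption about what such a script does, not the original code.

```python
import os
import shutil
import sys


def copy_images(src_dir, dst_dir):
    """Copy every regular file from src_dir to dst_dir, syncing each one.

    Returns the number of files copied.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = 0
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            continue  # skip subdirectories and special files
        dst = os.path.join(dst_dir, name)
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
            fout.flush()
            # Force write-back of the file data before the file is closed.
            os.fsync(fout.fileno())
        copied += 1
        # Progress output shows how many files completed before a hang.
        print(f"copied {copied}: {name}", flush=True)
    return copied


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python copy_images.py <source dir> <dir on the NFS mount>
    copy_images(sys.argv[1], sys.argv[2])
```

Pointing the destination at a directory under the NFS mount (for example somewhere below /mmedia on the guest) and watching the progress output should show roughly how many files complete before the writes stall.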
syslog from the guest:
-------
Jul 12 13:42:14 scotty kernel: [ 360.190575] INFO: task perl:4360 blocked for more than 120 seconds.
Jul 12 13:42:14 scotty kernel: [ 360.190585] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 12 13:42:14 scotty kernel: [ 360.190592] perl D 0000000000000000 0 4360 4358 0x00000000
Jul 12 13:42:14 scotty kernel: [ 360.190605] ffff8800b02ffc48 0000000000000082 0000000000015bc0 0000000000015bc0
Jul 12 13:42:14 scotty kernel: [ 360.190616] ffff8800ae73c890 ffff8800b02fffd8 0000000000015bc0 ffff8800ae73c4d0
Jul 12 13:42:14 scotty kernel: [ 360.190624] 0000000000015bc0 ffff8800b02fffd8 0000000000015bc0 ffff8800ae73c890
Jul 12 13:42:14 scotty kernel: [ 360.190633] Call Trace:
Jul 12 13:42:14 scotty kernel: [ 360.190729] [<ffffffffa014a3b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Jul 12 13:42:14 scotty kernel: [ 360.190788] [<ffffffff81541357>] io_schedule+0x47/0x70
Jul 12 13:42:14 scotty kernel: [ 360.190816] [<ffffffffa014a3be>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
Jul 12 13:42:14 scotty kernel: [ 360.190824] [<ffffffff81541bbf>] __wait_on_bit+0x5f/0x90
Jul 12 13:42:14 scotty kernel: [ 360.190850] [<ffffffffa014a3b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Jul 12 13:42:14 scotty kernel: [ 360.190860] [<ffffffff81541c68>] out_of_line_wait_on_bit+0x78/0x90
Jul 12 13:42:14 scotty kernel: [ 360.190905] [<ffffffff81085470>] ? wake_bit_function+0x0/0x40
Jul 12 13:42:14 scotty kernel: [ 360.190931] [<ffffffffa014a39f>] nfs_wait_on_request+0x2f/0x40 [nfs]
Jul 12 13:42:14 scotty kernel: [ 360.190964] [<ffffffffa014e7df>] nfs_wait_on_requests_locked+0x7f/0xd0 [nfs]
Jul 12 13:42:14 scotty kernel: [ 360.190992] [<ffffffffa014fc1e>] nfs_sync_mapping_wait+0x9e/0x1a0 [nfs]
Jul 12 13:42:14 scotty kernel: [ 360.191027] [<ffffffffa0150009>] nfs_write_mapping+0x79/0xb0 [nfs]
Jul 12 13:42:14 scotty kernel: [ 360.191060] [<ffffffff8115f7d0>] ? mntput_no_expire+0x30/0x110
Jul 12 13:42:14 scotty kernel: [ 360.191087] [<ffffffffa0150077>] nfs_wb_all+0x17/0x20 [nfs]
Jul 12 13:42:14 scotty kernel: [ 360.191109] [<ffffffffa013ef9a>] nfs_do_fsync+0x2a/0x60 [nfs]
Jul 12 13:42:14 scotty kernel: [ 360.191131] [<ffffffffa013f1e5>] nfs_file_flush+0x75/0xa0 [nfs]
Jul 12 13:42:14 scotty kernel: [ 360.191146] [<ffffffff8114173c>] filp_close+0x3c/0x90
Jul 12 13:42:14 scotty kernel: [ 360.191153] [<ffffffff81141847>] sys_close+0xb7/0x120
Jul 12 13:42:14 scotty kernel: [ 360.191179] [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
Jul 12 13:44:14 scotty kernel: [ 480.190437] INFO: task perl:4360 blocked for more than 120 seconds.
Jul 12 13:44:14 scotty kernel: [ 480.190446] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 12 13:44:14 scotty kernel: [ 480.190453] perl D 0000000000000000 0 4360 4358 0x00000000
Jul 12 13:44:14 scotty kernel: [ 480.190466] ffff8800b02ffc48 0000000000000082 0000000000015bc0 0000000000015bc0
Jul 12 13:44:14 scotty kernel: [ 480.190477] ffff8800ae73c890 ffff8800b02fffd8 0000000000015bc0 ffff8800ae73c4d0
Jul 12 13:44:14 scotty kernel: [ 480.190486] 0000000000015bc0 ffff8800b02fffd8 0000000000015bc0 ffff8800ae73c890
Jul 12 13:44:14 scotty kernel: [ 480.190495] Call Trace:
Jul 12 13:44:14 scotty kernel: [ 480.190534] [<ffffffffa014a3b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Jul 12 13:44:14 scotty kernel: [ 480.190548] [<ffffffff81541357>] io_schedule+0x47/0x70
Jul 12 13:44:14 scotty kernel: [ 480.190582] [<ffffffffa014a3be>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
Jul 12 13:44:14 scotty kernel: [ 480.190591] [<ffffffff81541bbf>] __wait_on_bit+0x5f/0x90
Jul 12 13:44:14 scotty kernel: [ 480.190617] [<ffffffffa014a3b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Jul 12 13:44:14 scotty kernel: [ 480.190626] [<ffffffff81541c68>] out_of_line_wait_on_bit+0x78/0x90
Jul 12 13:44:14 scotty kernel: [ 480.190637] [<ffffffff81085470>] ? wake_bit_function+0x0/0x40
Jul 12 13:44:14 scotty kernel: [ 480.190663] [<ffffffffa014a39f>] nfs_wait_on_request+0x2f/0x40 [nfs]
Jul 12 13:44:14 scotty kernel: [ 480.190690] [<ffffffffa014e7df>] nfs_wait_on_requests_locked+0x7f/0xd0 [nfs]
Jul 12 13:44:14 scotty kernel: [ 480.190718] [<ffffffffa014fc1e>] nfs_sync_mapping_wait+0x9e/0x1a0 [nfs]
Jul 12 13:44:14 scotty kernel: [ 480.190745] [<ffffffffa0150009>] nfs_write_mapping+0x79/0xb0 [nfs]
Jul 12 13:44:14 scotty kernel: [ 480.190756] [<ffffffff8115f7d0>] ? mntput_no_expire+0x30/0x110
Jul 12 13:44:14 scotty kernel: [ 480.190782] [<ffffffffa0150077>] nfs_wb_all+0x17/0x20 [nfs]
Jul 12 13:44:14 scotty kernel: [ 480.190805] [<ffffffffa013ef9a>] nfs_do_fsync+0x2a/0x60 [nfs]
Jul 12 13:44:14 scotty kernel: [ 480.190827] [<ffffffffa013f1e5>] nfs_file_flush+0x75/0xa0 [nfs]
Jul 12 13:44:14 scotty kernel: [ 480.190836] [<ffffffff8114173c>] filp_close+0x3c/0x90
Jul 12 13:44:14 scotty kernel: [ 480.190843] [<ffffffff81141847>] sys_close+0xb7/0x120
Jul 12 13:44:14 scotty kernel: [ 480.190852] [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
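
The "INFO: task ... blocked for more than 120 seconds" lines in the log above can be counted per process to see which tasks are stuck and how often they were reported. The following is a minimal Python sketch under that assumption; the helper name `find_hung_tasks` is hypothetical, and the pattern only covers the message format shown in this log.

```python
import re
from collections import Counter

# Matches the hung-task report line in the form it takes in the log above.
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return a Counter mapping (command, pid) to the number of reports."""
    reports = Counter()
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            reports[(match.group("comm"), int(match.group("pid")))] += 1
    return reports


if __name__ == "__main__":
    # Sample lines copied from the guest syslog in this report.
    sample = [
        "Jul 12 13:42:14 scotty kernel: [ 360.190575] INFO: task perl:4360 blocked for more than 120 seconds.",
        "Jul 12 13:42:14 scotty kernel: [ 360.190633] Call Trace:",
        "Jul 12 13:44:14 scotty kernel: [ 480.190437] INFO: task perl:4360 blocked for more than 120 seconds.",
    ]
    for (comm, pid), count in find_hung_tasks(sample).items():
        print(f"{comm} (pid {pid}): {count} report(s)")
```

To scan a whole log, the same function can be given an open file object, since it only iterates over lines.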