Hi Tim,
After your post I retried my test case (see comment #23 above), this time using KVM instead of VMware Server. All other details of the virtual machine (2 CPUs, 512MB RAM, 8GB HDD) are the same.
I can confirm that under KVM and a vanilla install of 10.04(.0) the problem occurs exactly as I described before, and within seconds. This is using a 64-bit kernel, version 2.6.32-21.32.
I then upgraded the virtual machine to the latest kernel, 2.6.32-24.41, and ran the same test case again. This kernel seemed much more stable: multiple iterations of the gzip loop ran successfully without problems. So I started a second loop of gzip processes, giving two gzip processes accessing the NFS mount at the same time. After a minute or so the issue occurred, with the same messages appearing in syslog, the load skyrocketing and the system freezing. So whilst 2.6.32-24.41 appears to me to be better, I do not believe the fundamental problem has been solved. Repeating the test several more times, it sometimes took two simultaneous gzip processes to trigger the error, and other times three. I used the following script to run the gzip processes:
#!/bin/bash
while true
do
    gzip -c /mnt/srv/test >/mnt/srv/test$$.gz
done
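For anyone who wants to start several loops at once from a single shell, a rough sketch follows. It is not the script used for the results above: the function names gzip_loop and start_loops, the pass limit and the output file naming are all made up for illustration.

```bash
#!/bin/bash
# Hypothetical helpers, not part of the original test case.

# Run gzip over SRC repeatedly, writing to SRC<id>.gz.
# passes=0 means loop forever, like the original script.
gzip_loop() {
    local src="$1" id="$2" passes="$3" n=0
    while [ "$passes" -eq 0 ] || [ "$n" -lt "$passes" ]; do
        gzip -c "$src" > "${src}${id}.gz" || return 1
        n=$((n + 1))
    done
}

# Start LOOPS copies of gzip_loop in the background and wait for them.
# $$ is the same in every background job, so an index is appended
# to keep the output files separate.
start_loops() {
    local loops="$1" passes="$2" src="$3" i
    for ((i = 1; i <= loops; i++)); do
        gzip_loop "$src" "$$_$i" "$passes" &
    done
    wait
}
```

Calling "start_loops 2 0 /mnt/srv/test" would then run two endless loops against the NFS mount, and "start_loops 3 0 /mnt/srv/test" three.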
I would also like to make a couple of observations. Everyone I have seen with this issue (or with issues that look suspiciously close to it) seems to be running a 64-bit kernel; I have not tested a 32-bit kernel. My test case uses the loopback interface, so switch speed and network card should be irrelevant. Having some load on the NFS client also seems to be important, which is why my test case gzips a file of random data. I'm not sure your simple dd would generate enough of a real workload.
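A file of random data like the one the gzip loop reads could be created roughly as follows. This is only a sketch: the helper name make_random_file and the size in the usage line are assumptions, not the values from the original test case.

```bash
#!/bin/bash
# Hypothetical helper: write BYTES bytes of random data to PATH.
make_random_file() {
    local path="$1" bytes="$2"
    head -c "$bytes" /dev/urandom > "$path"
}
```

For example, "make_random_file /mnt/srv/test $((64 * 1024 * 1024))" would write a 64MB file (the size is an arbitrary choice here).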
Jim
BTW, in the original test case I left out a "chmod 777 /srv", but that really isn't going to change anything.