More than one client with nfs home directories will cause freeze

Bug #1024313 reported by HansHook
This bug affects 4 people
Affects: nfs-utils (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Setup:

Two similar company sites, each with one server and 5-10 clients.
The bug is preventing the upgrade of a well-working 10.04 system to 12.04.
The problem is becoming a really big issue, since it is at the heart of the company IT infrastructure.

Server:
Ubuntu 10.04 server amd64 running kernels 2.6.35-32-server or 3.0.0-22-server.
Not tested with standard/original kernel.
Servers have a 6TB RAID array and export home and archive directories to clients and virtual servers using NFSv4.
Services running on server: OpenLDAP, NFS, SAMBA, NTP, DHCP, DNS and KVM (with about 12 guests).

Clients:
Several Ubuntu 10.04.4 clients with 3.0.0-22 kernels, in the process of upgrading (currently dual-booting) to 12.04 (and in one case 11.10).
All clients authenticate via LDAP and mount home directories using NFSv4 (static fstab mounts, no automounter).

A setup with any number of 10.04 clients and at most one 11.10 or 12.04 client works perfectly, which is why we passed the initial 12.04 evaluation with flying colors.

Bug description:
When two or more 12.04 or 11.10 clients are booted, all 12.04 (or 11.10) clients freeze at the login of any two (random) users.
This often occurs instantly, but at times after a short delay.
All 10.04 clients work well all the time; they mount the same shares but are not affected.
Sometimes a reboot from the console is possible on a frozen client, but most often a hard reset (power off with the button) is required.
After turning off the second 12.04 client, the first one becomes accessible again.

As long as there is only a single 11.10 or 12.04 client booted, it works as well as any 10.04 client.

Have tried TCP and UDP, sync and async, and all kinds of frame sizes etc. No change.
Have replaced the NICs with Intel ones.
Kubuntu or Ubuntu does not matter.
Found nothing useful (!!?) in the logs so far.
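
When trying different transports and sizes as described above, it can help to confirm which options a client actually ended up with after mounting. The following is only a minimal sketch, not something from the original report: it assumes the usual /proc/mounts layout (device, mount point, filesystem type, comma-separated options), and the function name `nfs_mount_options` and the sample line are made up for illustration.

```python
def nfs_mount_options(mounts_text):
    """Return {mountpoint: {option: value}} for NFS entries.

    Expects text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not an NFS mount.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = {}
        for item in fields[3].split(","):
            key, _, value = item.partition("=")
            options[key] = value  # flags such as "rw" get an empty value
        result[fields[1]] = options
    return result


# Hypothetical sample line; on a real client, pass the contents of /proc/mounts.
sample = "1.2.3.4:/share /mountpoint nfs4 rw,vers=4,rsize=32768,proto=tcp 0 0\n"
for mountpoint, opts in nfs_mount_options(sample).items():
    print(mountpoint, opts.get("vers"), opts.get("proto"), opts.get("rsize"))
```

Comparing this output between a working 10.04 client and a freezing 12.04 client would show whether the two really negotiate the same protocol version and transport.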

Network performance is very good with about 90 MB/s read/write speeds on nfs directories.

(The only previous unsolved problem noted on 10.04 clients is that bzr repos on NFS cause a freeze.)

The system setup is traditional; it has been working for some 8 years and goes back to before Ubuntu.
(We still have some 8.04 clients that work exceptionally well.)

The funny thing is that we use kernel 3.0.0 on our 10.04 clients - same as the 11.10 client that is affected.
Does this suggest it might not be a kernel issue after all?

We will provide whatever assistance we can to help solve this issue.

Regards

Hans Höök

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nfs-utils (Ubuntu):
status: New → Confirmed
Andrey Konstantinov (andreyinvolute) wrote :

Same issue here: NFS server running Debian Squeeze amd64 and clients running Ubuntu 12.04 i386 (everything is up to date). One 12.04 client can log in just fine; with more than one, ALL clients that are currently up freeze, and the load on the NFS server becomes high (5.0, up from the normal 0.5-0.8) and keeps increasing until the problematic clients are shut down. Just as in the case above, the console rarely works, and the power button or Alt+PrintScreen+REISUO is required to switch them off. I am also running OpenLDAP and DHCP there.

We inspected tcpdump logs with Wireshark and found NFS complaining about "utk 10011" error, which I have found no information about whatsoever.

From hours of researching this issue, I'm confident that the problem is in nfs-common on the client side, since the issue happens no matter what OS the server is running (10.04, Debian and 12.04 are the ones I've seen mentioned so far).

This bug should be given very high priority, since it makes 12.04 clients (and I've read that 11.10 are affected as well) unusable with NFS.

Andrey Konstantinov (andreyinvolute) wrote :

We did some more testing and I should mention that the problem is fixed by switching to/forcing NFSv3, which is also the default before 11.10/12.04.

HansHook (hans-hook) wrote :

I have also found the information that NFSv3 does work.
However, I do not really like the word "fixed".
We have been using v4 since around 2009 (with 8.04 and 10.04) for a reason.
NFSv3 does not allow for elaborate use of groups, and I believe it does not support ACLs either.

It is as if ext3 and ext4 were broken: using ext2 would not be an acceptable solution.

I can only agree that this bug actually makes 12.04 clients unusable in a corporate environment that includes NFS.

I sincerely hope the community does not find that acceptable.

Regards
HH

Andrey Konstantinov (andreyinvolute) wrote :

Bad word choice on my side: of course downgrading to NFSv3 is not a fix, but rather a workaround. We have also been using NFSv4 since 10.04, and NFSv3 has problems with lock files, as far as I've heard, so of course NFSv4 needs to be fixed.

Additional information (not sure if relevant to this bug but might be): NFSv4 takes a minute or two to mount user home directories after boot so one cannot login right away: "bootwait" option needs to be added to /etc/fstab in order to make the machine wait until NFS home is mounted and only display the login screen after. Again, this does not happen with NFSv3 and did not happen with NFSv4 on 10.04. Whatever changes were done to NFSv4 since 11.10, they only made things worse.

HansHook (hans-hook) wrote :

I can also confirm the substantial extra waiting time when bootwait is added, which is needed for a home directory.

Also, anyone who would like to go the NFSv3 way may save some time by not making my mistake:
in /etc/fstab, an NFSv3 mount needs the '/export' prefix on the mounts.

For example with nfs v4:
1.2.3.4:/share /mountpoint nfs4 bootwait,nfsvers=4 0 0
In order to use nfs v3 change it to:
1.2.3.4:/export/share /mountpoint nfs bootwait,nfsvers=3 0 0
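
For anyone with many such entries, the change described above could be scripted. This is only a rough sketch under assumptions: the helper name `nfs4_line_to_nfs3` is invented here, and the '/export' prefix is taken from this particular setup (other servers may export from a different path), so review the result before writing it back to /etc/fstab.

```python
def nfs4_line_to_nfs3(line, export_prefix="/export"):
    """Rewrite one fstab line from an nfs4 mount to an NFSv3 mount.

    Comments, short lines and entries that are not of type nfs4 are
    returned unchanged. export_prefix is a site-specific assumption.
    """
    fields = line.split()
    if line.lstrip().startswith("#") or len(fields) < 4 or fields[2] != "nfs4":
        return line
    # Split "host:/path" on the last colon so the host part stays whole.
    host, sep, path = fields[0].rpartition(":")
    if not sep:
        return line
    # Drop any existing version option, then request version 3.
    options = [opt for opt in fields[3].split(",")
               if not opt.startswith(("nfsvers=", "vers="))]
    options.append("nfsvers=3")
    fields[0] = host + ":" + export_prefix + path
    fields[2] = "nfs"
    fields[3] = ",".join(options)
    return " ".join(fields)


print(nfs4_line_to_nfs3("1.2.3.4:/share /mountpoint nfs4 bootwait,nfsvers=4 0 0"))
```

Note that the rewritten line is joined with single spaces, so any column alignment in the original fstab is lost.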

I have not yet been able to test how my setup performs with v3, though.

Regards

HansHook (hans-hook) wrote :

I can now confirm that reverting to the old NFS v3 on the client resolves the freezing issues.

Still this does not allow us to upgrade 10.04 to 12.04 since in practice v4 is needed.

I sure hope this bug receives some attention. NFS is a cornerstone of Linux/UNIX and has been for ages.

Andrey Konstantinov (andreyinvolute) wrote :

It's been one and a half months and the bug persists, preventing the upgrade to 12.04. Any updates?

HansHook (hans-hook) wrote :