nfs4 leaves megabytes of errors in syslog

Bug #258651 reported by gpk
Affects             Status      Importance  Assigned to  Milestone
nfs-utils (CentOS)  New         Undecided   Unassigned
nfs-utils (Ubuntu)  Incomplete  Undecided   Unassigned

Bug Description

Binary package hint: nfs-common

I have a small two-computer network using NFS.
The client is Ubuntu 8.04.1; the server is the current Debian testing
distribution.

When I share files with nfs4, the connection sometimes hangs:
"ls n" (where "n" is an NFS mountpoint) never returns.
It typically runs for tens of minutes under light use
before it freezes. When it freezes, the kernel is still running, and
everything that does not try to access files mounted
over NFS keeps working.

I know it's not a server freeze because I can connect
to the same NFS server, same exported file system from
another computer, and it'll work (at least for a while).
I've seen this both ways: Ubuntu freezes while
Debian can still access itself via NFS, and Debian freezes
while Ubuntu can still access Debian via NFS.

The equivalent configuration works reliably with NFSv3.

When it's frozen, my /var/log/syslog on the Ubuntu client
side rapidly fills up with error messages:

Aug 16 20:04:47 kitchen kernel: [10854.865221] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 1
Aug 16 20:04:47 kitchen kernel: [10854.866133] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 10017
Aug 16 20:04:47 kitchen kernel: [10854.866849] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 10017
Aug 16 20:04:47 kitchen kernel: [10854.867614] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 1
Aug 16 20:04:47 kitchen kernel: [10854.867995] NFSv4 callback: too many open TCP sockets, consider increasing the number of nfsd threads
Aug 16 20:04:47 kitchen kernel: [10854.868003] NFSv4 callback: last TCP connect from 192.168.3.2, port=42971
Aug 16 20:04:47 kitchen kernel: [10854.869477] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 1
Aug 16 20:04:47 kitchen kernel: [10854.870381] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 10017
Aug 16 20:04:47 kitchen kernel: [10854.871131] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 10017
Aug 16 20:04:47 kitchen kernel: [10854.871874] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 1
Aug 16 20:04:47 kitchen kernel: [10854.872880] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 1
Aug 16 20:04:47 kitchen kernel: [10854.873780] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 10017
Aug 16 20:04:47 kitchen kernel: [10854.874491] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 10017
...
Aug 16 20:04:47 kitchen kernel: [10854.971314] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.972318] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.974021] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.975075] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.976335] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.977571] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.978437] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.979118] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.979788] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.980474] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.983996] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
...

It's mostly error 22.

This goes on for thousands and thousands of lines, at a rate of roughly 1000 lines per second (the kernel timestamps above advance by about 1 ms per line).
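To put a number on "mostly error 22", the messages can be tallied by error code. The following is a minimal sketch, not part of the original report: the function name and the SAMPLE list are illustrative, the regular expression assumes the exact message wording quoted above, and the script makes no attempt to interpret what each code means.

```python
import re
from collections import Counter

# Matches the kernel message wording quoted in this report (assumed format).
PATTERN = re.compile(
    r"state recovery failed on NFSv4 server (\S+) with error (\d+)"
)

# A few lines copied from the log excerpt above, used as sample input.
SAMPLE = [
    "Aug 16 20:04:47 kitchen kernel: [10854.865221] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 1",
    "Aug 16 20:04:47 kitchen kernel: [10854.866133] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 10017",
    "Aug 16 20:04:47 kitchen kernel: [10854.867995] NFSv4 callback: too many open TCP sockets, consider increasing the number of nfsd threads",
    "Aug 16 20:04:47 kitchen kernel: [10854.971314] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22",
    "Aug 16 20:04:47 kitchen kernel: [10854.972318] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22",
]


def tally_recovery_errors(lines):
    """Count (server, error code) pairs in an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # Print the most frequent (server, code) pairs first.
    for (server, code), n in tally_recovery_errors(SAMPLE).most_common():
        print(f"{server} error {code}: {n}")
```

To run it against the real log, pass an open file instead of SAMPLE, for example `tally_recovery_errors(open("/var/log/syslog"))`; reading that file may require elevated permissions.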

$ lsb_release -rd
Description: Ubuntu 8.04.1
Release: 8.04
$

$ apt-cache policy nfs-common
nfs-common:
  Installed: 1:1.1.2-2ubuntu2.1
  Candidate: 1:1.1.2-2ubuntu2.1
  Version table:
 *** 1:1.1.2-2ubuntu2.1 0
        500 http://gb.archive.ubuntu.com hardy-updates/main Packages
        100 /var/lib/dpkg/status
     1:1.1.2-2ubuntu2 0
        500 http://gb.archive.ubuntu.com hardy/main Packages
$

Relevant lines from /etc/fstab (the NFS4 entries are currently commented out,
but they were active earlier):

# desk.local:/gpk /home/gpk/n nfs4 bg,intr 0 0
desk.local:/export/big/gpk /home/gpk/n nfs bg,intr 0 3
# desk.local:/MyDocuments/gpk /home/gpk/MyDocuments nfs4 bg,intr 0 0
desk.local:/export/big/MyDocuments/gpk /home/gpk/MyDocuments nfs bg,intr 0 3

Here's /etc/exports on the server (again, the NFS4 lines are currently commented out,
but they were active a little while ago):

# /export/big 192.168.2.2(rw,fsid=0,root_squash,subtree_check) 127.0.0.1(rw,fsid=0,root_squash,subtree_check) 192.168.3.2(rw,fsid=0,root_squash,subtree_check)
/export/big 192.168.2.2(rw,root_squash,subtree_check) 127.0.0.1(rw,root_squash,subtree_check) 192.168.3.2(rw,root_squash,subtree_check)
gpk@desk:~$

ssh, web, and ping connections between the two machines work nicely.
It's a standard wired network, specified in /etc/network/interfaces.

gpk (gpk-kochanski) wrote :

I've changed the title because I've gotten NFS4 to work reliably by changing the configuration. Now, I do only one mount, rather than two.

So the real problem is that NFS4 does not handle (or does not detect) a misconfiguration gracefully, and/or that the documentation doesn't make it obvious that this configuration isn't allowed.
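For anyone checking whether they are in the same two-mount situation, here is a rough sketch that lists the active nfs4 entries of an fstab grouped by server. It is illustrative only: the function name is made up, the sample text is the fstab from this report with the nfs4 lines uncommented (as they were when the hangs occurred), and nothing in this report establishes that two nfs4 mounts from one server are actually invalid, only that dropping to one mount made the hangs stop here.

```python
from collections import defaultdict

# The fstab from this report, with the nfs4 entries active and the
# NFSv3 entry commented out (one assumed earlier state of the file).
SAMPLE_FSTAB = """\
desk.local:/gpk /home/gpk/n nfs4 bg,intr 0 0
# desk.local:/export/big/gpk /home/gpk/n nfs bg,intr 0 3
desk.local:/MyDocuments/gpk /home/gpk/MyDocuments nfs4 bg,intr 0 0
"""


def nfs4_mounts_by_server(fstab_text):
    """Map each server host to the mount points of its active nfs4 entries."""
    mounts = defaultdict(list)
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out entries
        fields = line.split()
        if len(fields) < 3 or fields[2] != "nfs4":
            continue  # not an NFSv4 entry
        host = fields[0].partition(":")[0]
        mounts[host].append(fields[1])
    return dict(mounts)


if __name__ == "__main__":
    for host, points in nfs4_mounts_by_server(SAMPLE_FSTAB).items():
        print(f"{host}: {len(points)} nfs4 mount(s): {', '.join(points)}")
```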

Timo Aaltonen (tjaalton) wrote :

Please try to reproduce this on a newer release. It sounds like an nfs4 server bug to me (i.e. in the kernel).

Changed in nfs-utils (Ubuntu):
status: New → Incomplete