nfs mount options seem to trigger failure and dead mount

Bug #1928140 reported by Jay Kuri
This bug affects 3 people

Affects: Launchpad Mojo Specs
Status: Fix Released
Importance: High
Assigned to: Colin Watson

Bug Description

We have been experiencing a number of NFS mount timeouts on the turnip machines.

On investigation, a common error appears in the Ganesha logs on the NFS host:

rpc :TIRPC :EVENT :svc_ioq_flushv() writev failed (11)

Error 11 is EAGAIN, which suggests that the server should try again to send to the client. The client, however, appears to have stopped listening.
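For context, writev() on a socket typically fails with EAGAIN when the socket's send buffer is full, i.e. the peer is not draining data. That happens on a non-blocking socket, or on one with a send timeout. How ntirpc configures its sockets is an assumption here, not something this report confirms. The sketch below reproduces that condition locally on Linux/Unix. It uses a socket pair whose "client" end never reads, and a non-blocking "server" end that keeps calling writev(). It is an illustration only, not Ganesha's code path; `fill_until_eagain` is a hypothetical helper.

```python
import errno
import os
import socket


def fill_until_eagain(max_writes=10_000):
    """Hypothetical helper: writev() to a peer that never reads.

    Returns (bytes_written, errno_value) once the kernel refuses more data
    because the socket buffers are full.
    """
    sender, receiver = socket.socketpair()  # receiver plays the silent client
    sender.setblocking(False)               # writes fail instead of waiting
    chunk = b"x" * 65536
    written = 0
    try:
        for _ in range(max_writes):
            try:
                # os.writev mirrors the writev() call named in the Ganesha log
                written += os.writev(sender.fileno(), [chunk])
            except BlockingIOError as exc:
                return written, exc.errno
        raise RuntimeError("send buffer never filled")
    finally:
        sender.close()
        receiver.close()


written, err = fill_until_eagain()
# On Linux, err is 11 (errno.EAGAIN), matching "writev failed (11)"
print(written > 0, err == errno.EAGAIN, os.strerror(err))
```

Retrying the write only helps if the peer eventually reads again. If the client has given up on the connection, the server can keep getting EAGAIN indefinitely.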

I also found a report of similar behavior with the ganesha.nfs server:

https://<email address hidden>/thread/Q7GFOEV5RLGOC54VJCU6NSMF66AXVU5N/

This suggests that the client hits an error and then treats the NFS mount as dead, which matches what we see in the logs on the turnip machine. Under normal circumstances the client should keep retrying. We have disabled that behavior with the 'soft' option and limited retries to two with the 'retrans' option.

In the thread above, the original reporter turned off the 'soft' option on the NFS mount, which solved the problem. If we cannot disable 'soft', we should at least consider raising the 'retrans' mount option from 2 to something higher, such as 10 (or whatever other number seems appropriate).
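To get a feel for what raising 'retrans' buys, here is a rough model of how long a request on a soft NFS-over-TCP mount waits before the client returns an error. It follows the description in nfs(5). 'timeo' is in tenths of a second, and the timeout grows linearly by 'timeo' after each retransmission, up to 600 seconds. A soft mount fails the request after 'retrans' retransmissions. The timeo value of 600 (60 s) is the nfs(5) default for TCP and is an assumption; the actual options on the turnip machines are not given here. `soft_mount_time_to_error` is a hypothetical helper, and real client behavior (reconnects, NFS version differences) may deviate from this simple sum.

```python
def soft_mount_time_to_error(timeo_ds: int, retrans: int, cap_s: float = 600.0) -> float:
    """Hypothetical helper: approximate seconds before a soft mount errors out.

    Assumed model (after nfs(5), NFS over TCP): the initial attempt waits
    timeo, the k-th retransmission waits (k + 1) * timeo, each wait capped
    at cap_s, and the request fails after `retrans` retransmissions.
    """
    timeo_s = timeo_ds / 10.0  # timeo is specified in deciseconds
    return sum(min(k * timeo_s, cap_s) for k in range(1, retrans + 2))


for retrans in (2, 10):
    print(retrans, soft_mount_time_to_error(600, retrans))
# prints:
# 2 360.0
# 10 3900.0
```

Under these assumptions, going from retrans=2 to retrans=10 stretches the window from about 6 minutes to about 65 minutes. That gives a briefly stalled server far longer to recover before applications see an error. A 'hard' mount, by contrast, retries indefinitely.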


Jay Kuri (jk0ne) wrote:

Additional information.

The issue recurred after I filed this bug. The previous workaround had been to restart ganesha.nfsd, which did work; this time I tried remounting from the client instead:

mount -o remount .....

This worked and restored a functioning mount, which suggests that the issue is indeed on the client side (or at least that the client must reconnect in order to recover).

Colin Watson (cjwatson)
information type: Private → Public
Colin Watson (cjwatson)
affects: turnip → launchpad-mojo-specs
Changed in launchpad-mojo-specs:
assignee: nobody → Colin Watson (cjwatson)
status: New → In Progress
importance: Undecided → High
Colin Watson (cjwatson)
Changed in launchpad-mojo-specs:
status: In Progress → Fix Committed
Colin Watson (cjwatson) wrote:

Deployed on 2021-05-25.

Changed in launchpad-mojo-specs:
status: Fix Committed → Fix Released