NFS mount options seem to trigger failure and dead mount
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Launchpad Mojo Specs | Fix Released | High | Colin Watson |
Bug Description
We have been experiencing a number of NFS mount timeouts on the turnip machines.
Upon investigation, a common error appears in the ganesha logs on the NFS host:
rpc :TIRPC :EVENT :svc_ioq_flushv() writev failed (11)
The 11 indicates EAGAIN, which suggests that the server should be retrying the send
to the client. The client, however, appears to no longer be listening.
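For reference, errno 11 on Linux is indeed EAGAIN; a quick check (Python used purely for illustration, not part of the ganesha stack):

```python
import errno
import os

# "writev failed (11)" in the ganesha log: decode errno 11 on Linux.
code = 11
assert code == errno.EAGAIN  # EAGAIN: resource temporarily unavailable
print(os.strerror(code))     # prints the human-readable description
```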
I also found a report of similar behaviour in the nfs-ganesha server:
https://<email address hidden>
This suggests that the server is hitting an error and the client is treating the NFS mount as dead, which is backed up by what we see in the logs on the turnip machine. Under normal circumstances the client would keep retrying; we have disabled that behaviour with the 'soft' mount option and limited retries to two via the 'retrans' option.
In the case of the above thread, the original reporter removed the 'soft' option from the NFS mount and that solved the problem. If we cannot drop the 'soft' option, we should at least consider raising the 'retrans' mount option from 2 to something higher, such as 10 (or whatever other number seems appropriate).
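To make the options concrete, here is a hedged sketch of the fstab entries being discussed; the server name, export path, and mount point are placeholders, not the real turnip configuration:

```
# Current behaviour: soft mount, give up after 2 retransmissions
nfs-server:/export  /srv/data  nfs  soft,retrans=2   0 0

# Option A: keep 'soft' but retry longer before declaring the mount dead
nfs-server:/export  /srv/data  nfs  soft,retrans=10  0 0

# Option B: drop 'soft' entirely ('hard' is the default), so the client
# retries indefinitely instead of returning errors to applications
nfs-server:/export  /srv/data  nfs  hard             0 0
```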
Related branches
- Ioana Lasc (community): Approve
- Diff: 13 lines (+1/-1), 1 file modified: mojo-lp-git/services (+1/-1)
information type: Private → Public
affects: turnip → launchpad-mojo-specs
Changed in launchpad-mojo-specs:
assignee: nobody → Colin Watson (cjwatson)
status: New → In Progress
importance: Undecided → High
Changed in launchpad-mojo-specs:
status: In Progress → Fix Committed
Additional information.
This issue recurred after this bug was filed. The previous workaround had been to restart ganesha.nfsd, which did work; this time I instead attempted a remount from the client:
mount -o remount .....
This worked perfectly and restored a functioning mount, which suggests that the issue is indeed on the client end (or at least requires the client to reconnect in order to recover).
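Since a client-side remount recovered the mount, a check-and-remount script could automate this recovery. The following is an illustrative sketch only, not something we currently run; the mount point is a placeholder (defaulting to /tmp just so the healthy path is demonstrable), and the real path on the turnip machine would be substituted:

```shell
#!/bin/sh
# Illustrative sketch: probe an NFS mount and remount it if unresponsive.
# MOUNTPOINT is a placeholder; pass the real NFS mount point as $1.
MOUNTPOINT="${1:-/tmp}"

# stat hangs on a dead NFS mount, so bound the probe with a timeout.
if timeout 10 stat -t "$MOUNTPOINT" >/dev/null 2>&1; then
    echo "mount OK: $MOUNTPOINT"
else
    echo "mount unresponsive, remounting: $MOUNTPOINT"
    mount -o remount "$MOUNTPOINT"
fi
```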