Bug #1274678 “Unable to lazy unmount nfsv3 when server is inacce...” : Bugs : nfs-utils package : Ubuntu

Launchpad Janitor (janitor) wrote on 2015-06-17:

#1

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nfs-utils (Ubuntu):
status:	New → Confirmed

John Gilmore (gnu-gilmore) wrote on 2015-06-17:

#2

This error occurs on an NFS client machine when one of its NFS servers stops responding (e.g. is powered off). The umount command provides a -l (lazy) option that is supposed to detach the mount point from the system so that future commands that access the file system do not hang on the unresponsive NFS server. This is supposed to work even when the NFS server is not responding. The problem is that a library used by the umount command calls readlink() on the filesystem, which hangs before umount can actually unmount the filesystem.

The umount program uses a helper program called /sbin/umount.nfs (which is a symlink to /sbin/mount.nfs, and both are part of the nfs-common package), and that's where the bug lies. When you do:

umount -l /images

and /images is an NFS mount, umount invokes:

/sbin/umount.nfs /images -l

and /sbin/umount.nfs does the readlink, which can easily be verified by running the umount command under strace -f. Here is a GDB backtrace of /sbin/umount.nfs when it hangs:

(gdb) run /images -l
Starting program: /sbin/umount.nfs /images -l
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".

Breakpoint 1, readlink () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 readlink () at ../sysdeps/unix/syscall-template.S:81
#1 0xb7768a0a in ?? () from /lib/i386-linux-gnu/libmount.so.1
#2 0xb7756073 in mnt_resolve_path () from /lib/i386-linux-gnu/libmount.so.1
#3 0xb7762887 in ?? () from /lib/i386-linux-gnu/libmount.so.1
#4 0xb77666f0 in mnt_context_prepare_umount ()
   from /lib/i386-linux-gnu/libmount.so.1
#5 0x0804afc1 in ?? ()
#6 0xb75b9a83 in __libc_start_main (main=0x804ad10, argc=3, argv=0xbfcc2e34,
    init=0x8057b40, fini=0x8057bb0, rtld_fini=0xb77cc180 <_dl_fini>,
    stack_end=0xbfcc2e2c) at libc-start.c:287
#7 0x0804b4fc in ?? ()

The actual readlink() call seems to occur in libmount, from a function under mnt_resolve_path. If, under GDB, I cause the readlink function to artificially return -1 rather than do the system call, the rest of the program succeeds in lazily unmounting the hung filesystem.
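
To illustrate what resolving a mount point without touching the filesystem could look like, here is a minimal Python sketch. It is not how libmount works and not a proposed patch. It only shows that a mount point can be matched against the text of the kernel's mount table using string operations, with no readlink() or stat() on the path itself. The function name find_mount_entry and the sample table are made up for illustration.

```python
import posixpath
import re

# /proc/mounts encodes space, tab, newline and backslash inside a field
# as the octal escapes \040, \011, \012 and \134.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unescape(field):
    """Decode the octal escapes used in mount table fields."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def find_mount_entry(mount_table_text, target):
    """Return (source, mount_point, fstype) for target, or None.

    Hypothetical helper: everything here is lexical. normpath() only
    rewrites the string, so nothing in this function can block on an
    unresponsive NFS server.
    """
    wanted = posixpath.normpath(target)
    match = None
    for line in mount_table_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, mount_point, fstype = (_unescape(f) for f in fields[:3])
        if mount_point == wanted:
            # Keep the last match: a later mount shadows an earlier one.
            match = (source, mount_point, fstype)
    return match
```

On a real system the table text could come from reading /proc/mounts, which is generated by the kernel and does not contact the NFS server. A lookup like this does not resolve symlinks, so it only works when the caller passes the mount point exactly as it was mounted.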

I don't know if the proper fix is for /sbin/umount.nfs to avoid calling mnt_resolve_path, or to pass it a parameter that says, "Don't touch that filesystem while trying to resolve the path!!!". I will leave that to the maintainers. All I know is that it hangs forever if you let the readlink system call occur, but it does the job it's supposed to do if you breakpoint at the readlink and do "return (int)-1" and "continue" in the debugger.
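
As a possible stopgap while the helper is unfixed, the lazy detach can be requested from the kernel directly, skipping the userspace path resolution described above. The sketch below is an assumption-laden illustration, not the maintainers' fix: it calls the Linux umount2() system call through ctypes with the MNT_DETACH flag, must run as root, and the function name lazy_unmount is made up. The report only establishes that the unmount succeeded once readlink() was skipped; whether the kernel call itself can block on a dead server may depend on the kernel version.

```python
import ctypes
import os

# Flag value from <sys/mount.h> on Linux: detach the mount point now and
# clean up the mount once it is no longer busy.
MNT_DETACH = 2


def lazy_unmount(mount_point):
    """Lazily unmount mount_point via umount2(), raising OSError on failure.

    Hypothetical helper: it passes the path straight to the kernel, so no
    userspace readlink() is issued on the mount point first.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.umount2.argtypes = [ctypes.c_char_p, ctypes.c_int]
    libc.umount2.restype = ctypes.c_int
    if libc.umount2(os.fsencode(mount_point), MNT_DETACH) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), mount_point)
```

For the case in this report the call would be lazy_unmount("/images"). Because it bypasses umount and /sbin/umount.nfs, any bookkeeping those tools normally do is skipped as well.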

summary:

- Unable to unmount nfsv3 when server is inaccessible
+ Unable to lazy unmount nfsv3 when server is inaccessible

John Gilmore (gnu-gilmore) wrote on 2015-06-17:

#3

I found this bug today in Ubuntu 14.04.2 LTS. I don't see how to easily specify in the Launchpad interface which release(s) the bug appears in.

John Gilmore (gnu-gilmore) wrote on 2015-06-17:

#4

The version of nfs-common that I found the bug in is: nfs-common-1:1.2.8-6ubuntu1.1 which is from the trusty-updates repository.

Luke Faraone (lfaraone) on 2017-09-12

Changed in nfs-utils (Ubuntu):
status:	Confirmed → Triaged
importance:	Undecided → Medium
