Comment 1 for bug 2033892

In , rsable (rsable-redhat-bugs) wrote :

Description of problem:

With --ghost or browse_mode=yes enabled, running ls -l inside an autofs mount point either triggers a mount or shows the error "ls: cannot access 'XXXX': No such file or directory".

Errors are seen for mount points that we know are inaccessible to this client, and
a mount is triggered for the accessible ones.

Version-Release number of selected component (if applicable):
autofs-5.1.4-74.el8.x86_64
coreutils-8.30-12.el8.x86_64

(However, I am filing this bug against autofs as the affected component, as discussed with Ian.)

How reproducible:

Always

Steps to Reproduce:

1. Upgrade to RHEL 8.5 (which should have autofs-5.1.4-74.el8.x86_64 and coreutils-8.30-12.el8.x86_64)
2. Create an autofs map :
~~~
[root@rsablerhel85 mnt2]# grep -i mnt /etc/auto.master
/mnt2 /etc/auto.indirect timeout=600,bg,tcp,hard,vers=3,rsize=32768,wsize=32768,timeo=600,retrans=6

[root@rsablerhel85 mnt2]# cat /etc/auto.indirect
testshare rsable76server:/testshare <<<<< testshare is a valid export from server
testshare2 rsable76server:/testshare2 <<<<< testshare2 is not available to this client or could be a bogus entry
~~~
3. Either use --ghost in auto.master as an option or set browse_mode=yes :
~~~
[root@rsablerhel85 mnt2]# grep -i browse /etc/autofs.conf
# browse_mode - maps are browsable by default.
browse_mode = yes
~~~
4. cd to /mnt2 and run ls -l (or ll).

Note: this issue occurs with both direct and indirect maps.

Actual results:

A mount is triggered for testshare, and ll reports ENOENT for testshare2:
~~~
[root@rsablerhel85 mnt2]# ll
ls: cannot access 'testshare2': No such file or directory <<<<< Error
total 0
drwxrwxrwx. 3 1000 1000 15 Jan 17 12:08 testshare <<<<< mount is triggered for testshare
d?????????? ? ? ? ? ? testshare2 <<<<< Path we know that is inaccessible throws an error

[root@rsablerhel85 mnt2]# mount | grep -i test
rsable76server:/testshare on /mnt2/testshare type nfs (rw,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=192.168.122.58,mountvers=3,mountport=20048,mountproto=tcp,local_lock=none,addr=192.168.122.58)
~~~

Expected results:
A mount should not be triggered, and the error "ls: cannot access 'testshare2': No such file or directory"
should not be seen.

Additional info:

I think the issue is caused by a behavior change in coreutils-common-8.30-12.el8.
After downgrading to coreutils-common-8.30-8.el8, the issue goes away:
~~~
[root@rsablerhel85 mnt2]# ll
ls: cannot access 'testshare2': No such file or directory
total 0
drwxrwxrwx. 3 1000 1000 15 Jan 17 12:08 testshare
d?????????? ? ? ? ? ? testshare2

[root@rsablerhel85 mnt2]# dnf downgrade coreutils-8.30-8.el8.x86_64
Downgraded:
  coreutils-8.30-8.el8.x86_64 coreutils-common-8.30-8.el8.x86_64

Complete!
[root@rsablerhel85 mnt2]# ll
total 0
drwxrwxrwx. 3 1000 1000 15 Jan 17 12:08 testshare
drwxr-xr-x. 2 root root 0 Jan 21 11:47 testshare2
~~~

I can see that coreutils-common-8.30-12.el8 calls statx while coreutils-common-8.30-8.el8 calls lstat :
~~~
coreutils-8.30-12
3181 12:02:13.828462 getdents64(3, [{d_ino=27279, d_off=1, d_reclen=24, d_type=DT_DIR, d_name="."}, {d_ino=27279, d_off=2, d_reclen=24, d_type=DT_DIR, d_name=".."}, {d_ino=27281, d_off=3, d_reclen=32, d_type=DT_DIR, d_name="testshare"}, {d_ino=27280, d_off=4, d_reclen=32, d_type=DT_DIR, d_name="testshare2"}], 32768) = 112 <0.000018>
3181 12:02:14.033318 statx(AT_FDCWD, "testshare2", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_MODE|STATX_NLINK|STATX_UID|STATX_GID|STATX_MTIME|STATX_SIZE, 0x7ffc0d6f1c60) = -1 ENOENT (No such file or directory) <0.035781>
~~~
~~~
coreutils-8.30-8
2854 12:01:11.302926 getdents64(3, [{d_ino=27279, d_off=1, d_reclen=24, d_type=DT_DIR, d_name="."}, {d_ino=27279, d_off=2, d_reclen=24, d_type=DT_DIR, d_name=".."}, {d_ino=27281, d_off=3, d_reclen=32, d_type=DT_DIR, d_name="testshare"}, {d_ino=27280, d_off=4, d_reclen=32, d_type=DT_DIR, d_name="testshare2"}], 32768) = 112 <0.000027>
2854 12:01:11.311912 lstat("testshare2", {st_dev=makedev(0, 0x31), st_ino=27280, st_mode=S_IFDIR|0755, st_nlink=2, st_uid=0, st_gid=0, st_blksize=1024, st_blocks=0, st_size=0, st_atime=1642783648 /* 2022-01-21T11:47:28.580732805-0500 */, st_atime_nsec=580732805, st_mtime=1642783648 /* 2022-01-21T11:47:28.580732805-0500 */, st_mtime_nsec=580732805, st_ctime=1642783648 /* 2022-01-21T11:47:28.580732805-0500 */, st_ctime_nsec=580732805}) = 0 <0.000030>
~~~

It seems to me that ls in coreutils-8.30-12, which now goes through statx, does not pass the AT_NO_AUTOMOUNT flag during this operation.
Looking a little further, it seems that vfs_lstat is just a wrapper around vfs_statx that explicitly sets AT_NO_AUTOMOUNT:
~~~
static inline int vfs_lstat(const char __user *name, struct kstat *stat)
{
	return vfs_statx(AT_FDCWD, name, AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT,
			 stat, STATX_BASIC_STATS);
}
~~~
So the question may simply be why the statx syscall issued by ls does not use AT_NO_AUTOMOUNT as a flag, unless I am wrong about the last few points.