nfs4/krb5 mounts hang on server kernel 2.6.32-27-generic

Bug #702385 reported by Brian J. Murrell
This bug affects 2 people
Affects         Status      Importance  Assigned to  Milestone
Linux           Confirmed   High
linux (Ubuntu)  Incomplete  Undecided   Unassigned
Lucid           Won't Fix   Undecided   Unassigned

Bug Description

After upgrading my LTS (Lucid) NFS server from 2.6.32-22-generic to 2.6.32-27-generic, Kerberized NFSv4 mount attempts from the clients (2.6.35-24-generic) hang. I have booted my server back and forth between the two kernels and this is 100% reproducible.

On the client, here is the stack trace of the hanging mount:

[83664.144250] SysRq : Show Blocked State
[83664.148028] task PC stack pid father
[83664.148038] mount.nfs4 D e2fbfc38 0 20082 20081 0x00000000
[83664.148038] e2fbfc48 00000086 00000002 e2fbfc38 e2fbfc7c c05d99e0 c08c4700 c08c4700
[83664.148038] a6a99df3 00004bfe c08c4700 c08c4700 a6a88065 00004bfe 00000000 c08c4700
[83664.148038] c08c4700 f6ae6580 00000001 e2fbfc7c 00000000 e2fbfc84 e2fbfc50 f823dc0c
[83664.148038] Call Trace:
[83664.148038] [<f823dc0c>] rpc_wait_bit_killable+0x1c/0x40 [sunrpc]
[83664.148038] [<c05c823d>] __wait_on_bit+0x4d/0x70
[83664.148038] [<f823dbf0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
[83664.202534] [<f823dbf0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
[83664.202534] [<c05c830b>] out_of_line_wait_on_bit+0xab/0xc0
[83664.202534] [<c0165f10>] ? wake_bit_function+0x0/0x50
[83664.202534] [<f823e31b>] __rpc_execute+0xdb/0x250 [sunrpc]
[83664.202534] [<f823da17>] ? rpc_init_task+0xd7/0x120 [sunrpc]
[83664.202534] [<c0230bef>] ? mntput_no_expire+0x1f/0xd0
[83664.202534] [<f823e4fe>] rpc_execute+0x6e/0x80 [sunrpc]
[83664.202534] [<f82379af>] rpc_run_task+0x1f/0x30 [sunrpc]
[83664.202534] [<f8237abe>] rpc_call_sync+0x3e/0x60 [sunrpc]
[83664.202534] [<f83966b2>] _nfs4_call_sync+0x22/0x30 [nfs]
[83664.202534] [<f8394795>] nfs4_proc_get_root+0xa5/0x100 [nfs]
[83664.202534] [<f837e6f8>] nfs4_get_rootfh+0x48/0x130 [nfs]
[83664.202534] [<f8380a33>] ? nfs_alloc_fattr+0x23/0xb0 [nfs]
[83664.202534] [<f8378f19>] ? nfs4_init_server+0xf9/0x200 [nfs]
[83664.202534] [<f83784b4>] nfs4_server_common_setup+0x54/0x170 [nfs]
[83664.202534] [<f8379062>] nfs4_create_server+0x42/0xc0 [nfs]
[83664.202534] [<f83834eb>] nfs4_remote_get_sb+0x6b/0x250 [nfs]
[83664.202534] [<c020fadf>] ? __alloc_percpu+0xf/0x20
[83664.202534] [<c0231629>] ? alloc_vfsmnt+0xf9/0x130
[83664.202534] [<c021b354>] vfs_kern_mount+0x74/0x1c0
[83664.202534] [<f83848b9>] nfs_do_root_mount+0x69/0x90 [nfs]
[83664.202534] [<f83849bf>] nfs4_try_mount+0x3f/0xb0 [nfs]
[83664.202534] [<f8384ca1>] ? nfs_alloc_parsed_mount_data+0x41/0xa0 [nfs]
[83664.202534] [<f8384d50>] nfs4_get_sb+0x50/0xd0 [nfs]
[83664.202534] [<c0231629>] ? alloc_vfsmnt+0xf9/0x130
[83664.202534] [<c021b354>] vfs_kern_mount+0x74/0x1c0
[83664.202534] [<c022f9b3>] ? get_fs_type+0x33/0xb0
[83664.202534] [<c021b4fe>] do_kern_mount+0x3e/0xe0
[83664.202534] [<c0232b2c>] do_mount+0x1dc/0x220
[83664.202534] [<c0232bdb>] sys_mount+0x6b/0xa0
[83664.202534] [<c05c9cc4>] syscall_call+0x7/0xb
[83664.202534] Sched Debug Version: v0.09, 2.6.35-24-generic #42-Ubuntu
[83664.202534] now at 83664203.297770 msecs
[83664.202534] .jiffies : 20841050
[83664.202534] .sysctl_sched_latency : 12.000000
[83664.202534] .sysctl_sched_min_granularity : 4.000000
[83664.202534] .sysctl_sched_wakeup_granularity : 2.000000
[83664.202534] .sysctl_sched_child_runs_first : 0.000000
[83664.202534] .sysctl_sched_features : 15471
[83664.202534] .sysctl_sched_tunable_scaling : 1 (logaritmic)
[83664.202534]
[83664.202534] cpu#0, 2104.451 MHz
[83664.202534] .nr_running : 1
[83664.202534] .load : 1024
[83664.202534] .nr_switches : 90454305
[83664.202534] .nr_load_updates : 12864728
[83664.202534] .nr_uninterruptible : 1
[83664.202534] .next_balance : 20.841083
[83664.202534] .curr->pid : 9
[83664.202534] .clock : 83664203.054506
[83664.202534] .cpu_load[0] : 0
[83664.202534] .cpu_load[1] : 721
[83664.202534] .cpu_load[2] : 720
[83664.202534] .cpu_load[3] : 539
[83664.202534] .cpu_load[4] : 374
[83664.202534] .yld_count : 333029
[83664.202534] .sched_switch : 0
[83664.202534] .sched_count : 93158766
[83664.202534] .sched_goidle : 34377602
[83664.202534] .avg_idle : 1000000
[83664.202534] .ttwu_count : 51632626
[83664.202534] .ttwu_local : 35602671
[83664.202534] .bkl_count : 277652
[83664.202534]
[83664.202534] cfs_rq[0]:/
[83664.202534] .exec_clock : 28648774.386039
[83664.202534] .MIN_vruntime : 0.000001
[83664.202534] .min_vruntime : 84508846.663281
[83664.202534] .max_vruntime : 0.000001
[83664.202534] .spread : 0.000000
[83664.202534] .spread0 : 0.000000
[83664.202534] .nr_running : 1
[83664.202534] .load : 1024
[83664.202534] .nr_spread_over : 168463
[83664.202534] .shares : 0
[83664.202534]
[83664.202534] rt_rq[0]:/
[83664.202534] .rt_nr_running : 0
[83664.202534] .rt_throttled : 0
[83664.202534] .rt_time : 0.000000
[83664.202534] .rt_runtime : 900.000000
[83664.202534]
[83664.202534] runnable tasks:
[83664.202534] task PID tree-key switches prio exec-runtime sum-exec sum-sleep
[83664.202534] ----------------------------------------------------------------------------------------------------------
[83664.202534] R events/0 9 84508840.663281 1513547 120 84508840.663281 24835.299583 83493949.900215 /
[83664.202534]
[83664.202534] cpu#1, 2104.451 MHz
[83664.202534] .nr_running : 1
[83664.202534] .load : 1024
[83664.202534] .nr_switches : 90730537
[83664.202534] .nr_load_updates : 12668068
[83664.202534] .nr_uninterruptible : 1
[83664.202534] .next_balance : 20.841046
[83664.202534] .curr->pid : 20262
[83664.202534] .clock : 83664144.033645
[83664.202534] .cpu_load[0] : 0
[83664.202534] .cpu_load[1] : 512
[83664.202534] .cpu_load[2] : 768
[83664.202534] .cpu_load[3] : 896
[83664.202534] .cpu_load[4] : 960
[83664.202534] .yld_count : 376091
[83664.202534] .sched_switch : 0
[83664.202534] .sched_count : 93423215
[83664.202534] .sched_goidle : 33055762
[83664.202534] .avg_idle : 481502
[83664.202534] .ttwu_count : 52337101
[83664.202534] .ttwu_local : 39223807
[83664.202534] .bkl_count : 287227
[83664.202534]
[83664.202534] cfs_rq[1]:/
[83664.202534] .exec_clock : 29027334.434850
[83664.202534] .MIN_vruntime : 0.000001
[83664.202534] .min_vruntime : 77328668.881570
[83664.202534] .max_vruntime : 0.000001
[83664.202534] .spread : 0.000000
[83664.202534] .spread0 : -7180177.781711
[83664.202534] .nr_running : 1
[83664.202534] .load : 1024
[83664.202534] .nr_spread_over : 155060
[83664.202534] .shares : 0
[83664.202534]
[83664.202534] rt_rq[1]:/
[83664.202534] .rt_nr_running : 0
[83664.202534] .rt_throttled : 0
[83664.202534] .rt_time : 0.000000
[83664.202534] .rt_runtime : 1000.000000
[83664.202534]
[83664.202534] runnable tasks:
[83664.202534] task PID tree-key switches prio exec-runtime sum-exec sum-sleep
[83664.202534] ----------------------------------------------------------------------------------------------------------
[83664.202534] R bash 20262 77328663.096743 180 120 77328663.096743 96.589960 22615.112807 /
[83664.202534]
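
A dump like the one above is what the kernel prints for the SysRq "w" (show blocked tasks) request, which root can trigger by writing "w" to /proc/sysrq-trigger. As a rough sketch, the helper below pulls the name and PID of each task in state D out of such a log. The function name blocked_tasks is hypothetical, and it assumes the column layout seen in this report (task, state, PC, stack, pid, father).

```shell
# Sketch only: list "name pid" for every task reported in state D.
# Assumes dmesg-style lines with a leading "[timestamp]" and the column
# order seen in this report: task, state, PC, stack, pid, father.
blocked_tasks() {
    sed 's/^\[[^]]*] *//' | awk '$2 == "D" && NF >= 5 { print $1, $5 }'
}

# Typical use on the client, as root (left commented out on purpose):
#   echo w > /proc/sysrq-trigger
#   dmesg | blocked_tasks
```

Fed the trace in this report, it should report mount.nfs4 with PID 20082 as the only blocked task.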

The mount command on the client:

brian@pc:~$ sudo mount -t nfs4 -o sec=krb5 linux:/usr/local /mnt/test/

And the export entry on the server:

/usr/local pc(sec=krb5,rw,no_root_squash,sync,subtree_check) \
                pvr(sync,subtree_check)

The client doing the mount (i.e. above command) is "pc".
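
When rebooting the server between kernels, it can help to run the mount under a deadline so that a hang is reported instead of blocking the shell. The following is a minimal sketch, assuming GNU coreutils `timeout` is installed on the client; run_with_deadline is a hypothetical helper name and the 60-second limit in the usage comment is arbitrary. The trace shows the task waiting in rpc_wait_bit_killable, which suggests SIGKILL should be able to interrupt it, though that is not verified here.

```shell
# Sketch only: run a command under a deadline and report whether it finished.
# Prints "ok", "hung", or "failed(<status>)"; returns 0 only on "ok".
run_with_deadline() {
    secs=$1
    shift
    # SIGKILL is used because a plain SIGTERM may not interrupt the wait.
    timeout -s KILL "$secs" "$@"
    status=$?
    case $status in
        0)       echo ok ;;
        124|137) echo hung; return 1 ;;   # timeout's own code, or 128+SIGKILL
        *)       echo "failed($status)"; return 1 ;;
    esac
}

# On the client "pc", as root (commented out so the sketch is safe to source):
#   run_with_deadline 60 mount -t nfs4 -o sec=krb5 linux:/usr/local /mnt/test/
```

Any output from the wrapped command appears before the verdict line, so a caller scripting this should look at the last line or at the return status.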

As I said, I can quite easily resolve this problem by simply booting the server back into the -22 kernel.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-27-generic 2.6.32-27.49
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.32-27.49-generic 2.6.32.26+drm33.12
Uname: Linux 2.6.32-27-generic i686
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
Architecture: i386
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/dsp', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/pcmC0D1c', '/dev/snd/pcmC0D1p', '/dev/snd/pcmC0D2c', '/dev/snd/seq', '/dev/snd/timer', '/dev/sequencer2', '/dev/sequencer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'SB'/'HDA ATI SB at 0xfe020000 irq 16'
   Mixer name : 'Realtek ALC883'
   Components : 'HDA:10ec0883,10438232,00100002'
   Controls : 37
   Simple ctrls : 20
Date: Thu Jan 13 09:35:05 2011
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
MachineType: System manufacturer System Product Name
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-27-generic root=/dev/mapper/rootvol-ubuntu_root ro console=ttyS0,115200 console=tty0
ProcEnviron:
 LANG=en_CA.UTF-8
 SHELL=/bin/bash
RelatedPackageVersions: linux-firmware 1.34.1
RfKill:

SourcePackage: linux
StagingDrivers: echo
Title: [STAGING]
dmi.bios.date: 03/28/2008
dmi.bios.vendor: Phoenix Technologies, LTD
dmi.bios.version: ASUS M2A-VM ACPI BIOS Revision 1705
dmi.board.name: M2A-VM
dmi.board.vendor: ASUSTeK Computer INC.
dmi.board.version: 1.XX
dmi.chassis.asset.tag: 123456789000
dmi.chassis.type: 3
dmi.chassis.vendor: Chassis Manufacture
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnPhoenixTechnologies,LTD:bvrASUSM2A-VMACPIBIOSRevision1705:bd03/28/2008:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKComputerINC.:rnM2A-VM:rvr1.XX:cvnChassisManufacture:ct3:cvrChassisVersion:
dmi.product.name: System Product Name
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer

Brian J. Murrell (brian-interlinx) wrote :

Is there any reason this bug has not even been triaged yet?

This also happens in the Maverick kernel, and I'm willing to bet it happens in the Natty kernel as well.

Brian J. Murrell (brian-interlinx) wrote :

I have also reproduced this with 2.6.38-020638-generic.

Changed in linux:
importance: Unknown → High
status: Unknown → Confirmed
Changed in linux (Ubuntu Lucid):
milestone: none → lucid-updates
Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
Brian J. Murrell (brian-interlinx) wrote :

OK. So, I've bisected all of the mainline kernels between 2.6.32-02063211-generic (last known working release) and 2.6.32-0206322612-generic (first known broken release) and wouldn't you know it, the release in which this was broken was 2.6.32-02063212-generic, a.k.a. v2.6.32.12-lucid, a.k.a. linux-image-2.6.32-02063212-generic_2.6.32-02063212_i386.deb.

So, now that I've spent an afternoon going to all of this effort, is there any chance that somebody with knowledge of what changed between .11 and .12 can spend a few minutes to identify what went into .12 that would have caused this breakage?
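
One way to narrow the question down is to list only the commits between the two releases that touch the RPC and NFS server code. This is a sketch, assuming a local clone of the stable kernel tree that carries the upstream tags v2.6.32.11 and v2.6.32.12. The helper name commits_between, the clone path in the usage comment, and the default path list (net/sunrpc, fs/nfsd) are assumptions for illustration; the bisection above does not establish which subsystem is at fault.

```shell
# Sketch only: list commits between two tags that touch the given paths.
# Defaults to the RPC and NFS server directories (an assumption about where
# the regression lives, not something the bisection established).
commits_between() {
    repo=$1 old=$2 new=$3
    shift 3
    [ "$#" -gt 0 ] || set -- net/sunrpc fs/nfsd
    git -C "$repo" log --oneline "$old..$new" -- "$@"
}

# In a clone of the stable kernel tree (path is hypothetical):
#   commits_between ~/src/linux-stable v2.6.32.11 v2.6.32.12
```

Extra paths can be passed after the tags to widen or narrow the search, and the resulting list could then be fed to git bisect between the same two tags.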

Brian J. Murrell (brian-interlinx) wrote :

With this bug "Confirmed", is anyone going to actually fix it or will I be stuck on an ancient kernel forever on my NFSv4 server?

penalvch (penalvch) wrote :

Brian J. Murrell, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.11-rc5

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

tags: added: bios-outdated-2302
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Rolf Leggewie (r0lf) wrote :

Lucid has reached the end of its life and is no longer receiving any updates. Marking the Lucid task for this ticket as "Won't Fix".

Changed in linux (Ubuntu Lucid):
status: New → Won't Fix