[UBUNTU 22.04] s390/qeth: cache link_info for ethtool

Bug #1984103 reported by bugproxy
This bug affects 2 people
Affects                  Status        Importance  Assigned to            Milestone
Ubuntu on IBM z Systems  Fix Released  High        Skipper Bug Screeners
linux (Ubuntu)           Fix Released  High        Joseph Salisbury
Jammy                    Fix Released  High        Joseph Salisbury
Kinetic                  Fix Released  High        Joseph Salisbury

Bug Description

== SRU Justification ==
Since commit e6e771b3d897 ("s390/qeth: detach netdevice while card is offline"), there has been a timing window during recovery in which qeth_query_card_info could be sent to the card before the card was ready for it, causing card recovery to fail. There is evidence that this window was hit, because not all callers of get_link_ksettings() check netif_device_present.

This patch (Mainline commit 7a07a29e4f67) fixes the regression caused by commit e6e771b3d897.

Commit 7a07a29e4f67 is in mainline as of v6.0-rc1.

This patch is being requested in Jammy 5.15 and Kinetic 5.19.

== Fix ==
7a07a29e4f67 ("s390/qeth: cache link_info for ethtool")

== Regression Potential ==
Low. This patch has been accepted in upstream stable and is limited to
the s390/qeth card.

== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

== Comment: #0 - Jörn Siglen <email address hidden> - 2022-08-09 07:38:27 ==
+++ This bug was initially created as a clone of Bug #199319 +++

Description: s390/qeth: cache link_info for ethtool
Symptom: loss of IP connection and log entries in journalctl:
                  kernel: qeth 0.0.0365: The qeth device driver failed to
                  recover an error on the device
Problem: Since commit e6e771b3d897
                 ("s390/qeth: detach netdevice while card is offline")
               there was a timing window during recovery, that
               qeth_query_card_info could be sent to the card, even before it
               was ready for it, leading to a failing card recovery. There is
               evidence that this window was hit, as not all callers of
               get_link_ksettings() check for netif_device_present.
Solution: Use cached values in qeth_get_link_ksettings(), instead of
               calling qeth_query_card_info() and falling back to default
               values in case it fails. Link info is already updated when the
               card goes online, e.g. after STARTLAN (physical link up). Set
               the link info to default values, when the card goes offline or
               at STOPLAN (physical link down). A follow-on patch will improve
               values reported for link down.
               Fixes: e6e771b3d897
               ("s390/qeth: detach netdevice while card is offline")
Reproduction: force an eth device recovery while running ethtool multiple
               times in parallel and using iperf to put load on the interface.
Upstream-ID: 7a07a29e4f6713b224f3bcde5f835e777301bdb8
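
The reproduction described above could be scripted roughly as follows. This is only a sketch under assumptions not stated in the report: that recovery is forced through the qeth `recover` sysfs attribute, and that the helper names (`ethtool_loop`, `ethtool_storm`, `trigger_recovery`), the `QETH_SYSFS` override, and the worker and iteration counts are purely illustrative. The iperf load is expected to be started separately.

```shell
#!/bin/sh
# Sketch of the reproduction: query link settings via ethtool from several
# parallel workers, then force a recovery of the qeth device.
# Assumption: the qeth driver exposes a "recover" attribute below this path.
QETH_SYSFS="${QETH_SYSFS:-/sys/bus/ccwgroup/drivers/qeth}"

# Run ethtool COUNT times against IFACE, discarding the output.
ethtool_loop() {
    iface="$1"; count="$2"; i=0
    while [ "$i" -lt "$count" ]; do
        ethtool "$iface" >/dev/null 2>&1 || true
        i=$((i + 1))
    done
}

# Start WORKERS parallel ethtool loops and wait for them to finish.
# Note: "wait" also waits for any other background jobs of this shell.
ethtool_storm() {
    storm_iface="$1"; workers="$2"; per_worker="$3"; w=0
    while [ "$w" -lt "$workers" ]; do
        ethtool_loop "$storm_iface" "$per_worker" &
        w=$((w + 1))
    done
    wait
}

# Ask the driver to recover the device with the given bus ID (needs root).
trigger_recovery() {
    echo 1 > "$QETH_SYSFS/$1/recover"
}
```

With traffic already running, starting `ethtool_storm <iface> 8 1000` in one shell and calling `trigger_recovery <bus-id>` from another approximates the scenario; both placeholders must refer to the same qeth device.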

https://<email address hidden>/T/#m2e3799a38d1d4630822db50f9a5d9b2ca018252f

applicable to most kernels > 3.14

== Comment: #3 - Jörn Siglen <email address hidden> - 2022-08-09 07:54:41 ==
the initial update came in with kernel 5.1 upstream, but we found it backported in many older kernel versions

== Comment: #4 - Jörn Siglen <email address hidden> - 2022-08-09 08:03:09 ==
the acceptance info of the patch can be found here:
https://<email address hidden>/T/#m2e3799a38d1d4630822db50f9a5d9b2ca018252f


bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-199325 severity-high targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → kernel-package (Ubuntu)
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in kernel-package (Ubuntu):
status: New → Confirmed
Revision history for this message
Thomas Staudt (tstaudt2) wrote :

Hello Frank,

I mirrored this while Boris is on vacation for your awareness - please adjust accordingly.
Thanks

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2022-08-10 04:10 EDT-------
clarification from development on the overall situation:

Symptoms:
A) Since
02d5cb5bb20b ("qeth: Accurate ethtool output")
query_card_info is called for each qeth_get_link_ksettings
Frequent calls to ethtool's get_link_ksettings can put pressure on the zVM VSwitch,
which leads to timeouts in the qeth driver; the driver responds by recovering the affected interface.

B) Since
e6e771b3d897 ("s390/qeth: detach netdevice while card is offline")
there is a timing window in which query_card_info can interfere with recovery and
cause the recovery to fail.

7a07a29e4f67 ("s390/qeth: cache link_info for ethtool") solves both problems.

Backport considerations:
A) 02d5cb5bb20b ("qeth: Accurate ethtool output")
went into kernel v3.14 and was backported to RHEL7.9

B) e6e771b3d897 ("s390/qeth: detach netdevice while card is offline")
went into kernel v5.1 and was backported to RHEL8.2

Frank Heimes (fheimes)
affects: kernel-package (Ubuntu) → ethtool (Ubuntu)
Changed in ubuntu-z-systems:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Changed in ethtool (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → nobody
Changed in ethtool (Ubuntu Jammy):
status: New → Confirmed
Frank Heimes (fheimes)
Changed in ethtool (Ubuntu Kinetic):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in ethtool (Ubuntu Jammy):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

canonical-kernel-team does not monitor or respond to userspace ethtool package bugs; this bug should have gone to the server team triage.

Changed in ethtool (Ubuntu Jammy):
assignee: Canonical Kernel Team (canonical-kernel-team) → Ubuntu Server (ubuntu-server)
Changed in ethtool (Ubuntu Kinetic):
assignee: Canonical Kernel Team (canonical-kernel-team) → Ubuntu Server (ubuntu-server)
tags: added: rls-incoming-jj rls-incoming-kk
Frank Heimes (fheimes)
no longer affects: ethtool (Ubuntu)
Changed in ethtool (Ubuntu Jammy):
assignee: Ubuntu Server (ubuntu-server) → nobody
Changed in ethtool (Ubuntu Kinetic):
assignee: Ubuntu Server (ubuntu-server) → nobody
no longer affects: ethtool (Ubuntu Kinetic)
no longer affects: ethtool (Ubuntu Jammy)
Changed in linux (Ubuntu Jammy):
status: New → Confirmed
Changed in linux (Ubuntu Kinetic):
status: New → Confirmed
tags: removed: rls-incoming-jj rls-incoming-kk
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2022-08-30 08:13 EDT-------
(In reply to comment #12)
> canonical-kernel-team does not monitor or respond to userspace ethtool
> package bugs; this bug should have gone to the server team triage.

This is a _kernel_ patch that fixes output used by ethtool.

Revision history for this message
Frank Heimes (fheimes) wrote :

Yepp, it's all about kernel patches.
This bug was wrongly marked as affecting ethtool (well, it affects ethtool, but not in the 'affects' sense of Launchpad), which caused some confusion.
I just changed that, pointing now to the kernel (which is "linux (Ubuntu)" in the above affects section).

Revision history for this message
Marcelo Cerri (mhcerri) wrote :

I prepared a test kernel with the proposed patch (it was a clean cherry-pick for 5.15). Frank, do you think you can help me validate the test kernel?

You can find a tarball with the debian packages for the test kernel at:

https://kernel.ubuntu.com/~mhcerri/test/lp1984103/linux-unsigned-debs-5.15.0-48.54+lp1984103.1_s390x.tgz

tags: added: patch
bugproxy (bugproxy)
tags: added: targetmilestone-inin2204
removed: targetmilestone-inin---
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2022-08-31 12:47 EDT-------
new test kernel works as expected:

Linux version 5.15.0-47-generic

# grep qcrdinfo /sys/kernel/debug/s390dbf/qeth*/hex_ascii | wc -l
7
# ethtool encbd00 | grep Duplex
Duplex: Full
# grep qcrdinfo /sys/kernel/debug/s390dbf/qeth*/hex_ascii | wc -l
9

after update to Linux version 5.15.0-48-generic

# grep qcrdinfo /sys/kernel/debug/s390dbf/qeth*/hex_ascii | wc -l
0
# ethtool encbd00 | grep Duplex
Duplex: Full
# grep qcrdinfo /sys/kernel/debug/s390dbf/qeth*/hex_ascii | wc -l
0
# ethtool encbd00
Settings for encbd00:
Supported ports: [ FIBRE ]
Supported link modes: 10000baseSR/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10000baseSR/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Speed: 10000Mb/s
Duplex: Full
Auto-negotiation: on
Port: FIBRE
PHYAD: 0
Transceiver: internal
Link detected: yes
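
The check used in this test can be wrapped in a small helper. The following is a minimal sketch, assuming debugfs is mounted at /sys/kernel/debug and the script runs as root; the function names and the `DBF_ROOT` override are illustrative and not part of any existing tool.

```shell
#!/bin/sh
# Count "qcrdinfo" trace entries in the qeth s390dbf debug views.
# DBF_ROOT can be overridden, e.g. to point at a saved copy of the logs.
DBF_ROOT="${DBF_ROOT:-/sys/kernel/debug/s390dbf}"

count_qcrdinfo() {
    # Concatenate the views of all qeth cards; grep -c prints 0 (and
    # returns non-zero) when nothing matches, hence the "|| true".
    cat "$DBF_ROOT"/qeth*/hex_ascii 2>/dev/null | grep -c qcrdinfo || true
}

# Print how many new qcrdinfo entries a single ethtool query produced.
qcrdinfo_delta() {
    iface="$1"
    before=$(count_qcrdinfo)
    ethtool "$iface" >/dev/null 2>&1 || true
    after=$(count_qcrdinfo)
    echo $((after - before))
}
```

A result of 0 from `qcrdinfo_delta encbd00` would match the patched behaviour shown above, whereas on the unpatched kernel the count rose from 7 to 9 across a single ethtool call.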

Changed in linux (Ubuntu Jammy):
importance: Undecided → High
Changed in linux (Ubuntu Kinetic):
importance: Undecided → High
Changed in linux (Ubuntu Jammy):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Kinetic):
assignee: nobody → Joseph Salisbury (jsalisbury)
description: updated
Changed in linux (Ubuntu Jammy):
status: Confirmed → In Progress
Changed in linux (Ubuntu Kinetic):
status: Confirmed → In Progress
description: updated
description: updated
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Confirmed → In Progress
Stefan Bader (smb)
Changed in linux (Ubuntu Kinetic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.15.0-50.56 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Revision history for this message
Frank Heimes (fheimes) wrote :

Verification done on jammy (on a slightly different setup):
root@s1lp15:~# uname -a
Linux s1lp15 5.15.0-27-generic #28-Ubuntu SMP Thu Apr 14 04:55:23 UTC 2022 s390x s390x s390x GNU/Linux
root@s1lp15:~# ethtool encc000 | grep Duplex
 Duplex: Full
root@s1lp15:~# grep qcrdinfo /sys/kernel/debug/s390dbf/qeth*/hex_ascii | wc -l
3
root@s1lp15:~# grep qcrdinfo /sys/kernel/debug/s390dbf/qeth*/hex_ascii
/sys/kernel/debug/s390dbf/qeth_card_0.0.c000/hex_ascii:00 01664021373:134002 2 - 0001 000003ff81066676 71 63 72 64 69 6e 66 6f | qcrdinfo
/sys/kernel/debug/s390dbf/qeth_card_0.0.c000/hex_ascii:00 01664021384:308308 2 - 0007 000003ff81066676 71 63 72 64 69 6e 66 6f | qcrdinfo
/sys/kernel/debug/s390dbf/qeth_card_0.0.c000/hex_ascii:00 01664021384:308804 2 - 0007 000003ff81066676 71 63 72 64 69 6e 66 6f | qcrdinfo
root@s1lp15:~# ethtool encc000
Settings for encc000:
 Supported ports: [ FIBRE ]
 Supported link modes: 10000baseSR/Full
 Supported pause frame use: No
 Supports auto-negotiation: Yes
 Supported FEC modes: Not reported
 Advertised link modes: 10000baseSR/Full
 Advertised pause frame use: No
 Advertised auto-negotiation: Yes
 Advertised FEC modes: Not reported
 Speed: 10000Mb/s
 Duplex: Full
 Auto-negotiation: on
 Port: FIBRE
 PHYAD: 0
 Transceiver: internal
 Link detected: yes
(adjusting tags accordingly)

tags: added: verification-done verification-done-jammy
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.19.0-18.18

---------------
linux (5.19.0-18.18) kinetic; urgency=medium

  * kinetic/linux: 5.19.0-18.18 -proposed tracker (LP: #1990366)

  * 5.19.0-17.17: kernel NULL pointer dereference, address: 0000000000000084
    (LP: #1990236)
    - Revert "UBUNTU: SAUCE: apparmor: Fix regression in stacking due to label
      flags"
    - Revert "UBUNTU: [Config] disable SECURITY_APPARMOR_RESTRICT_USERNS"
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - add an internal buffer""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - don't wait on cleanup""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - don't waste entropy""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - always add a pending
      request""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - unregister device before
      reset""
    - Revert "UBUNTU: SAUCE: Revert "virtio-rng: make device ready before making
      request""
    - Revert "UBUNTU: [Config] update configs after apply new apparmor patch set"
    - Revert "UBUNTU: SAUCE: apparmor: add user namespace creation mediation"
    - Revert "UBUNTU: SAUCE: selinux: Implement userns_create hook"
    - Revert "UBUNTU: SAUCE: bpf-lsm: Make bpf_lsm_userns_create() sleepable"
    - Revert "UBUNTU: SAUCE: security, lsm: Introduce security_create_user_ns()"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: AppArmor: Remove the exclusive
      flag"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Add /proc attr entry for full
      LSM context"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Removed scaffolding function
      lsmcontext_init"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: netlabel: Use a struct lsmblob in
      audit data"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Add record for multiple
      object contexts"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: audit: multiple subject lsm values
      for netlabel"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Add record for multiple task
      security contexts"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Allow multiple records in an
      audit_buffer"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Add a function to report
      multiple LSMs"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Create audit_stamp
      structure"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Keep multiple LSM data in
      audit_names"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: security_secid_to_secctx
      module selection"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: binder: Pass LSM identifier for
      confirmation"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: NET: Store LSM netlabel data in a
      lsmblob"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: security_secid_to_secctx in
      netlink netfilter"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Use lsmcontext in
      security_dentry_init_security"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Use lsmcontext in
      security_inode_getsecctx"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Use lsmcontext in
      security_secid_to_secctx"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM:...

Changed in linux (Ubuntu Kinetic):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.15.0-50.56

---------------
linux (5.15.0-50.56) jammy; urgency=medium

  * jammy/linux: 5.15.0-50.56 -proposed tracker (LP: #1990148)

  * CVE-2022-3176
    - io_uring: refactor poll update
    - io_uring: move common poll bits
    - io_uring: kill poll linking optimisation
    - io_uring: inline io_poll_complete
    - io_uring: correct fill events helpers types
    - io_uring: clean cqe filling functions
    - io_uring: poll rework
    - io_uring: remove poll entry from list when canceling all
    - io_uring: bump poll refs to full 31-bits
    - io_uring: fail links when poll fails
    - io_uring: fix wrong arm_poll error handling
    - io_uring: fix UAF due to missing POLLFREE handling

  * ip/nexthop: fix default address selection for connected nexthop
    (LP: #1988809)
    - selftests/net: test nexthop without gw

  * ip/nexthop: fix default address selection for connected nexthop
    (LP: #1988809) // icmp_redirect.sh in ubuntu_kernel_selftests failed on
    Jammy 5.15.0-49.55 (LP: #1990124)
    - ip: fix triggering of 'icmp redirect'

linux (5.15.0-49.55) jammy; urgency=medium

  * jammy/linux: 5.15.0-49.55 -proposed tracker (LP: #1989785)

  * amdgpu module crash after 5.15 kernel update (LP: #1981883)
    - drm/amdgpu: fix check in fbdev init

  * scsi: hisi_sas: Increase debugfs_dump_index after dump is  completed
    (LP: #1982070)
    - scsi: hisi_sas: Increase debugfs_dump_index after dump is completed

  * [UBUNTU 22.04] s390/qeth: cache link_info for ethtool (LP: #1984103)
    - s390/qeth: cache link_info for ethtool

  * WARN in trace_event_dyn_put_ref (LP: #1987232)
    - tracing/perf: Fix double put of trace event when init fails

  * Jammy update: v5.15.60 upstream stable release (LP: #1989221)
    - x86/speculation: Make all RETbleed mitigations 64-bit only
    - selftests/bpf: Extend verifier and bpf_sock tests for dst_port loads
    - selftests/bpf: Check dst_port only on the client socket
    - block: fix default IO priority handling again
    - tools/vm/slabinfo: Handle files in debugfs
    - ACPI: video: Force backlight native for some TongFang devices
    - ACPI: video: Shortening quirk list by identifying Clevo by board_name only
    - ACPI: APEI: Better fix to avoid spamming the console with old error logs
    - crypto: arm64/poly1305 - fix a read out-of-bound
    - KVM: x86: do not report a vCPU as preempted outside instruction boundaries
    - KVM: x86: do not set st->preempted when going back to user space
    - KVM: selftests: Make hyperv_clock selftest more stable
    - tools/kvm_stat: fix display of error when multiple processes are found
    - selftests: KVM: Handle compiler optimizations in ucall
    - KVM: x86/svm: add __GFP_ACCOUNT to __sev_dbg_{en,de}crypt_user()
    - arm64: set UXN on swapper page tables
    - btrfs: zoned: prevent allocation from previous data relocation BG
    - btrfs: zoned: fix critical section of relocation inode writeback
    - Bluetooth: hci_bcm: Add BCM4349B1 variant
    - Bluetooth: hci_bcm: Add DT compatible for CYW55572
    - dt-bindings: bluetooth: broadcom: Add BCM4349B1 DT binding
    - Bluetooth: btusb: Add support of IMC Netw...

Changed in linux (Ubuntu Jammy):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-gkeop-5.15/5.15.0-1005.7~20.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Revision history for this message
Frank Heimes (fheimes) wrote :

This bug was not requested for linux-gkeop-5.15 nor for focal.
Hence I'll just set the tag 'verification-done-focal' to unblock the SRU process.

tags: added: verification-done-focal
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-bluefield/5.15.0-1010.12 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-bluefield verification-needed-jammy
removed: verification-done verification-done-jammy
Revision history for this message
Frank Heimes (fheimes) wrote :

This bug was not requested for linux-bluefield.
Hence I'll just set the tag 'verification-done-jammy' to unblock the further process.

tags: added: verification-done-jammy
removed: verification-needed-jammy
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia/5.15.0-1011.11 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-nvidia verification-needed-jammy
removed: verification-done-jammy
Revision history for this message
Frank Heimes (fheimes) wrote :

This bug was not opened against linux-nvidia/5.15.0-1011.11 and is also not relevant for this kernel.
However, I'm setting the tag to done to unblock the process.

tags: added: verification-done-jammy
removed: verification-needed-jammy
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2022-12-12 08:36 EDT-------
still fine with latest updates:

# uname -a
Linux a83lp78.lnxne.boe 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:57:26 UTC 2022 s390x s390x s390x GNU/Linux
# grep qcrdinfo /sys/kernel/debug/s390dbf/qeth*/hex_ascii | wc -l
0
# ethtool encbd00 | grep Duplex
Duplex: Full
# grep qcrdinfo /sys/kernel/debug/s390dbf/qeth*/hex_ascii | wc -l
0
