Broadcom BCM5906M ethernet adapter (tg3) hangs under heavy tcp load (bittorrent) when generic segmentation offload (gso) is in use
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Debian | New | Undecided | Unassigned | |
linux (Ubuntu) | Won't Fix | Undecided | Michał Markowski | |
Bug Description
kosh@galileo:~$ dpkg -s linux-generic | grep -E -e ^Version
Version: 2.6.31.4.15
kosh@galileo:~$ lsb_release -rd
Description: Ubuntu karmic (development branch)
Release: 9.10
02:00.0 Ethernet controller: Broadcom Corporation NetLink BCM5906M Fast Ethernet PCI Express (rev 02)
Subsystem: Lenovo Device 3a23
Flags: bus master, fast devsel, latency 0, IRQ 27
Memory at f0200000 (64-bit, non-prefetchable) [size=64K]
Expansion ROM at <ignored> [disabled]
Capabilities: [48] Power Management version 3
Capabilities: [50] Vital Product Data <?>
Capabilities: [58] Vendor Specific Information <?>
Capabilities: [e8] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
Capabilities: [d0] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting <?>
Capabilities: [13c] Virtual Channel <?>
Capabilities: [160] Device Serial Number 6d-06-4a-
Kernel driver in use: tg3
Kernel modules: tg3
When intense concurrent TCP traffic occurs (I use bittorrent for testing), the network adapter just hangs after a few seconds (no longer receives any ethernet packets).
“ip link set eth0 down” followed by “ip link set eth0 up” restores connectivity. However, after a few seconds of bittorrent traffic, the adapter hangs again. The total traffic (in+out) was below 2 MiB/s when this occurred.
The problem does not occur when a single high-data-rate TCP stream (7 MiB/s inbound) is used instead of bittorrent traffic.
After deactivating generic segmentation offload via “sudo ethtool -K eth0 gso off”, the problem does not occur with any traffic pattern I tried.
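For anyone who wants to script this workaround, here is a minimal sketch. It assumes that “ethtool -k” prints one “feature: state” line per offload and that the label is either “generic-segmentation-offload” or the older spelling with spaces; the exact output format differs between ethtool versions, so check it on your system first. The function names gso_state and disable_gso are made up for this example and are not part of ethtool.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Read `ethtool -k <iface>` output on stdin and print the GSO state
# ("on" or "off"), or "unknown" if no matching feature line is present.
gso_state() {
    awk -F': *' '
        {
            key = $1
            sub(/^[ \t]+/, "", key)   # drop leading indentation
            gsub(/ /, "-", key)       # accept the label spelled with spaces
        }
        key == "generic-segmentation-offload" {
            split($2, val, " ")       # drop trailing notes such as "[fixed]"
            print val[1]
            found = 1
            exit
        }
        END { if (!found) print "unknown" }
    '
}

# Turn GSO off on the given interface only if it is currently on.
# Must be run as root because it changes device settings.
disable_gso() {
    iface=$1
    if [ "$(ethtool -k "$iface" | gso_state)" = "on" ]; then
        ethtool -K "$iface" gso off
    fi
}

# Example (as root):  disable_gso eth0
```

The setting does not survive a reboot or a driver reload, so the call would have to be repeated whenever the interface is brought up.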
On an older kernel version I apparently waited long enough for the kernel to notice and correct the problem. I found this in an old kernel message log:
Jul 15 21:56:48 galileo kernel: [12275.000139] ------------[ cut here ]------------
Jul 15 21:56:48 galileo kernel: [12275.000174] WARNING: at /build/
Jul 15 21:56:48 galileo kernel: [12275.000187] Hardware name: 40684JG
Jul 15 21:56:48 galileo kernel: [12275.000198] NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out
Jul 15 21:56:48 galileo kernel: [12275.000207] Modules linked in: ppdev bridge stp bnep lm75 i2c_i801 joydev lp parport snd_hda_
Jul 15 21:56:49 galileo kernel: [12275.000409] Pid: 0, comm: swapper Tainted: P 2.6.31-2-generic #17-Ubuntu
Jul 15 21:56:49 galileo kernel: [12275.000420] Call Trace:
Jul 15 21:56:49 galileo kernel: [12275.000442] [<c013fd8d>] warn_slowpath_
Jul 15 21:56:49 galileo kernel: [12275.000460] [<c04b0bdb>] ? dev_watchdog+
Jul 15 21:56:49 galileo kernel: [12275.000476] [<c04b0bdb>] ? dev_watchdog+
Jul 15 21:56:49 galileo kernel: [12275.000493] [<c013fe06>] warn_slowpath_
Jul 15 21:56:49 galileo kernel: [12275.000509] [<c04b0bdb>] dev_watchdog+
Jul 15 21:56:49 galileo kernel: [12275.000527] [<c015239b>] ? insert_
Jul 15 21:56:49 galileo kernel: [12275.000544] [<c0124678>] ? default_
Jul 15 21:56:49 galileo kernel: [12275.000562] [<c0576d7a>] ? _spin_lock_
Jul 15 21:56:49 galileo kernel: [12275.000578] [<c0152731>] ? __queue_
Jul 15 21:56:49 galileo kernel: [12275.000595] [<c014a9a7>] run_timer_
Jul 15 21:56:49 galileo kernel: [12275.000613] [<c016406a>] ? tick_handle_
Jul 15 21:56:49 galileo kernel: [12275.000630] [<c04b09f0>] ? dev_watchdog+
Jul 15 21:56:49 galileo kernel: [12275.000646] [<c0145b70>] __do_softirq+
Jul 15 21:56:49 galileo kernel: [12275.000663] [<c0189f8c>] ? handle_
Jul 15 21:56:49 galileo kernel: [12275.000679] [<c018cbb4>] ? move_native_
Jul 15 21:56:49 galileo kernel: [12275.000695] [<c0145cbd>] do_softirq+
Jul 15 21:56:49 galileo kernel: [12275.000709] [<c0145dfd>] irq_exit+0x5d/0x70
Jul 15 21:56:49 galileo kernel: [12275.000726] [<c0104e40>] do_IRQ+0x50/0xc0
Jul 15 21:56:49 galileo kernel: [12275.000741] [<c01039d0>] common_
Jul 15 21:56:49 galileo kernel: [12275.000761] [<c0366db8>] ? acpi_idle_
Jul 15 21:56:49 galileo kernel: [12275.000780] [<c04689d6>] cpuidle_
Jul 15 21:56:49 galileo kernel: [12275.000796] [<c0102034>] cpu_idle+0x94/0xd0
Jul 15 21:56:49 galileo kernel: [12275.000814] [<c0565ab5>] rest_init+0x55/0x60
Jul 15 21:56:49 galileo kernel: [12275.000833] [<c07958d8>] start_kernel+
Jul 15 21:56:49 galileo kernel: [12275.000850] [<c0795406>] ? unknown_
Jul 15 21:56:49 galileo kernel: [12275.000867] [<c079507c>] __init_
Jul 15 21:56:49 galileo kernel: [12275.000878] ---[ end trace 4c50a9017b4e6444 ]---
Jul 15 21:56:49 galileo kernel: [12275.000888] tg3: eth0: transmit timed out, resetting
Jul 15 21:56:49 galileo kernel: [12275.000942] tg3: DEBUG: MAC_TX_
Jul 15 21:56:49 galileo kernel: [12275.000961] tg3: DEBUG: RDMAC_STATUS[
Jul 15 21:56:49 galileo kernel: [12275.139673] tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
Jul 15 21:56:49 galileo kernel: [12275.278414] tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
Jul 15 21:56:49 galileo kernel: [12275.416830] tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
Jul 15 21:56:49 galileo kernel: [12275.554865] tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
Jul 15 21:56:49 galileo kernel: [12275.565295] tg3: eth0: Link is down.
Jul 15 21:56:51 galileo kernel: [12277.271978] tg3: eth0: Link is up at 100 Mbps, full duplex.
Jul 15 21:56:51 galileo kernel: [12277.272000] tg3: eth0: Flow control is on for TX and on for RX.
Jul 15 22:11:34 galileo kernel: [13160.608756] tg3 0000:02:00.0: PME# disabled
Jul 15 22:11:34 galileo kernel: [13160.624516] tg3 0000:02:00.0: irq 27 for MSI/MSI-X
Jul 15 22:11:34 galileo kernel: [13160.682035] ADDRCONF(
Jul 15 22:11:36 galileo kernel: [13162.310396] tg3: eth0: Link is up at 100 Mbps, full duplex.
Jul 15 22:11:36 galileo kernel: [13162.310413] tg3: eth0: Flow control is on for TX and on for RX.
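To check whether a machine has hit the same watchdog reset, the kernel log can be searched for the “NETDEV WATCHDOG” line shown above. The following is only a sketch: the function name count_tx_timeouts is made up for this example, and the pattern assumes the message wording matches the log in this report, which may differ in other kernel versions.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative only.

# Read kernel log text on stdin and print how many NETDEV WATCHDOG
# transmit timeouts were logged for the given interface.
count_tx_timeouts() {
    iface=$1
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is normalised with "|| true".
    grep -c "NETDEV WATCHDOG: ${iface} (.*): transmit queue [0-9][0-9]* timed out" || true
}

# Example:  dmesg | count_tx_timeouts eth0
```

A non-zero count means the kernel has detected a stalled transmit queue on that interface at least once since the start of the log being searched.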
This system is a Lenovo S10e netbook.
Changed in linux (Ubuntu): | |
assignee: | nobody → Michał Markowski (markowski) |
Changed in linux (Ubuntu): | |
status: | New → Confirmed |
tags: | added: b73a1py79 |
The problem still exists in
ii linux-image-generic 2.6.31.14.27 Generic Linux kernel image
GSO is on by default, so this should be fixed. Switching GSO off on affected hardware would be enough.
I have now had time to test whether the driver recovers automatically, and indeed it does, as the attached log shows.