xhci_hcd: TRB DMA errors reported with ASMedia ASM1142 USB 3.1 Controller
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Linux | Confirmed | High | |
linux (Debian) | Confirmed | Undecided | Unassigned |
linux (Ubuntu) | In Progress | Medium | Unassigned |
Trusty | Won't Fix | Medium | Unassigned |
Xenial | Confirmed | Medium | Unassigned |
Bionic | Confirmed | Medium | Unassigned |
Focal | Confirmed | Medium | Unassigned |
Bug Description
It was observed that, while trying to use a 4K USB webcam connected to a USB port provided by an ASMedia ASM1142 USB 3.1 controller, the webcam does not work and the kernel log shows the following messages:
[431.928016] xhci_hcd 0000:12:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
[431.928021] xhci_hcd 0000:12:00.0: Looking for event-dma 0000003f3330e020 trb-start 0000003f3330e000 trb-end 0000003f3330e000 seg-start 0000003f3330e000 seg-end 0000003f3330eff0
[431.928024] xhci_hcd 0000:12:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
[431.928026] xhci_hcd 0000:12:00.0: Looking for event-dma 0000003f3330e030 trb-start 0000003f3330e000 trb-end 0000003f3330e000 seg-start 0000003f3330e000 seg-end 0000003f3330eff0
[431.928027] xhci_hcd 0000:12:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
[431.928029] xhci_hcd 0000:12:00.0: Looking for event-dma 0000003f3330e050 trb-start 0000003f3330e000 trb-end 0000003f3330e000 seg-start 0000003f3330e000 seg-end 0000003f3330eff0
[431.928386] xhci_hcd 0000:12:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
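How to read the "Looking for event-dma" lines: trb-start and trb-end are the DMA addresses of the first and last TRB of the TD the driver expects the event to belong to, and event-dma is the TRB address the controller actually reported. Each TRB is 16 bytes, so the difference divided by 0x10 gives the distance in TRBs. The C sketch below parses one such line and performs a simplified containment check. It assumes the TD lies within a single ring segment (the driver's real check also handles TDs that wrap across segments), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Each xHCI TRB is 16 bytes, so consecutive TRB addresses differ by 0x10. */
#define TRB_SIZE 16

/* Hypothetical container for the five addresses in one log line. */
struct td_mismatch {
    uint64_t event_dma;
    uint64_t trb_start;
    uint64_t trb_end;
    uint64_t seg_start;
    uint64_t seg_end;
};

/* Parse a "Looking for event-dma ..." message; returns true on success. */
static bool parse_td_mismatch(const char *line, struct td_mismatch *m)
{
    assert(line != NULL && m != NULL);
    const char *p = strstr(line, "event-dma ");
    if (p == NULL)
        return false;
    return sscanf(p, "event-dma %" SCNx64 " trb-start %" SCNx64
                     " trb-end %" SCNx64 " seg-start %" SCNx64
                     " seg-end %" SCNx64,
                  &m->event_dma, &m->trb_start, &m->trb_end,
                  &m->seg_start, &m->seg_end) == 5;
}

/* Simplified check: assumes the TD does not wrap into another segment. */
static bool event_in_td(const struct td_mismatch *m)
{
    return m->event_dma >= m->trb_start && m->event_dma <= m->trb_end;
}

/* Signed distance of the reported TRB from trb-start, in whole TRBs. */
static int64_t event_offset_trbs(const struct td_mismatch *m)
{
    return ((int64_t)m->event_dma - (int64_t)m->trb_start) / TRB_SIZE;
}
```

Applied to the first excerpt above, event-dma is 0x20 bytes (two TRBs) past a TD that consists of a single TRB, so the check fails, which is the condition the driver reports as "not part of current TD".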
A similar issue was already reported on Launchpad: https:/
The fix to this issue seems to be the following patch: https:/
In our tests, the problem persisted with this patch applied. Our next step is to modify the patch slightly and retest.
This LP will be used to document our progress in the investigation.
no longer affects: | linux-meta (Ubuntu) |
description: | updated |
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs. | #1 |
Changed in linux (Ubuntu): | |
status: | New → Incomplete |
Changed in linux (Ubuntu): | |
status: | Incomplete → In Progress |
importance: | Undecided → Medium |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
Changed in linux (Ubuntu Trusty): | |
importance: | Undecided → Medium |
status: | New → In Progress |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
Changed in linux (Ubuntu Xenial): | |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
importance: | Undecided → Medium |
status: | New → In Progress |
Changed in linux (Ubuntu Artful): | |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
importance: | Undecided → Medium |
status: | New → In Progress |
Guilherme G. Piccoli (gpiccoli) wrote : | #2 |
The patch was modified (by adding the PCI ID of the 1142A device, which, confusingly, is 1242!), but the problem still reproduces.
New approaches will be tried soon.
tags: | added: kernel-da-key |
imperia (imperia777) wrote : | #3 |
Hello,
Looks like I am having the same problem.
After some hours (a random amount of time), my USB 3.1 ASMedia controller crashes the driver with the following errors:
[ 873.661534] xhci_hcd 0000:00:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 3
[ 873.661629] xhci_hcd 0000:00:00.0: Looking for event-dma 00000002722ed630 trb-start 00000002722ed9b0 trb-end 00000002722ed9d0 seg-start 00000002722ed000 seg-end 00000002722edff0
[ 875.673409] xhci_hcd 0000:00:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
I have been struggling with this error for more than a year. It's very annoying to have to restart the PC every few hours. A USB tuner card is connected to the port.
I would like to provide whatever information and support is necessary to fix this damn bug: logs, SSH access to the affected box, and anything else that is needed.
Please ask me here or write to my email imperia777_
Thanks.
Guilherme G. Piccoli (gpiccoli) wrote : | #4 |
Nice, Imperia, thanks for the report. First we need to be sure it's exactly the same adapter.
Can you provide the output of "lspci -nn"?
Then, if it's the same adapter:
0) Which Ubuntu version are you running? Which kernel version are you using? Can you try the latest 4.13 kernel for Xenial (or, even better, the hwe-edge 4.15)?
Instructions to run the latest 4.15 version: https:/
1) You said "after some hours": can you provide some details? Have you been using the USB tuner for, like, 2 hours? 12 hours? Is the tuner in constant use when the issue suddenly happens?
2) If possible, enable xhci dynamic debug and provide logs after the issue occurs; to do this, run the following command as root:
echo "module xhci_hcd +flpt" > /sys/kernel/
After issue reproduces, collect the /var/log/kern.log file.
Thanks,
Guilherme
imperia (imperia777) wrote : | #5 |
Hello,
00:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller [1b21:1242]
Actually, I am on Debian buster. I am running kernel 4.16-rc6 from the experimental repository.
I am running a program for watching satellite channels called vdr.
When I am not watching TV (i.e. while idle), vdr scans the satellites for channel list updates every few minutes. It is safe to say that the tuner is busy with a scan every few minutes, but it does not use as much bandwidth as when watching TV. In this mode, vdr is able to crash the driver in ~6-30 hours.
There is a program that you use to initially create the channel list for vdr. When I use it, I am able to crash the driver in ~1-2 hours.
But when I just watch one channel and don't change it for hours, the driver is least likely to crash.
I think something in the repeated opening (initialization) of the USB port/driver triggers this error,
because the program that scans for channels crashes it much faster.
This program works like this:
:go
open port
scan some frequency
write to file new channels
close port
goto go
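imperia's hypothesis is that repeated open/close cycles, rather than sustained bandwidth, trigger the error. The loop above could be approximated by a minimal stress sketch like the following C code, assuming the tuner is reachable through a character device node whose path you pass in. The path and the function name are hypothetical, and unlike vdr's scanner this sketch does no actual tuning between open and close; it only reproduces the open/read/close churn.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical stress loop mirroring the scanner's pattern:
 * open the device node, read one chunk, close it, repeat.
 * O_NONBLOCK keeps a quiet device from blocking the loop.
 * Returns the number of completed cycles, or -1 on the first error.
 */
static long open_read_close_cycles(const char *path, long cycles)
{
    char buf[4096];

    assert(path != NULL && cycles >= 0);
    for (long i = 0; i < cycles; i++) {
        int fd = open(path, O_RDONLY | O_NONBLOCK);
        if (fd < 0) {
            perror("open");
            return -1;
        }
        /* A short, empty, or would-block read is fine; the churn is the point. */
        if (read(fd, buf, sizeof buf) < 0 && errno != EAGAIN) {
            perror("read");
            close(fd);
            return -1;
        }
        close(fd);
    }
    return cycles;
}
```

Calling it with the tuner's device node and a large cycle count, while watching the kernel log, would be one way to test whether open/close churn alone shortens the time to failure.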
I wrote this script to capture the log:
echo "module xhci_hcd +flpt" > /sys/kernel/
(tail -F -n0 /var/log/kern.log &) | grep -q "TRB DMA"
cp /var/log/kern.log /home/imperia/
And I will run the initial channel list scan to trigger it faster.
I will be back later with the logs.
Thanks for your help.
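The capture script in this comment works because `grep -q` exits at the first line containing "TRB DMA", so the pipeline returns and `cp` runs right after the first error. The C sketch below shows the same first-match logic on a stream. Unlike `tail -F`, it does not wait for new lines or follow log rotation, and `first_matching_line` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the 1-based number of the first line in `in` that contains
 * `needle`, or 0 if EOF comes first. Like `grep -q`, it stops reading
 * as soon as a match is found.
 */
static long first_matching_line(FILE *in, const char *needle)
{
    char *line = NULL;
    size_t cap = 0;
    long lineno = 0, found = 0;

    assert(in != NULL && needle != NULL);
    while (getline(&line, &cap, in) != -1) {
        lineno++;
        if (strstr(line, needle) != NULL) {
            found = lineno;
            break;
        }
    }
    free(line);
    return found;
}
```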
imperia (imperia777) wrote : | #6 |
Mar 29 20:20:03 vdr kernel: [119370.230528] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Removing canceled TD starting at 0x2ae36c590 (dma).
Mar 29 20:20:03 vdr kernel: [119370.230533] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Finding endpoint context
Mar 29 20:20:03 vdr kernel: [119370.230537] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Cycle state = 0x0
Mar 29 20:20:03 vdr kernel: [119370.230542] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: New dequeue segment = 00000000573583cc (virtual)
Mar 29 20:20:03 vdr kernel: [119370.230547] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: New dequeue pointer = 0x2ae36c5a0 (DMA)
Mar 29 20:20:03 vdr kernel: [119370.230553] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Set TR Deq Ptr cmd, new deq seg = 00000000573583cc (0x2ae36c000 dma), new deq ptr = 0000000041e92668 (0x2ae36c5a0 dma), new cycle = 0
Mar 29 20:20:03 vdr kernel: [119370.230558] <intr> xhci_ring_
Mar 29 20:20:03 vdr kernel: [119370.230631] [27868] xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Cancel URB 0000000060641c50, dev 2, ep 0x82, starting at offset 0x2ae36c5a0
Mar 29 20:20:03 vdr kernel: [119370.230638] [27868] xhci_ring_
Mar 29 20:20:03 vdr kernel: [119370.230650] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Successful Set TR Deq Ptr cmd, deq = @2ae36c5a0
Mar 29 20:20:03 vdr kernel: [119370.230700] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Removing canceled TD starting at 0x2ae36c5a0 (dma).
Mar 29 20:20:03 vdr kernel: [119370.230705] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Finding endpoint context
Mar 29 20:20:03 vdr kernel: [119370.230710] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Cycle state = 0x0
Mar 29 20:20:03 vdr kernel: [119370.230715] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: New dequeue segment = 00000000573583cc (virtual)
Mar 29 20:20:03 vdr kernel: [119370.230719] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: New dequeue pointer = 0x2ae36c5b0 (DMA)
Mar 29 20:20:03 vdr kernel: [119370.230725] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Set TR Deq Ptr cmd, new deq seg = 00000000573583cc (0x2ae36c000 dma), new deq ptr = 0000000050070757 (0x2ae36c5b0 dma), new cycle = 0
Mar 29 20:20:03 vdr kernel: [119370.230730] <intr> xhci_ring_
Mar 29 20:20:03 vdr kernel: [119370.230798] [27868] xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Cancel URB 00000000588cca08, dev 2, ep 0x82, starting at offset 0x2ae36c5b0
Mar 29 20:20:03 vdr kernel: [119370.230805] [27868] xhci_ring_
Mar 29 20:20:03 vdr kernel: [119370.230816] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Successful Set TR Deq Ptr cmd, deq = @2ae36c5b0
Mar 29 20:20:03 vdr kernel: [119370.230865] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Removing canceled TD starting at 0x2ae36c5b0 (dma).
Mar 29 20:20:03 vdr kernel: [119370.230870] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Finding endpoint context
Mar 29 20:20:03 vdr kernel: [119370.230874] <intr> xhci_dbg_trace:31: xhci_hcd 0000:00:00.0: Cycle state = 0x0
Mar 29 20:...
Guilherme G. Piccoli (gpiccoli) wrote : | #7 |
Thanks a lot, Imperia! It's indeed the same PCI adapter, and it's even better that you're running an upstream kernel.
I'll analyze your logs and compare them with the ones I have here.
I might need some xhci traces to understand the TRB operations (like the enqueue and completion of TRBs). I'll comment here in case I need them.
Cheers,
Guilherme
imperia (imperia777) wrote : | #8 |
echo xhci-hcd >> /sys/kernel/
(tail -F -n0 /var/log/kern.log &) | grep -q "TRB DMA"
cp /var/log/kern.log /home/imperia/
Is this the correct command to get traces?
I will run it in advance.
Somebody told me to run this earlier, when I was looking for help.
BTW, did you download the full logs, so I can remove them from the web page?
I can provide SSH access to the affected box if needed.
Guilherme G. Piccoli (gpiccoli) wrote : | #9 |
Wow Imperia, you're being really helpful here, thank you very much!
To enable traces, these are the instructions I've provided to other people affected so far:
0) Reboot the machine in order to put it in a consistent state;
1) echo "module xhci_hcd +flpt" > /sys/kernel/
2) echo nop > /sys/kernel/
3) echo 81920 > /sys/kernel/
4) echo 0 > /sys/kernel/
5) echo 1 > /sys/kernel/
6) echo 1 > /sys/kernel/
After reproducing the issue, you should collect /sys/kernel/
About the SSH access, I'm interested in getting it next week, if it doesn't annoy you too much. It'll be really helpful, but I might need to reboot the machine.
Oh, I've downloaded the logs from your website, so you can delete it now.
Cheers,
Guilherme
imperia (imperia777) wrote : | #10 |
Hello,
I think the trace log is ready. Hopefully it is complete, because the machine ran out of disk space :)
http://
Interestingly, it took ~12 hours to crash it this time.
The problem with SSH access is that this is a virtual machine under Xen, and when you reboot it, the USB controller is gone (no longer assigned to the virtual machine). I have to reassign the USB controller for passthrough from the Xen host (I think this is a Xen bug; it wasn't like this before).
This is what I do when I have to restart the vdr virtual machine:
xl pci-assignable-
xl pci-assignable-add 03:00.0
xl create /etc/xen/vdr.cfg
Anyway, we can get in touch on IRC and I can do the restarts for you.
BTW, I shut down the whole Xen server, then switched off the PSU and pressed the power button on the case to discharge any remaining electricity and put it in a consistent state before getting the trace logs.
Guilherme G. Piccoli (gpiccoli) wrote : | #11 |
Thanks again, Imperia, the traces are fine. They're only 25 MB, so they shouldn't have caused any disk issues such as an out-of-space condition. Also, I'd like to see the corresponding kernel log, to match the problematic TRBs in it with the trace information. Can you provide the relevant kern.log file?
I've already downloaded the traces from your server, in case you want to remove the file.
About SSH, thanks for the offer; let's talk on IRC in case I need it. I'll try the logs first, though I'm not sure they're enough for me to fully understand the issue.
Cheers,
Guilherme
Guilherme G. Piccoli (gpiccoli) wrote : | #12 |
Hi Imperia, I built a mainline kernel (version 4.16) with a different quirk that I think might help here. Can you test it? Thanks in advance!
Instructions (run all as root):
1) wget people.
2) mv imperia416.tgz /
3) tar -zxf imperia416.tgz
4) update-initramfs -c -k 4.16.0-imperia+
Now, this is important: if you have access to a serial console on the machine (or physical access), you can reboot into this new kernel. If _you only have SSH_, I'd suggest removing the kernel boot entry from GRUB and booting through kexec, for safety reasons:
a) Remove boot entries from grub.cfg (you can copy away vmlinuz-
b) apt-get install kexec-tools
c) kexec vmlinuz-
----
After the machine (hopefully!) boots into the new kernel, check dmesg to see whether the quirk is there:
#$ dmesg|grep QUIRK
[0.813486] QUIRK: XHCI_AVOID_BEI
If you can see that output ("QUIRK: XHCI_AVOID_BEI"), then the quirk was applied.
Now you just need to try to reproduce the issue again.
Thanks a lot,
Guilherme
imperia (imperia777) wrote : | #13 |
Hello,
I am unable to test with the kernel you provided, because my tuner card doesn't have a driver in the mainline kernel tree, so I have to compile it myself, and I need the kernel headers for that.
So I compiled kernel 4.16 from the Debian linux-source-4.16 package and applied the patch you provided:
From dd0375ffba55172
From: "Guilherme G. Piccoli" <email address hidden>
Date: Wed, 11 Apr 2018 11:04:13 +0000
Subject: [PATCH] xhci: Add quirk to ASMedia 0x1242 adapter to avoid BEI
Signed-off-by: Guilherme G. Piccoli <email address hidden>
---
drivers/
1 file changed, 6 insertions(+)
diff --git a/drivers/
index d9f831b..0654461 100644
--- a/drivers/
+++ b/drivers/
@@ -213,6 +213,12 @@ static void xhci_pci_
if (pdev->vendor == PCI_VENDOR_
+ pdev->device == 0x1242) {
+ xhci->quirks |= XHCI_AVOID_BEI;
+ pr_warn("QUIRK: XHCI_AVOID_BEI");
+ }
+
+ if (pdev->vendor == PCI_VENDOR_
--
2.7.4
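For context on what this quirk toggles: BEI (Block Event Interrupt, bit 9 of the TRB control word in the xHCI specification) lets a TRB post its Transfer Event without asserting an interrupt, so the driver can take one interrupt per batch of isochronous TDs instead of one per TD. XHCI_AVOID_BEI tells the driver not to set BEI. The C sketch below is a simplified model of that decision, not the kernel code: the quirk bit value and the function name are hypothetical, and the real driver applies further conditions.

```c
#include <assert.h>
#include <stdint.h>

/* Block Event Interrupt flag in a TRB control word (bit 9). */
#define TRB_BEI_FLAG (UINT32_C(1) << 9)
/* Hypothetical quirk bit for this sketch only, not the kernel's value. */
#define QUIRK_AVOID_BEI (UINT64_C(1) << 0)

/*
 * Return the BEI bits for TD number td_index (0-based) out of num_tds
 * in one isochronous URB. Without the quirk, every TD except the last
 * suppresses its interrupt, so the URB raises one interrupt at the end.
 * With the quirk, no TD sets BEI and every completion interrupts.
 */
static uint32_t isoc_bei_bits(uint64_t quirks, unsigned td_index, unsigned num_tds)
{
    assert(num_tds > 0 && td_index < num_tds);
    if (quirks & QUIRK_AVOID_BEI)
        return 0;
    return td_index < num_tds - 1 ? TRB_BEI_FLAG : 0;
}
```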
I have now compiled my tuner card driver and am testing.
Andy Whitcroft (apw) wrote : Closing unsupported series nomination. | #14 |
This bug was nominated against a series that is no longer supported, ie artful. The bug task representing the artful nomination is being closed as Won't Fix.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.
Changed in linux (Ubuntu Artful): | |
status: | In Progress → Won't Fix |
imperia (imperia777) wrote : | #15 |
- dmidecode.out (11.7 KiB, text/plain)
This is the dmidecode output of my machine; in case the fix is FW-related, it may be useful for contacting the motherboard vendor.
Roy Thompson (royt77) wrote : | #16 |
I am running into this same issue with an ASMedia 2142 USB board. Was a fix ever identified?
Guilherme G. Piccoli (gpiccoli) wrote : | #17 |
Hi Roy, thanks for the report. What is your motherboard? What kernel are you running? And what tests are triggering this issue for you?
If you have logs, it'll be pretty useful.
Maybe it's a similar but different case... or the logs may help confirm it's exactly the same issue.
ASMedia seems to have a FW fix, but it depends on your motherboard vendor to provide it. They don't provide the fix themselves... it needs some cooking by the vendor, to match subsystem IDs and whatnot.
Cheers,
Guilherme
Changed in linux (Debian): | |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
status: | New → Confirmed |
Roy Thompson (royt77) wrote : | #18 |
Hi Guilherme,
Thanks for the response. I have several (3) quad port ASMedia 2142 PCIe/USB 3.1 cards installed in a Dell R740 rack server. I am using the standard Ubuntu 18.04 kernel (Linux dell-PowerEdge-R740 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux).
One of my applications runs a loop that opens and closes a high-speed connection to a USB device connected through the ASMedia board. After this goes on for several minutes without any issues, I see this in dmesg:
[Oct 5 10:12] xhci_hcd 0000:be:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ +3.418076] xhci_hcd 0000:be:00.0: WARN Successful completion on short TX
[ +0.000035] xhci_hcd 0000:be:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 12 comp_code 1
[ +0.000003] xhci_hcd 0000:be:00.0: Looking for event-dma 0000001fe9759610 trb-start 0000001fe9759620 trb-end 0000001fe9759620 seg-start 0000001fe9759000 seg-end 0000001fe9759ff0
This is then followed shortly after by several kernel dump messages, and then the whole system starts behaving erratically, requiring a hard reboot to recover.
The condition is easy for me to reproduce and I will happily provide any logs that may be of use to help debug this. Please just let me know what you would like and how to get them (as I am not a kernel expert).
Thanks,
Roy
Guilherme G. Piccoli (gpiccoli) wrote : | #19 |
Hi Roy, thanks for your quick response. First, I'd like to ask you to attach the output of "lspci -vvv" and "dmidecode" to this LP, so we can verify the adapters are exactly the same and check the motherboard type. Run both commands as the root user.
After that, I'll ask you to reproduce the issue and attach the output of the "dmesg" command right after reproduction. If you can also elaborate on the test you're running, I'd be really glad.
I'll then provide you with custom commands to use the kernel trace system to learn more about the issue. One final thing: are you willing to test with a mainline kernel, to check whether there's an upstream fix for your instance of the issue?
If so, you can get it here: https:/
This PPA provides a build from kernel 4.18.
Thanks in advance,
Guilherme
Bryan Walsh (yetanotherbryan) wrote : | #20 |
Hello,
I think I am seeing the same or a related issue with the ASM1142 controller on my Razer Core Chroma eGPU enclosure. I'm running Ubuntu 19.04, kernel version 5.0.0-13-generic. Ethernet on the enclosure stops working while downloading large files. dmesg shows the following error messages:
[ 569.641475] xhci_hcd 0000:0f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 4 comp_code 13
[ 569.641487] xhci_hcd 0000:0f:00.0: Looking for event-dma 000000048d9c5770 trb-start 000000048d9c5750 trb-end 000000048d9c5750 seg-start 000000048d9c5000 seg-end 000000048d9c5ff0
lspci output:
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 08)
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07)
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 08)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th Gen Core Processor Gaussian Mixture Model
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #1 (rev f1)
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Intel(R) 100 Series Chipset Family LPC Controller/eSPI Controller - 9D4E (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (4) I219-V (rev 21)
02:00.0 Network controller: Intel Corporation Wireless 8265 / 8275 (rev 78)
04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981
05:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
06:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
06:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
06:02.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
06:04.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
07:00.0 System peripheral: Intel Corporation JHL6540 Thunderbolt 3 NHI (C step) [Alpine Ridge 4C 2016] (rev 02)
08:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
09:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
09:04.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
0a:00....
Guilherme G. Piccoli (gpiccoli) wrote : | #21 |
Hi Bryan, thanks for the report. It could be the same issue, can you provide the full dmesg, and also the outputs of the following commands: "lspci -nnvvv", "lspci -t" and "ls -l /sys/class/net"?
The issue was fixed for the first reporter via a FW update of the ASMedia adapter; unfortunately, this FW update comes from the vendor, so how to obtain it varies depending on the hardware showing the problem.
Cheers,
Guilherme
Bryan Walsh (yetanotherbryan) wrote : | #22 |
- egpu_debug.txt (26.8 KiB, text/plain)
Please see attached log for the outputs that you requested.
Guilherme G. Piccoli (gpiccoli) wrote : | #24 |
Great, Bryan, the model of your USB controller is the same one reported in this LP; also, given the outputs you provided, the network interface "enx90203a19dcb6" is under one of those USB controllers. You mentioned you see the TRB DMA errors and the interface stops responding: is the problematic interface that one, "enx90203a19dcb6"?
Who is the vendor of your device? I'd suggest seeking help from them, mentioning this LP and that ASMedia may have a potential firmware fix for the issue.
Thanks,
Guilherme
Gabe Esposito (gabespo) wrote : | #25 |
I'm also experiencing the same issue with the ASM1142 controller on the Core X Chroma and can reproduce consistently. I'm running kernel 5.0.9.
Guilherme, thanks for your work diagnosing this. This device is sold by Razer. I will try and reach out but they do not claim Linux support on any of their devices so I worry this may go unfixed. Barring a firmware fix, is there any hope of this being fixed with a quirk, as the other controller was? I realize this LP is not the ideal place for such a fix to take place, but I am happy to participate in finding a solution.
Bryan Walsh (yetanotherbryan) wrote : | #26 |
In an attempt to update the firmware, I installed the Razer software on my newly created Windows partition to see if it could be updated from there. No luck.
I emailed Razer support to ask about obtaining updated firmware. I'll let everyone know what I hear back.
And yes, "enx90203a19dcb6" is the problematic interface.
Guilherme G. Piccoli (gpiccoli) wrote : | #27 |
Thanks, Gabe! I agree with you, it would be really nice to have a quirk for that. It would be easier to analyze that possibility with a datasheet for this adapter, which unfortunately I don't have.
I'm on vacation until next week; I'll try to discuss this on linux-usb when I'm back and pursue a kernel quirk instead of a firmware-only fix.
@Bryan, thanks for checking with the vendor, let us know the outcome.
Cheers,
Guilherme
Alex Lourenco (nyb-2017) wrote : | #28 |
I am experiencing the exact same issue first reported in this LP (ASMedia ASM1142 USB 3.1 Controller with a Logitech Brio 4k, ERROR Transfer event TRB DMA ptr not part of current TD ...). In my case the controller is provided by a StarTech.com 4 Port USB 3.1 PCIe Card 3x USB-A and 1x USB-C [PEXUS313AC2V].
While searching online, I found a couple of LPs and forum posts with similar issues. The common factor seems to be high-speed USB devices (e.g. 4K webcams, USB Ethernet adapters) connected to ASMedia controllers.
I have compiled 5.0.0 with a variety of existing quirks, but nothing has done the trick so far. There are a couple of ASMedia firmware versions posted on station-drivers. Unfortunately, none of them seems to fix the issue either.
Felix Moreno (felix-justdust) wrote : | #29 |
Having the same problem with Bus 002 Device 004: ID 174c:55aa ASMedia Technology Inc. Name: ASM1051E SATA 6Gb
Guilherme G. Piccoli (gpiccoli) wrote : | #30 |
So Felix, can you provide more details like the machine or device you're using, a dmesg showing the problem, and a bit more information about the device itself? I guess you're the first reporter with a "SATA" device showing that.
Thanks,
Guilherme
Erik Davidson (aphistic) wrote : | #31 |
- egpu_debug.txt (142.9 KiB, text/plain)
I'm also seeing this issue on a fresh install of Ubuntu 19.10 with a Razer Core X Chroma and a Lenovo X1 Extreme Gen2. I was seeing it on a fully updated Arch Linux install and installed Ubuntu in hopes it would fix the issue. Here's some info from my current install. Let me know if you need anything else!
uname:
Linux fate 5.3.0-24-generic #26-Ubuntu SMP Thu Nov 14 01:33:18 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
I've attached all the same info you were asking for earlier.
Guilherme G. Piccoli (gpiccoli) wrote : | #32 |
Hi Erik, thanks for your report! Can you attach a dmesg right after the issue reproduces?
Also, are you willing to run debug kernels in your machine?
The problem was narrowed down to a FW issue fixed by ASMedia in the form of a firmware upgrade, but this does not seem to be available from ASMedia themselves; instead, the motherboard vendor is usually the path for obtaining such a fix.
That said, I'd be really glad if we could quirk this on the kernel side to get the fix to a wider audience, without relying on unresponsive motherboard vendors. So let me know if you (this also applies to anybody who reported the issue) are willing to run debug kernels.
Cheers,
Guilherme
Kai-Heng Feng (kaihengfeng) wrote : | #33 |
For reference, here's the analysis from xHCI maintainer:
https://<email address hidden>/
Guilherme G. Piccoli (gpiccoli) wrote : | #34 |
Thanks a lot, @kaihengfeng! Quite a great discussion with Mathias: it seems there's a potential quirk for IN packets, but the right approach is indeed getting the HW fixed by ASMedia.
Cheers,
Guilherme
Bryan Walsh (yetanotherbryan) wrote : | #35 |
I would be willing to try a debug kernel.
Guilherme G. Piccoli (gpiccoli) wrote : | #36 |
Thank you, Bryan! We can try the "hackish" approach proposed by Mathias in that thread... let me study the code and get back to you in the next few weeks!
Cheers,
Guilherme
Bryan Walsh (yetanotherbryan) wrote : | #37 |
Sounds good. I'm not sure if it matters, but I'm now on Ubuntu 19.10. I'm seeing exactly the same behavior as before.
Erik Davidson (aphistic) wrote : | #38 |
- egpu_debug.txt (127.4 KiB, text/plain)
Guilherme, I've attached a dmesg that ends as soon as the Ethernet in my eGPU disconnects. It's just a matter of running something like "fast.com" a couple of times to trigger it.
I'd also be willing to try a debug kernel or whatever else I can do to help get this fixed!
Erik Davidson (aphistic) wrote : | #39 |
I also wanted to mention that, in my case, after the issue is triggered I can unplug the cable from the Ethernet jack on my eGPU, plug it back in, and it works again for a little while until I trigger it again.
Danny Pacheco (vfdb67) wrote : | #40 |
I am seeing this same issue on my system. Any help would be greatly appreciated. I am using Ubuntu 16.04 with the 4.15.0-88-generic kernel. I have seen it on both host controllers on the motherboard.
Here is the info for the host controllers.
00:14.0 USB controller [0c03]: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller [8086:a2af] (prog-if 30 [XHCI])
Subsystem: ASRock Incorporation Device [1849:a2af]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 32
Region 0: Memory at 92f30000 (64-bit, non-prefetchable) [size=64K]
Kernel driver in use: xhci_hcd
b3:00.0 USB controller [0c03]: ASMedia Technology Inc. Device [1b21:2142] (prog-if 30 [XHCI])
Subsystem: ASRock Incorporation Device [1849:2142]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 33
Region 0: Memory at fbe00000 (64-bit, non-prefetchable) [size=32K]
no longer affects: | linux (Ubuntu Artful) |
Changed in linux (Ubuntu Trusty): | |
status: | In Progress → Won't Fix |
Changed in linux (Ubuntu Focal): | |
status: | New → Confirmed |
importance: | Undecided → Medium |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
Changed in linux (Ubuntu Bionic): | |
status: | In Progress → Confirmed |
Changed in linux (Ubuntu Xenial): | |
status: | In Progress → Confirmed |
(196 comments hidden; 276 comments in total.)
In Linux Kernel Bug Tracker #202541, ZeroBeat (zerobeat-linux-kernel-bugs) wrote : | #237 |
Stanislaw, and you're not the only one. I doubt it, too.
Maybe I patched my kernel to death and it is time for me to compile a fresh one.
But anyway, thanks for your effort and for your patience.
In Linux Kernel Bug Tracker #202541, ZeroBeat (zerobeat-linux-kernel-bugs) wrote : | #238 |
Stanislaw, a short note for you: I'm now running the fresh kernel (the Ryzen compiles it really fast), with patch v2 applied.
Everything is working fine and all bogus messages are gone.
Thanks again.
In Linux Kernel Bug Tracker #202541, wgh (wgh-linux-kernel-bugs) wrote : | #239 |
(In reply to Mathias Nyman from comment #139)
> rewritten URB cancel, endpoint stop and set trb deq can be found in my tree
> in rewrite_
>
> git://git.
> rewrite_
>
> https:/
> ?h=rewrite_
>
> Does that help?
I applied the patch to 5.10.11-gentoo, and it did help with my HackRF One (see comment #136 for details and hardware)! No ill effects so far.
In Linux Kernel Bug Tracker #202541, stf_xl (stfxl-linux-kernel-bugs) wrote : | #240 |
After discussion on my posted patch here:
https://<email address hidden>/t/#u
it was concluded that this should rather be an xHCI quirk than an rt2800usb driver flag.
If the change from comment 147 helps with the problem for you, please provide the PCI ID of your xHCI controller. This can be done with the command:
lspci -k -nn | grep -B2 xhci
If you have more than one xHCI controller, please make sure you provide the PCI ID of the one that actually has the problem (the 'lspci -t' command can be useful as well).
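The PCI ID requested here is the `[vendor:device]` pair that `lspci -nn` prints on the controller line, e.g. `[1b21:1242]` in imperia's output earlier in this thread. A small C sketch that extracts the first such pair from one line, skipping the class code `[0c03]` and tags such as `[AMD]`; `parse_pci_id` and its helpers are hypothetical, not part of lspci.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if the first four characters of s are hex digits (stops at NUL). */
static bool is_hex4(const char *s)
{
    for (int i = 0; i < 4; i++)
        if (!isxdigit((unsigned char)s[i]))
            return false;
    return true;
}

/* Convert four already-validated hex digits to an integer. */
static uint16_t hex4_value(const char *s)
{
    unsigned v = 0;
    for (int i = 0; i < 4; i++) {
        int c = tolower((unsigned char)s[i]);
        v = v * 16 + (unsigned)(isdigit(c) ? c - '0' : c - 'a' + 10);
    }
    return (uint16_t)v;
}

/*
 * Find the first "[vvvv:dddd]" token in an `lspci -nn` line.
 * Class codes such as "[0c03]" and tags such as "[AMD]" do not match.
 */
static bool parse_pci_id(const char *line, uint16_t *vendor, uint16_t *device)
{
    assert(line != NULL && vendor != NULL && device != NULL);
    for (const char *p = strchr(line, '['); p != NULL; p = strchr(p + 1, '[')) {
        if (is_hex4(p + 1) && p[5] == ':' && is_hex4(p + 6) && p[10] == ']') {
            *vendor = hex4_value(p + 1);
            *device = hex4_value(p + 6);
            return true;
        }
    }
    return false;
}
```

Note that `lspci -k -nn` also prints a "Subsystem:" line with its own bracketed pair; the controller line, not the subsystem line, carries the ID being asked for.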
In Linux Kernel Bug Tracker #202541, stf_xl (stfxl-linux-kernel-bugs) wrote : | #241 |
(In reply to Stanislaw Gruszka from comment #173)
> If you have more than one xHCI controller please assure you provide PCI-id's
> of one that actually has the problem ('lspci -t' command can be useful as
> well)
I meant 'lsusb -t'
In Linux Kernel Bug Tracker #202541, ZeroBeat (zerobeat-linux-kernel-bugs) wrote : | #242 |
USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset USB 3.1 xHCI Controller [1022:43b9] (rev 02)
Subsystem: ASMedia Technology Inc. Device [1b21:1142]
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
In Linux Kernel Bug Tracker #202541, stf_xl (stfxl-linux-kernel-bugs) wrote : | #243 |
Created attachment 295055
0001-usb-
This is the next proposed fix. It is supposed to disable Soft Retry for the affected xHCI controllers, currently only for the xHCI device reported by Michael:
PCI_VENDOR_ID_AMD = 0x1022 , PCI_DEVICE_
If you want to test it with a different xHCI host, you need to add your PCI IDs to
drivers/
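The approach described here, keying a "no Soft Retry" quirk on the controller's PCI vendor and device IDs, can be sketched as a small lookup table. This is a userspace illustration only: `QUIRK_NO_SOFT_RETRY`, `struct id_quirk` and `lookup_quirks()` are hypothetical names, not the kernel's, and the single entry is the AMD X370 chipset controller (1022:43b9) reported above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quirk bit standing in for the kernel's own flag. */
#define QUIRK_NO_SOFT_RETRY (1u << 0)

struct id_quirk {
    uint16_t vendor;
    uint16_t device;
    uint32_t quirks;
};

/* One row per affected controller; add your own vendor:device pair here. */
static const struct id_quirk quirk_table[] = {
    { 0x1022, 0x43b9, QUIRK_NO_SOFT_RETRY }, /* AMD X370 chipset xHCI */
};

/* OR together the quirk bits of every row matching this vendor:device. */
static uint32_t lookup_quirks(uint16_t vendor, uint16_t device)
{
    uint32_t quirks = 0;
    for (size_t i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
        if (quirk_table[i].vendor == vendor && quirk_table[i].device == device)
            quirks |= quirk_table[i].quirks;
    }
    return quirks;
}
```

Testing another host then amounts to adding one `{ vendor, device, QUIRK_NO_SOFT_RETRY }` row; a controller that is not listed, such as 1022:43d5, gets no quirk.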
In Linux Kernel Bug Tracker #202541, ZeroBeat (zerobeat-linux-kernel-bugs) wrote : | #244 |
@Stanislaw, I followed the discussion you mentioned here:
https:/
Devices other than rt2800usb ones are affected, too.
Tested this one before applying your patch:
ID 7392:7710 Edimax Technology Co., Ltd Edimax Wi-Fi
and ran into the same xHCI issue on the USB controller mentioned here:
https:/
[10214.423508] usb 1-2: new high-speed USB device number 3 using xhci_hcd
[10214.602833] usb 1-2: New USB device found, idVendor=7392, idProduct=7710, bcdDevice= 0.00
[10214.602838] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[10214.602841] usb 1-2: Product: Edimax Wi-Fi
[10214.602843] usb 1-2: Manufacturer: MediaTek
[10214.602845] usb 1-2: SerialNumber: 1.0
[10214.931553] usb 1-2: reset high-speed USB device number 3 using xhci_hcd
[10215.102895] mt7601u 1-2:1.0: ASIC revision: 76010001 MAC revision: 76010500
[10215.132670] mt7601u 1-2:1.0: Firmware Version: 0.1.00 Build: 7640 Build time: 201302052146____
[10216.101346] mt7601u 1-2:1.0: EEPROM ver:0d fae:00
[10216.111983] mt7601u 1-2:1.0: EEPROM country region 01 (channels 1-13)
[10217.189574] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[10217.190361] usbcore: registered new interface driver mt7601u
[10217.199429] mt7601u 1-2:1.0 wlp3s0f0u2: renamed from wlan0
[10296.419053] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[10296.419228] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
In Linux Kernel Bug Tracker #202541, jg.staffel (jg.staffel-linux-kernel-bugs) wrote : | #245 |
The same problem (with ID 04a9:220d Canon, Inc. CanoScan N670U/N676U/LiDE 20):
Feb 03 09:48:54 [kernel] [34974.104606] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Feb 03 09:49:49 [kernel] [35029.419748] usb 1-6: USB disconnect, device number 3
Feb 03 09:49:52 [kernel] [35031.994403] usb 1-6: new full-speed USB device number 6 using xhci_hcd
Feb 03 09:50:45 [kernel] [35085.400634] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Feb 03 09:50:45 [kernel] [35085.404278] xhci_hcd 0000:01:00.0: WARN Successful completion on short TX
Feb 03 09:50:45 [kernel] [35085.404398] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 4 comp_code 1
Feb 03 09:50:45 [kernel] [35085.404401] xhci_hcd 0000:01:00.0: Looking for event-dma 00000008146ff050 trb-start 00000008146ff060 trb-end 00000008146ff060 seg-start 00000008146ff000 seg-end 00000008146ffff0
$ lspci -k -nn | grep -B2 xhci
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
Subsystem: ASMedia Technology Inc. 400 Series Chipset USB 3.1 XHCI Controller [1b21:1142]
Kernel driver in use: xhci_hcd
--
09:00.2 USB controller [0c03]: NVIDIA Corporation TU116 USB 3.1 Host Controller [10de:1aec] (rev a1)
Subsystem: NVIDIA Corporation TU116 USB 3.1 Host Controller [10de:139d]
Kernel driver in use: xhci_hcd
--
0a:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Zeppelin USB 3.0 Host controller [1022:145f]
Subsystem: Advanced Micro Devices, Inc. [AMD] Zeppelin USB 3.0 Host controller [1022:7914]
Kernel driver in use: xhci_hcd
$ uname -a
Linux Gentoo 5.4.92-gentoo #1 SMP PREEMPT Thu Jan 28 20:45:52 MSK 2021 x86_64 AMD Ryzen 5 2600 Six-Core Processor AuthenticAMD GNU/Linux
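The "Looking for event-dma" line gives the numbers behind the "TRB DMA ptr not part of current TD" error: the controller reported a transfer event at `event-dma`, and the driver expected it to fall between `trb-start` and `trb-end` of the TD it is currently processing. A simplified sketch of that check, assuming the TD does not wrap across ring segments (the driver's own `trb_in_td()` in xhci-ring.c also handles that case); `event_in_td()` and `trbs_before_td()` are hypothetical helpers, and each TRB is 16 bytes:

```c
#include <stdbool.h>
#include <stdint.h>

#define TRB_SIZE 16u /* an xHCI TRB is four 32-bit words */

/* Simplified: is the event's DMA address inside [td_start, td_end]?
 * Assumes the whole TD lives in one ring segment. */
static bool event_in_td(uint64_t event_dma, uint64_t td_start, uint64_t td_end)
{
    return event_dma >= td_start && event_dma <= td_end;
}

/* Number of whole TRBs the event lies before the TD start (0 if not before). */
static uint64_t trbs_before_td(uint64_t event_dma, uint64_t td_start)
{
    return event_dma < td_start ? (td_start - event_dma) / TRB_SIZE : 0;
}
```

With the values from the log above, the event at ...ff050 lies one TRB before the TD that starts (and ends) at ...ff060, so the check fails even though the address is inside the segment.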
In Linux Kernel Bug Tracker #202541, stf_xl (stfxl-linux-kernel-bugs) wrote : | #246 |
(In reply to Michael from comment #177)
> Other devices than rt2800usb devices are affected, too.
> Tested this one before applying your patch:
> ID 7392:7710 Edimax Technology Co., Ltd Edimax Wi-Fi
> and running into the same xhci issue on USB controller mentioned here:
> https:/
OK, so it makes sense to disable Soft Retry per xHCI controller.
In Linux Kernel Bug Tracker #202541, stf_xl (stfxl-linux-kernel-bugs) wrote : | #247 |
(In reply to alpir from comment #178)
> The same problem (with ID 04a9:220d Canon, Inc. CanoScan N670U/N676U/LiDE
> 20):
>
> Feb 03 09:48:54 [kernel] [34974.104606] xhci_hcd 0000:01:00.0: WARN Set TR
> Deq Ptr cmd failed due to incorrect slot or ep state.
alpir, does the change from comment 147 help you?
In Linux Kernel Bug Tracker #202541, stf_xl (stfxl-linux-kernel-bugs) wrote : | #248 |
alpir, you have a different device ID than Michael, but you both have the same subsystem device: ASMedia 1b21:1142. So perhaps the patch should be based on subdevice IDs. Let's wait for other users' reports regarding their xHCI controllers; we will see then.
In Linux Kernel Bug Tracker #202541, jg.staffel (jg.staffel-linux-kernel-bugs) wrote : | #249 |
I tried the patch from comment 147. The error "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" is gone, but the USB 3.1 behavior is still the same.
The reason I started looking into this in the first place is the strange behavior of the USB ports: my two JetFlash Transcend 8GB flash drives connected to the USB3 port are sometimes not detected by the system as mountable (FAT32). When I run a disk check (8 GB) with the command badblocks -nvs /dev/sdd, after a while the check ends with the following error: Pass completed, 5662144 bad blocks found. (5662144/0/0 errors). This happens with both flash drives.
But if I connect them to USB2, there are no errors at all.
At the same time, when looking at the logs, I found errors: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Now, after the patch, I get the following in the logs:
Feb 03 17:47:14 [kernel] [ 52.603587] usb 2-3: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:47:14 [kernel] [ 52.636130] usb-storage 2-3:1.0: USB Mass Storage device detected
Feb 03 17:47:14 [kernel] [ 52.636242] scsi host11: usb-storage 2-3:1.0
Feb 03 17:47:14 [kernel] [ 52.651996] usbcore: registered new interface driver uas
Feb 03 17:47:16 [kernel] [ 54.013780] scsi 11:0:0:0: Direct-Access JetFlash Transcend 8GB 1100 PQ: 0 ANSI: 6
Feb 03 17:47:16 [kernel] [ 54.014688] sd 11:0:0:0: [sdd] 15425536 512-byte logical blocks: (7.90 GB/7.36 GiB)
Feb 03 17:47:16 [kernel] [ 54.015150] sd 11:0:0:0: [sdd] Write Protect is off
Feb 03 17:47:16 [kernel] [ 54.015156] sd 11:0:0:0: [sdd] Mode Sense: 43 00 00 00
Feb 03 17:47:16 [kernel] [ 54.015625] sd 11:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 03 17:47:16 [kernel] [ 54.028165] sdd: sdd1
Feb 03 17:47:16 [kernel] [ 54.045687] sd 11:0:0:0: [sdd] Attached SCSI removable disk
Feb 03 17:48:04 [kernel] [ 102.221862] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:51:52 [kernel] [ 330.009696] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:55:55 [kernel] [ 573.644576] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:01 [kernel] [ 579.149875] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:01 [kernel] [ 579.254204] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:06 [kernel] [ 584.781836] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:07 [kernel] [ 585.073435] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:12 [kernel] [ 590.413816] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:12 [kernel] [ 590.518146] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:18 [kernel] [ 596.046034] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:18 [kernel] [ 596.336445] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:23 [kernel] [ 601.677932] usb 2-3: device descriptor read/8, error -110
Feb 03 17:56:23 [kernel] [ 601.782091] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Feb 03 17:56:29 [kernel] [ 607.309722] usb 2-3: device descr...
In Linux Kernel Bug Tracker #202541, bernhard.gebetsberger (bernhard.gebetsberger-linux-kernel-bugs) wrote : | #250 |
My controller has the PCI ID 43bb, so I've added "PCI_DEVICE_
In Linux Kernel Bug Tracker #202541, ZeroBeat (zerobeat-linux-kernel-bugs) wrote : | #251 |
@Stanislaw, I'm running an older mobo and a RYZEN 1700.
I don't need CPU power - GPU power is more important for me (crypto analysis).
In Linux Kernel Bug Tracker #202541, biopsin (biopsin-linux-kernel-bugs) wrote : | #252 |
[Continuing my first report in comment:https:/
$ lspci -k -nn | grep -B2 xhci
02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
Subsystem: ASMedia Technology Inc. Device [1b21:1142]
Kernel driver in use: xhci_hcd
I have adapted the patch by Mr. Gruszka [https:/
$ uname -a
Linux voidx 5.4.95_1 #1 SMP PREEMPT 1612063540 x86_64 GNU/Linux
If someone has some spare time to glance at it or comment on my error ;)
(diff available for 30 days) @
https:/
In Linux Kernel Bug Tracker #202541, stf_xl (stfxl-linux-kernel-bugs) wrote : | #253 |
(In reply to alpir from comment #182)
> I tried patch from comment 147. The error "WARN Set TR Deq Ptr cmd failed
> due to incorrect slot or ep state" has gone. But behavior USDB3.1 still the
> same.
[snip]
> But if you connect them to USB2, then there are no errors at all.
alpir, I think you are experiencing a different issue that cannot be solved simply by disabling Soft Retry. More fixes may be needed to handle your xHCI/USB hardware. Maybe you can try the patch from comment 139? If this is a regression, maybe you can bisect to find the offending commit? In any case, your problems will most likely require the expertise of Mathias Nyman, the xhci driver maintainer.
In Linux Kernel Bug Tracker #202541, stf_xl (stfxl-linux-kernel-bugs) wrote : | #254 |
(In reply to biopsin from comment #185)
> [Continuing my first report in
> comment:https:/
As in alpir's case, this will most likely require different fixes, but you can check whether disabling Soft Retry works. You can simply disable it as shown in comment 147.
> $ lspci -k -nn | grep -B2 xhci
> 02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series
> Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
> Subsystem: ASMedia Technology Inc. Device [1b21:1142]
> Kernel driver in use: xhci_hcd
>
[snip]
> If someone has some spare time to glance at it or comment on my error ;)
> (diff availible for 30 days) @
> https:/
ASMedia is subsystem_
diff --git a/drivers/
index 906a0e08821e.
--- a/drivers/
+++ b/drivers/
@@ -102,6 +102,9 @@ static void xhci_pci_
id = pci_match_
+ printk("vendor: 0x%04x device 0x%04x subvendor 0x%04x subdevice 0x%04x\n",
+ pdev->vendor, pdev->device, pdev->subsystem
+
if (id && id->driver_data) {
If those are indeed subsystem IDs, I think there is a bug in the existing xhci-pci.c quirks code:
if (pdev->vendor == PCI_VENDOR_
if (pdev->vendor == PCI_VENDOR_
if (pdev->vendor == PCI_VENDOR_
and those checks should be replaced by pdev->subsystem
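The difference between the existing checks and the proposed ones can be shown with two predicates. This is a userspace sketch: `struct pci_ids` is a hypothetical stand-in for the relevant `struct pci_dev` fields, and the ID values are taken from the lspci output posted in this thread. Checking either field (rather than only replacing the vendor check) is a design choice made here: it keeps the quirks for controllers sold under ASMedia's own vendor ID, such as the ASM1142 [1b21:1242], while also catching AMD-chipset controllers whose subsystem vendor is ASMedia.

```c
#include <stdbool.h>
#include <stdint.h>

#define VENDOR_ID_ASMEDIA 0x1b21 /* ASMedia's PCI vendor ID */

/* Hypothetical stand-in for the struct pci_dev fields used below. */
struct pci_ids {
    uint16_t vendor, device;
    uint16_t subsystem_vendor, subsystem_device;
};

/* Existing style of check: only the primary vendor ID is compared. */
static bool is_asmedia_vendor_only(const struct pci_ids *p)
{
    return p->vendor == VENDOR_ID_ASMEDIA;
}

/* Proposed style: also match AMD-branded controllers with an ASMedia subsystem. */
static bool is_asmedia_any(const struct pci_ids *p)
{
    return p->vendor == VENDOR_ID_ASMEDIA ||
           p->subsystem_vendor == VENDOR_ID_ASMEDIA;
}
```

For the X370 controller reported earlier (1022:43b9, subsystem 1b21:1142), the first predicate returns false and the second returns true, which is the gap described above.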
In Linux Kernel Bug Tracker #202541, stf_xl (stfxl-linux-kernel-bugs) wrote : | #255 |
Created attachment 295065
asmedia_
This patch applies the existing xHCI ASMedia quirks to ASMedia subdevices as well.
Judging from the changelog history, those quirks helped with some USB disk issues, so perhaps this patch could help with the disk issues reported here, i.e. the alpir and biopsin cases. Please test.
In Linux Kernel Bug Tracker #202541, jg.staffel (jg.staffel-linux-kernel-bugs) wrote : | #256 |
None of the patches (comments 139, 147, 188) solved my problem.
In Linux Kernel Bug Tracker #202541, biopsin (biopsin-linux-kernel-bugs) wrote : | #257 |
@Gruszka
Your patch [https:/
I'm currently testing it with my setup and kernel 5.4.95_x86_64.
I tested it with one PATA and one SATA drive. So far I see no ill effects, but I also can't confirm or deny that it does anything over such a short timespan, and much has changed since my initial post last year. I will at least keep applying it now and then throughout this year and report anything newsworthy. Thank you for your time and help!
In Linux Kernel Bug Tracker #202541, raulvior.bcn (raulvior.bcn-linux-kernel-bugs) wrote : | #258 |
Created attachment 295151
Dmesg of a Toshiba USB 3.0 HDD connected to USB 3.0 front port and back port.
I am having this error on Linux 5.10.10-051010 while trying to connect a USB 3.0 hard disk, Toshiba Touro 4TB (HitachiGST). If I connect the disk to a USB 2.0 port it works flawlessly.
The kernel shows a different kind of error depending on whether I connect the HDD to the front or back USB 3.0 ports of the motherboard MSI X470 Gaming Plus MAX.
lspci -vnnt:
> -[0000:00]-+-00.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-0fh) Root Complex [1022:1450]
> +-00.2 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-0fh) I/O Memory Management Unit [1022:1451]
> +-01.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
> +-01.1-[01]----00.0 Samsung Electronics Co Ltd NVMe SSD
> Controller SM981/PM981/PM983 [144d:a808]
> +-01.3-
> [1022:43d0]
> | +-00.1 Advanced Micro Devices, Inc. [AMD] 400
> Series Chipset SATA Controller [1022:43c8]
> | \-00.2-
> | +-01.0-[22]----00.0 Realtek
> Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit
> Ethernet Controller [10ec:8168]
> | +-02.0-[23]--
> | +-03.0-[24]--
> | +-04.0-[25]--
> | \-08.0-[26]----00.0 ASMedia
> Technology Inc. ASM1142 USB 3.1 Host Controller [1b21:1242]
> +-02.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
> +-03.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
> +-03.1-[27]--+-00.0 Advanced Micro Devices, Inc. [AMD/ATI]
> Ellesmere [Radeon RX 470/480/
> | \-00.1 Advanced Micro Devices, Inc. [AMD/ATI]
> Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] [1002:aaf0]
> +-04.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
> +-07.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
> +-07.1-[28]--+-00.0 Advanced Micro Devices, Inc. [AMD]
> Zeppelin/
> | +-00.2 Advanced Micro Devices, Inc. [AMD] Family 17h
> (Models 00h-0fh) Platform Security Processor [1022:1456]
> | \-00.3 Advanced Micro Devices, Inc. [AMD] Zeppelin
> USB 3.0 Host controller [1022:145f]
> +-08.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models
> 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
> +-08.1-[29]--+-00.0 Advance...
In Linux Kernel Bug Tracker #202541, raulvior.bcn (raulvior.bcn-linux-kernel-bugs) wrote : | #259 |
Created attachment 295183
Dmesg of a OnePlus 7 Pro connecting in USB 3.1 gen1 mode. No errors.
(In reply to raul from comment #191)
Connecting a OnePlus 7 Pro smartphone does not show any errors. This phone has a USB 3.1 Gen 1 port and connects in that mode without errors. I can navigate the filesystem as one would expect.
Changed in linux: | |
importance: | Unknown → High |
status: | Unknown → Confirmed |
In Linux Kernel Bug Tracker #202541, tisaak (tisaak-linux-kernel-bugs) wrote : | #260 |
Same issue with a Seagate Portable 4 TB USB 3.0 drive that I connect using usb-storage quirks because its UAS implementation is problematic. It hangs randomly and floods dmesg with errors.
lsusb -tv
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
ID 1d6b:0003 Linux Foundation 3.0 root hub
|__ Port 3: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
ID 0bc2:231a Seagate RSS LLC Expansion Portable
Errors in dmesg start like this...
xhci_hcd 0000:00:10.0: WARN Cannot submit Set TR Deq Ptr
xhci_hcd 0000:00:10.0: A Set TR Deq Ptr command is pending.
usb 3-3: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd
sd 5:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=
sd 5:0:0:0: [sdd] tag#0 CDB: Read(16) 88 00 00 00 00 00 a4 01 ed 78 00 00 00 10 00 00
After that:
task:usb-storage state:D stack: 0 pid: 286 ppid: 2 flags:0x00004000
Call Trace:
__schedule+
? usleep_
schedule+
schedule_
? __prepare_
__wait_
usb_sg_
usb_stor_
usb_stor_
usb_stor_
? __prepare_
? __wait_
usb_stor_
? storage_
kthread+
? __kthread_
ret_from_
In Linux Kernel Bug Tracker #202541, mathias.nyman (mathias.nyman-linux-kernel-bugs) wrote : | #261 |
(In reply to Zak from comment #193)
>
>
> Errors in dmesg start like this...
>
> xhci_hcd 0000:00:10.0: WARN Cannot submit Set TR Deq Ptr
> xhci_hcd 0000:00:10.0: A Set TR Deq Ptr command is pending.
There are recent major changes in this area in the xhci driver.
The above message no longer exists; the new message in this case is
"Set TR Deq already pending, don't submit for x"
Can you try this on a 5.12-rc kernel?
Thanks
Mathias
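The new message suggests that a second Set TR Dequeue Pointer command is no longer queued for an endpoint while one is still in flight. A heavily simplified sketch of such a guard, assuming a single per-endpoint flag; `struct ep_state`, `queue_set_tr_deq()` and `set_tr_deq_completed()` are hypothetical and do not reflect the xhci driver's actual data structures or code paths.

```c
#include <stdbool.h>

/* Hypothetical, minimal per-endpoint state (not the driver's struct). */
struct ep_state {
    bool set_deq_pending; /* a Set TR Dequeue Pointer command is in flight */
};

/* Returns true if a new command was queued, false if one is already pending.
 * In the 5.12-era driver, the refusal case is where a message like
 * "Set TR Deq already pending, don't submit for x" is logged. */
static bool queue_set_tr_deq(struct ep_state *ep)
{
    if (ep->set_deq_pending)
        return false;
    ep->set_deq_pending = true;
    /* ...the command would be placed on the command ring here... */
    return true;
}

/* Called when the command-completion event for Set TR Deq arrives. */
static void set_tr_deq_completed(struct ep_state *ep)
{
    ep->set_deq_pending = false;
}
```

The point of the guard is that the second request is deferred rather than sent to the controller while the endpoint is still being repositioned.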
In Linux Kernel Bug Tracker #202541, mlkcampion (mlkcampion-linux-kernel-bugs) wrote : | #262 |
Created attachment 296259
xhci no soft retry for Intel xhci 8086:06ed and 8086:31a8
Hi
I am having this issue on 2 systems when I plug in a Hoco Hub HB16. The Hoco Hub HB16 is a 6-in-1 adapter that includes:
Type-C to USB3.0 x3
Type-C to HDMI
Type-C to RJ45 Ethernet (RealTek RTL8153, linux loads driver rtl8153b-2)
Type-C to Type-C(PD2.0)
USB Billboard device
Also, when the device is plugged into a Windows 10 machine for the first time, it presents a disk that contains the RTL8153 drivers, and the user is offered the option to install them. This "disk" is not visible later.
The 2 systems where this device failed both reported "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state."
Both systems have Ubuntu Mate 20.10
$ uname -a
5.8.0-48-generic #54-Ubuntu SMP Fri Mar 19 14:25:20 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
1. Dell XPS 9500 (Intel(R) Core(TM) i5-10300H CPU @ 2.50GHz)
$ sudo lspci -k -nn | grep -B2 xhci
00:14.0 USB controller [0c03]: Intel Corporation Comet Lake USB 3.1 xHCI Host Controller [8086:06ed]
Subsystem: Dell Comet Lake USB 3.1 xHCI Host Controller [1028:097d]
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
--
7:00.0 USB controller [0c03]: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] [8086:15ec] (rev 06)
Subsystem: Dell JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] [1028:097d]
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
2. Seed Studio Odyssey J4105 (Intel(R) Celeron(R) J4105 CPU @ 1.50GHz)
$ sudo lspci -k -nn | grep -B3 xhci
00:15.0 USB controller [0c03]: Intel Corporation Device [8086:31a8] (rev 03)
DeviceName: Onboard - Other
Subsystem: Intel Corporation Device [8086:7270]
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
I applied the changes in Stanislaw's patch at comment 176 and added the PCI IDs to match both my systems. I can confirm that with the patch applied, neither system reports "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state." any more.
Just to note: on the Dell XPS I use the Dell DA20 Adapter, a Type-C to USB and HDMI adapter. It appears to contain an ASIX Elec. Corp. AX88179 USB 3.0 to Gigabit Ethernet controller, with which I don't have any issues.
In Linux Kernel Bug Tracker #202541, luke-jr+linuxbugs (luke-jr+linuxbugs-linux-kernel-bugs) wrote : | #263 |
Encountered this with a PCI-e card using ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller
After I moved it to my native "Intel Corporation Device a3af" USB bus, this error disappeared (though other problems remain in my case).
Linux 5.10.33
Of potential note: when I got my Talos II, I tried to move this ASMedia USB PCI-e card to it, and found it was immediately shut down by the IOMMU whenever I tried to use it at all. It seems the firmware is garbage.
IIRC, someone was getting close to an open source firmware replacement without those issues... would be interesting to see if it helps with this bug as well.
In Linux Kernel Bug Tracker #202541, dront78 (dront78-linux-kernel-bugs) wrote : | #264 |
Same problem.
5.12.12-arch1-1 #1 SMP PREEMPT Fri, 18 Jun 2021 21:59:22 +0000 x86_64 GNU/Linux
GPD Pocket
00:00.0 Host bridge [0600]: Intel Corporation Atom/Celeron/
Subsystem: Intel Corporation Device [8086:7270]
Kernel driver in use: iosf_mbi_pci
00:02.0 VGA compatible controller [0300]: Intel Corporation Atom/Celeron/
DeviceName: Onboard IGD
Subsystem: Intel Corporation Device [8086:7270]
Kernel driver in use: i915
Kernel modules: i915
00:0b.0 Signal processing controller [1180]: Intel Corporation Atom/Celeron/
Subsystem: Intel Corporation Device [8086:7270]
Kernel driver in use: proc_thermal
Kernel modules: processor_
00:14.0 USB controller [0c03]: Intel Corporation Atom/Celeron/
Subsystem: Intel Corporation Device [8086:7270]
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
00:1a.0 Encryption controller [1080]: Intel Corporation Atom/Celeron/
Subsystem: Intel Corporation Device [8086:7270]
Kernel modules: mei_txe
00:1c.0 PCI bridge [0604]: Intel Corporation Atom/Celeron/
Kernel driver in use: pcieport
00:1f.0 ISA bridge [0601]: Intel Corporation Atom/Celeron/
Subsystem: Intel Corporation Device [8086:7270]
Kernel modules: lpc_ich
01:00.0 Network controller [0280]: Broadcom Inc. and subsidiaries BCM4356 802.11ac Wireless Network Adapter [14e4:43ec] (rev 02)
Subsystem: Gemtek Technology Co., Ltd Device [17f9:0036]
Kernel driver in use: brcmfmac
Kernel modules: brcmfmac
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Table at 0x5B8DE000.
Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
Vendor: American Megatrends Inc.
Version: 5.11
Release Date: 06/28/2017
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 4 MB
Characteristics:
PCI is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
BIOS ROM is socketed
EDD is supported
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Targeted content distribution is supported
UEFI is supported
BIOS Revision: 5.11
Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: Default string
Product Name: Default string
Version: Default string
Serial Number: Default string
UUID: 03000200-
Wake-up ...
Changed in linux (Debian): | |
assignee: | Guilherme G. Piccoli (gpiccoli) → nobody |
Changed in linux (Ubuntu): | |
assignee: | Guilherme G. Piccoli (gpiccoli) → nobody |
Changed in linux (Ubuntu Trusty): | |
assignee: | Guilherme G. Piccoli (gpiccoli) → nobody |
Changed in linux (Ubuntu Bionic): | |
assignee: | Guilherme G. Piccoli (gpiccoli) → nobody |
Changed in linux (Ubuntu Focal): | |
assignee: | Guilherme G. Piccoli (gpiccoli) → nobody |
Changed in linux (Ubuntu Xenial): | |
assignee: | Guilherme G. Piccoli (gpiccoli) → nobody |
In Linux Kernel Bug Tracker #202541, antdev66 (antdev66-linux-kernel-bugs) wrote : | #265 |
I have the same problem with kernels 5.13.12 and 5.14.0-rc7:
dmesg:
xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
journalctl:
ago 24 18:38:40 SERVER kernel: sd 4:0:0:0: [sda] tag#3 FAILED Result: hostbyte=
In Linux Kernel Bug Tracker #202541, stulluk (stulluk-linux-kernel-bugs) wrote : | #266 |
I also experience exactly the same issue with multiple USB devices (a USB Wi-Fi adapter or a USB webcam), but only on my brand-new AMD mainboard (ASRock model: B550M-HDV).
I tried both Ubuntu Focal and Hirsute with the latest kernels on my old PC (ASUSTeK model: M5A78L-M LX3) and on my Intel NUC (NUC8BEB), and the issue does not happen there (tried with the same USB Wi-Fi and USB webcam devices).
The issue is easily reproducible by inserting the USB Wi-Fi adapter and then running "ip a" in a shell.
In Linux Kernel Bug Tracker #202541, dion (dion-linux-kernel-bugs) wrote : | #267 |
I also have exactly the same problem, but with slightly different hardware.
This time it's a USB DAC branded "Qudelix-5K". As far as I understand, it's a USB 1 device.
[ 174.358189] usb 5-2.3.2.2.1.1: new full-speed USB device number 17 using xhci_hcd
[ 174.475229] usb 5-2.3.2.2.1.1: New USB device found, idVendor=0a12, idProduct=4025, bcdDevice=19.70
[ 174.475232] usb 5-2.3.2.2.1.1: New USB device strings: Mfr=1, Product=8, SerialNumber=3
[ 174.475233] usb 5-2.3.2.2.1.1: Product: Qudelix-5K USB DAC/MIC 48KHz
[ 174.475234] usb 5-2.3.2.2.1.1: Manufacturer: QTIL
[ 174.475235] usb 5-2.3.2.2.1.1: SerialNumber: ABCDEF0123456789
It produces corrupted sound (actually some noise) just a few seconds into playback when connected to a Dell WD19TB Thunderbolt dock. The issue happens with the USB-A ports on the dock plus one Type-C port (front). The second Type-C port (labelled "Type-C with Thunderbolt 3 port") works.
When the noise occurs, I get the following in dmesg:
xhci_hcd 0000:3a:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 5 comp_code 1
xhci_hcd 0000:3a:00.0: Looking for event-dma 00000000ffe940f0 trb-start 00000000ffe94100 trb-end 00000000ffe94100 seg-start 00000000ffe94000 seg-end 00000000ffe94ff0
xhci_hcd 0000:3a:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 5 comp_code 1
xhci_hcd 0000:3a:00.0: Looking for event-dma 00000000ffe949b0 trb-start 00000000ffe949c0 trb-end 00000000ffe949c0 seg-start 00000000ffe94000 seg-end 00000000ffe94ff0
I've tried adding and removing extra USB hubs (originally the Qudelix was plugged into the monitor's internal USB3 hub), but even when plugged directly into the dock, it produces corrupted sound.
Another important thing: this dock has built-in Ethernet with an r8153 chipset, like the one mentioned above.
After reading the comments here, I tried disabling Soft Retry using the following patch:
diff --git a/drivers/
index 1c9a7957c45c.
--- a/drivers/
+++ b/drivers/
@@ -189,10 +189,11 @@ static void xhci_pci_
if (pdev->vendor == PCI_VENDOR_
+ xhci->quirks |= XHCI_NO_SOFT_RETRY;
}
if (pdev->vendor == PCI_VENDOR_ID_INTEL &&
And it completely fixed the issue for me. The DAC produces clear sound even when connected through a chain of two hubs!
PS.
lspci -k -nn | grep -B2 xhci
00:14.0 USB controller [0c03]: Intel Corporation Comet Lake PCH-LP USB 3.1 xHCI Host Controller [8086:02ed]
Subsystem: Hewlett-Packard Company Comet Lake PCH-LP USB 3.1 xHCI Host Controller [103c:8724]
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
--
37:00.0 USB controller [0c03]: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] [8086:15ec] (rev 06)
Subsystem: Hewlett-P...
In Linux Kernel Bug Tracker #202541, raulvior.bcn (raulvior.bcn-linux-kernel-bugs) wrote : | #268 |
It turns out the problem was the cable: it was too long. A shorter USB 3.0 cable (1.8 m) allowed a stable connection. On the same Linux 5.13 (the previous dmesg was on Linux 5.10), the longer 3-meter cable kept failing, while with the 1.8-meter cable the HDD works without issue.
(In reply to raul from comment #191)
In Linux Kernel Bug Tracker #202541, S.Braendlin (s.braendlin-linux-kernel-bugs) wrote : | #269 |
Hi,
I also have issues with USB3 on my Debian 10 with kernel 5.10.0-
Aug 6 13:20:14 media-server kernel: [ 964.069355] scsi host17: uas_eh_
Aug 6 13:20:14 media-server kernel: [ 964.197532] usb 2-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Aug 6 13:20:14 media-server kernel: [ 964.219053] scsi host17: uas_eh_
Aug 6 13:20:18 media-server kernel: [ 968.137601] task:sync state:D stack: 0 pid:12237 ppid: 11291 flags:0x00004324
Aug 6 13:20:18 media-server kernel: [ 968.137607] Call Trace:
Aug 6 13:20:18 media-server kernel: [ 968.137621] __schedule+
Aug 6 13:20:18 media-server kernel: [ 968.137630] schedule+0x3c/0xa0
Aug 6 13:20:18 media-server kernel: [ 968.137635] io_schedule+
Aug 6 13:20:18 media-server kernel: [ 968.137644] wait_on_
Aug 6 13:20:18 media-server kernel: [ 968.137651] ? __page_
Aug 6 13:20:18 media-server kernel: [ 968.137657] wait_on_
Aug 6 13:20:18 media-server kernel: [ 968.137663] __filemap_
Aug 6 13:20:18 media-server kernel: [ 968.137673] ? sync_inodes_
Aug 6 13:20:18 media-server kernel: [ 968.137679] filemap_
Aug 6 13:20:18 media-server kernel: [ 968.137684] iterate_
Aug 6 13:20:18 media-server kernel: [ 968.137691] ksys_sync+0x7c/0xb0
Aug 6 13:20:18 media-server kernel: [ 968.137697] __do_sys_
Aug 6 13:20:18 media-server kernel: [ 968.137704] do_syscall_
Aug 6 13:20:18 media-server kernel: [ 968.137709] entry_SYSCALL_
Aug 6 13:20:18 media-server kernel: [ 968.137714] RIP: 0033:0x7fc4ec0529aa
Aug 6 13:20:18 media-server kernel: [ 968.137717] RSP: 002b:00007ffcdd
Aug 6 13:20:18 media-server kernel: [ 968.137723] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc4ec0529aa
Aug 6 13:20:18 media-server kernel: [ 968.137725] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000a8002000
Aug 6 13:20:18 media-server kernel: [ 968.137728] RBP: 0000000000000000 R08: 0000555ba9703dcf R09: 00007ffcddf4afe2
Aug 6 13:20:18 media-server kernel: [ 968.137730] R10: 00007fc4ec01a201 R11: 0000000000000246 R12: 0000000000000001
Aug 6 13:20:18 media-server kernel: [ 968.137733] R13: 0000000000000001 R14: 00007ffcddf49158 R15: 0000000000000000
In Linux Kernel Bug Tracker #202541, pupilla (pupilla-linux-kernel-bugs) wrote : | #270 |
Hello everyone,
I encountered the problem with kernel 6.0.0-rc3 on a Lenovo T470 laptop with a USB3 ASIX network card. The system was started with the parameter intel_idle.
I have another similar setup (same laptop and same USB3 network card, but with Linux 6.0.0-rc2) that has been running for 8 days, started without the parameter intel_idle.
The distribution is Slackware 15 (64 bit).
This is the full output of dmesg.
Any feedback is welcome.
Marco
[ 0.000000] Linux version 6.0.0-rc3 (root@Cherepakha) (gcc (GCC) 11.2.0, GNU ld version 2.37-slack15) #1 SMP PREEMPT_DYNAMIC Tue Aug 30 16:07:18 CEST 2022
[ 0.000000] Command line: auto BOOT_IMAGE=Linux ro root=10303 intel_idle.
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64
[ 0.000000] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64
[ 0.000000] x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.
[ 0.000000] signal: max sigframe size: 1616
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000
[ 0.000000] BIOS-e820: [mem 0x000000000009d
[ 0.000000] BIOS-e820: [mem 0x00000000000e0
[ 0.000000] BIOS-e820: [mem 0x0000000000100
[ 0.000000] BIOS-e820: [mem 0x0000000040000
[ 0.000000] BIOS-e820: [mem 0x0000000040400
[ 0.000000] BIOS-e820: [mem 0x000000008b79c
[ 0.000000] BIOS-e820: [mem 0x0000000090653
[ 0.000000] BIOS-e820: [mem 0x0000000090654
[ 0.000000] BIOS-e820: [mem 0x000000009b52d
[ 0.000000] BIOS-e820: [mem 0x000000009b59a
[ 0.000000] BIOS-e820: [mem 0x000000009b5ff
[ 0.000000] BIOS-e820: [mem 0x00000000f0000
[ 0.000000] BIOS-e820: [mem 0x00000000fd000
[ 0.000000] BIOS-e820: [mem 0x00000000fec00
[ 0.000000] BIOS-e820: [mem 0x00000000fed00
[ 0.000000] BIOS-e820: [mem 0x00000000fed10
[ 0.000000] BIOS-e820: [mem 0x00000000fed84
[ 0.000000] BIOS-e820: [mem 0x00000000fee00
[ 0.000000] BIOS-e820: [mem 0x00...
In Linux Kernel Bug Tracker #202541, pupilla (pupilla-linux-kernel-bugs) wrote : | #271 |
Hello everyone,
Unfortunately, it happened again (system started without any parameters):
[ 9.561808] br0: port 2(eth1) entered forwarding state
[95735.974041] usb 2-1: USB disconnect, device number 2
[95735.974215] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[95735.974439] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[95735.974471] ax88179_178a 2-1:1.0 eth1: unregister 'ax88179_178a' usb-0000:00:14.0-1, ASIX AX88179 USB 3.0 Gigabit Ethernet
[95735.974523] ax88179_178a 2-1:1.0 eth1: Failed to read reg index 0x0002: -19
[95735.974532] ax88179_178a 2-1:1.0 eth1: Failed to write reg index 0x0002: -19
[95735.974595] br0: port 2(eth1) entered disabled state
[95735.974783] device eth1 left promiscuous mode
[95735.974790] br0: port 2(eth1) entered disabled state
[95735.992489] ax88179_178a 2-1:1.0 eth1 (unregistered): Failed to write reg index 0x0002: -19
[95735.992503] ax88179_178a 2-1:1.0 eth1 (unregistered): Failed to write reg index 0x0001: -19
[95735.992510] ax88179_178a 2-1:1.0 eth1 (unregistered): Failed to write reg index 0x0002: -19
[95736.215301] usb 2-1: new SuperSpeed USB device number 4 using xhci_hcd
[95736.566562] ax88179_178a 2-1:1.0 eth1: register 'ax88179_178a' at usb-0000:00:14.0-1, ASIX AX88179 USB 3.0 Gigabit Ethernet, 00:0e:c6:81:79:01
Marco
In Linux Kernel Bug Tracker #202541, ske5074 (ske5074-linux-kernel-bugs) wrote : | #272 |
I also have this issue. I'm running Proxmox 7.2 (Debian Bullseye) on a Lenovo M910q (Core i7-7700T) with two TP-Link UE300 (RTL8153) USB-to-1GbE Ethernet adapters. Each adapter is stable in a lower USB slot. Swapping the adapters does not change the behavior: only the device in the higher slot is affected. Moving them to different ports makes no difference.
It is easily reproducible with the following commands. Basically, I bring bond0 up again; it works initially, then I get the xhci_hcd warning and the link goes down again. System details are below.
root@higgins:~# dmesg -C ; ifup -a ; ip link | grep enx ; \
> dmesg -H ; dmesg -C ; sleep 70 ; \
> ip link | grep enx ; dmesg -H
3: enxd03745be5afc: <BROADCAST,
16: enx54af9786ab11: <BROADCAST,
[Sep 3 11:05] device enx54af9786ab11 entered promiscuous mode
[ +0.001236] bond0: (slave enx54af9786ab11): Enslaving as a backup interface with a down link
[ +0.006363] vmbr0: the hash_elasticity option has been deprecated and is always 16
[ +0.013972] r8152 2-4:1.0 enx54af9786ab11: Promiscuous mode enabled
[ +0.001344] r8152 2-4:1.0 enx54af9786ab11: carrier on
3: enxd03745be5afc: <BROADCAST,
17: enx54af9786ab11: <BROADCAST,
[Sep 3 11:05] bond0: (slave enx54af9786ab11): link status definitely up, 1000 Mbps full duplex
[Sep 3 11:06] usb 2-4: USB disconnect, device number 12
[ +0.001544] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ +0.001435] bond0: (slave enx54af9786ab11): Releasing backup interface
[ +0.029081] device enx54af9786ab11 left promiscuous mode
[ +0.316190] usb 2-4: new SuperSpeed USB device number 13 using xhci_hcd
[ +0.022053] usb 2-4: New USB device found, idVendor=2357, idProduct=0601, bcdDevice=30.00
[ +0.001297] usb 2-4: New USB device strings: Mfr=1, Product=2, SerialNumber=6
[ +0.001337] usb 2-4: Product: USB 10/100/1000 LAN
[ +0.001261] usb 2-4: Manufacturer: TP-Link
[ +0.001208] usb 2-4: SerialNumber: 000001
[ +0.137200] usb 2-4: reset SuperSpeed USB device number 13 using xhci_hcd
[ +0.049197] r8152 2-4:1.0: load rtl8153a-4 v2 02/07/20 successfully
[ +0.030905] r8152 2-4:1.0 eth0: v1.12.12
[ +0.007834] r8152 2-4:1.0 enx54af9786ab11: renamed from eth0
root@higgins:~#
-------
System Details
-------
root@higgins:~# uname -a
Linux higgins 5.15.39-4-pve #1 SMP PVE 5.15.39-4 (Mon, 08 Aug 2022 15:11:15 +0200) x86_64 GNU/Linux
root@higgins:~# lspci -k -nn | grep -B2 xhci
00:14.0 USB controller [0c03]: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller [8086:a2af]
Subsystem: Lenovo 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller [17aa:310b]
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
root@higgins:~# lsusb -tv
/: Bus 02.Port 1: D...
In Linux Kernel Bug Tracker #202541, ske5074 (ske5074-linux-kernel-bugs) wrote : | #273 |
(In reply to Sean Kennedy from comment #205)
> I also have the issue. Using Proxmox 7.2 (Debian Bullseye) with a Lenovo
> M910q core-i7-7700T, using two TPLink UE300 (RTL8153) USB to 1Gbe Ethernet
> adapters. Each one is stable in a lower USB slot. Swapping the adapters does
> not change the behavior and only impacts the USB device in the higher slot.
> Changes to different ports without change.
Update: I tried a different dongle, a 2.5GbE one, and I also have two hard drives attached to the system. No matter where the 2.5GbE dongle is attached, it eventually fails with "WARN Set TR Deq Ptr cmd failed". The error rate is currently only around six times a day:
8156 Realtek Semiconductor Corp. USB 10/100/1G/2.5G LAN
# dmesg -T | grep xhci
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: xHCI Host Controller
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: hcc params 0x200077c1 hci version 0x100 quirks 0x0000000000009810
[Tue Sep 6 13:37:13 2022] usb usb1: Manufacturer: Linux 5.15.39-4-pve xhci-hcd
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: xHCI Host Controller
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 2
[Tue Sep 6 13:37:13 2022] xhci_hcd 0000:00:14.0: Host supports USB 3.0 SuperSpeed
[Tue Sep 6 13:37:13 2022] usb usb2: Manufacturer: Linux 5.15.39-4-pve xhci-hcd
[Tue Sep 6 13:37:13 2022] usb 2-1: new SuperSpeed USB device number 2 using xhci_hcd
[Tue Sep 6 13:37:14 2022] usb 2-3: new SuperSpeed USB device number 3 using xhci_hcd
[Tue Sep 6 13:37:14 2022] usb 2-4: new SuperSpeed USB device number 4 using xhci_hcd
[Tue Sep 6 14:39:22 2022] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[Tue Sep 6 14:39:22 2022] usb 2-4: new SuperSpeed USB device number 5 using xhci_hcd
[Tue Sep 6 18:44:01 2022] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[Tue Sep 6 18:44:01 2022] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[Tue Sep 6 18:44:02 2022] usb 2-4: new SuperSpeed USB device number 6 using xhci_hcd
[Tue Sep 6 22:19:06 2022] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[Tue Sep 6 22:19:07 2022] usb 2-4: new SuperSpeed USB device number 7 using xhci_hcd
Since this drops the device from the system and takes the link offline, I created a simple script that cron runs once a minute: if no enx Ethernet interface is UP, it runs ifup -a. It's clunky, but it works.
crontab:
# m h dom mon dow command
* * * * * /root/fixnet.sh >/dev/null 2>&1
fixnet.sh:
#!/bin/sh
STATE=$(ip link | grep " enx" | grep -c UP)
if [ "$STATE" -gt 0 ]; then
    # All good. Exit
    exit 0
fi
/usr/sbin/ifup -a
sleep 20
if ping -c 1 10.0.0.1 | grep -q "1 received"; then
    # Network looks good. Exit.
    exit 0
fi
sleep 310
if ! ping -c 1 10.0.0.1 | grep -q "1 received"; then
    # The network is still down.
    systemctl reboot
fi
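The STATE check above counts enx interfaces carrying the UP flag, which is the administrative state; it stays set even when there is no carrier, so the script only reacts once no enx interface is UP at all (for example after the adapter re-enumerates and has not been brought up again). A possible stricter variant, sketched here as a hypothetical helper and not part of the original script, counts only enx interfaces whose flags include LOWER_UP (carrier present). It assumes the usual `ip link` header layout "N: name: <FLAGS> ...".

```shell
# Hypothetical stricter check: count enx* interfaces that report LOWER_UP
# (carrier present) in `ip link` output read from stdin.
count_enx_with_carrier() {
    awk '
        # Header lines look like: "17: enx54af9786ab11: <BROADCAST,...,UP,LOWER_UP> mtu ..."
        # Continuation lines ("link/ether ...") never have an enx name in field 2.
        $2 ~ /^enx/ && $3 ~ /LOWER_UP/ { n++ }
        END { print n + 0 }
    '
}
```

To try it, the STATE line could become `STATE=$(ip link | count_enx_with_carrier)`; the rest of fixnet.sh stays the same.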
In Linux Kernel Bug Tracker #202541, james (james-linux-kernel-bugs) wrote : | #274 |
I'm using a 2.5 Gb USB Ethernet device and getting this error intermittently (about a dozen times per day).
$ uname -a
Linux hephaestus 5.4.0-135-generic #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ lsusb
<snip>
Bus 003 Device 016: ID 0bda:8156 Realtek Semiconductor Corp. USB 10/100/1G/2.5G
This is what appears in /var/log/syslog each time:
Dec 21 10:26:47 hephaestus kernel: [346923.166782] usb 3-4: USB disconnect, device number 15
Dec 21 10:26:47 hephaestus kernel: [346923.166913] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Dec 21 10:26:47 hephaestus kernel: [346923.166927] cdc_ncm 3-4:2.0 eth1: unregister 'cdc_ncm' usb-0000:00:14.0-4, CDC NCM
Dec 21 10:26:47 hephaestus kernel: [346923.167071] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Dec 21 10:26:47 hephaestus kernel: [346923.170644] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Dec 21 10:26:47 hephaestus dhclient[320734]: receive_packet failed on eth1: Network is down
Dec 21 10:26:47 hephaestus systemd[1]: Stopping ifup for eth1...
Dec 21 10:26:47 hephaestus dhclient[325522]: Killed old client process
Dec 21 10:26:47 hephaestus ifdown[325522]: Killed old client process
Dec 21 10:26:47 hephaestus kernel: [346923.478913] usb 3-4: new SuperSpeed Gen 1 USB device number 16 using xhci_hcd
Dec 21 10:26:47 hephaestus kernel: [346923.499567] usb 3-4: New USB device found, idVendor=0bda, idProduct=8156, bcdDevice=31.00
Dec 21 10:26:47 hephaestus kernel: [346923.499573] usb 3-4: New USB device strings: Mfr=1, Product=2, SerialNumber=6
Dec 21 10:26:47 hephaestus kernel: [346923.499577] usb 3-4: Product: USB 10/100/1G/2.5G LAN
Dec 21 10:26:47 hephaestus kernel: [346923.499580] usb 3-4: Manufacturer: Realtek
Dec 21 10:26:47 hephaestus kernel: [346923.499583] usb 3-4: SerialNumber: 001000001
Dec 21 10:26:47 hephaestus kernel: [346923.523736] cdc_ncm 3-4:2.0: MAC-Address: xx:xx:xx:xx:xx:xx
Dec 21 10:26:47 hephaestus kernel: [346923.523742] cdc_ncm 3-4:2.0: setting rx_max = 16384
Dec 21 10:26:47 hephaestus kernel: [346923.523836] cdc_ncm 3-4:2.0: setting tx_max = 16384
Dec 21 10:26:47 hephaestus kernel: [346923.524578] cdc_ncm 3-4:2.0 eth1: register 'cdc_ncm' at usb-0000:00:14.0-4, CDC NCM, xx:xx:xx:xx:xx:xx
Dec 21 10:26:47 hephaestus systemd-
Dec 21 10:26:47 hephaestus systemd-
Dec 21 10:26:47 hephaestus systemd[1]: Found device USB_10_
(Then things start back up, and the Ethernet link comes back after about 10 seconds.)
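To see how much of that outage is USB re-enumeration, a minimal sketch that prints the gap between each "USB disconnect" line and the next "new SuperSpeed" line, using the bracketed kernel timestamps. It assumes POSIX sh and awk and works on both plain dmesg lines and syslog lines that embed "[seconds.micros]"; `reenum_delays` is a hypothetical name.

```shell
# Hypothetical helper: print, in seconds, the gap between each "USB disconnect"
# line and the next "new SuperSpeed" line, read from stdin.
reenum_delays() {
    awk '
        # Return the first "[  seconds.micros]" timestamp on a line, or -1.
        function ts(line,    s) {
            if (!match(line, /[[] *[0-9]+[.][0-9]+[]]/)) return -1
            s = substr(line, RSTART + 1, RLENGTH - 2)
            return s + 0
        }
        /USB disconnect/ {
            t0 = ts($0)
            pending = (t0 >= 0)
        }
        /new SuperSpeed/ && pending {
            t1 = ts($0)
            if (t1 >= 0) printf "%.3f\n", t1 - t0
            pending = 0
        }
    '
}
```

For the syslog excerpt above it prints `0.312`, which suggests that most of the roughly 10 seconds of downtime is spent after the adapter has already re-enumerated.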
In Linux Kernel Bug Tracker #202541, james (james-linux-kernel-bugs) wrote : | #275 |
FYI: I built a 5.4 kernel with the patch discussed earlier in this thread, and I still get the error multiple times per day.
(In reply to James H from comment #207)
In Linux Kernel Bug Tracker #202541, svmohr (svmohr-linux-kernel-bugs) wrote : | #276 |
I also get random disconnects on kernel 6.3.0-7-generic with a Samsung T7 Shield external SSD. Unfortunately, this error is hard to reproduce; it usually takes hours before it occurs for the first time.
System:
Kernel: 6.3.0-7-generic arch: x86_64 bits: 64 compiler: N/A Console: pty pts/10 Distro: Ubuntu
23.10 (Mantic Minotaur)
Machine:
Type: Server System: Supermicro product: C9Z390-PGW v: 0123456789 serial: <filter>
Mobo: Supermicro model: C9Z390-PGW v: 1.01A serial: <filter> UEFI: American Megatrends v: 1.3
date: 06/03/2020
CPU:
Info: 8-core model: Intel Core i9-9900K bits: 64 type: MT MCP arch: Coffee Lake rev: D cache:
L1: 512 KiB L2: 2 MiB L3: 16 MiB
Speed (MHz): avg: 3687 high: 5002 min/max: 800/5000 cores: 1: 5002 2: 3600 3: 3600 4: 3600
5: 3600 6: 3600 7: 3600 8: 3600 9: 3600 10: 3600 11: 3600 12: 3600 13: 3600 14: 3600 15: 3600
16: 3600 bogomips: 115200
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=
ID 1d6b:0003 Linux Foundation 3.0 root hub
|__ Port 4: Dev 10, If 0, Class=Mass Storage, Driver=uas, 10000M
ID 04e8:61fb Samsung Electronics Co., Ltd
BOOT_IMAGE=
io-pci vfio_pci.
[349280.239403] usb 2-4: USB disconnect, device number 9
[349280.239689] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[349280.239695] usb 2-4: cmd cmplt err -108
[349280.239702] sd 9:0:0:0: [sdh] tag#13 uas_zap_pending 0 uas-tag 1 inflight: CMD
[349280.239705] sd 9:0:0:0: [sdh] tag#13 CDB: Write(16) 8a 00 00 00 00 00 d3 28 e4 00 00 00 00 d8 00 00
[349280.239724] sd 9:0:0:0: [sdh] tag#13 FAILED Result: hostbyte=
[349280.239726] sd 9:0:0:0: [sdh] tag#13 CDB: Write(16) 8a 00 00 00 00 00 d3 28 e4 00 00 00 00 d8 00 00
[349280.239728] I/O error, dev sdh, sector 3542672384 op 0x1:(WRITE) flags 0x8800 phys_seg 27 prio class 2
[349280.239741] device offline error, dev sdh, sector 3542674432 op 0x1:(WRITE) flags 0x8800 phys_seg 35 prio class 2
[349280.239747] device offline error, dev sdh, sector 3542672640 op 0x1:(WRITE) flags 0x8800 phys_seg 24 prio class 2
[349280.239750] device offline error, dev sdh, sector 3542677504 op 0x1:(WRITE) flags 0x8800 phys_seg 45 prio class 2
[349280.239753] device offline error, dev sdh, sector 3542680576 op 0x1:(WRITE) flags 0x8800 phys_seg 41 prio class 2
[349280.239788] device offline error, dev sdh, sector 3542663168 op 0x1:(WRITE) flags 0x8800 phys_seg 35 prio class 2
[349280.239793] device offline error, dev sdh, sector 3542663680 op 0x1:(WRITE) flags 0x8800 phys_seg 29 prio class 2
[349280.239799] device offline error, dev sdh, sector 3542663936 op 0x1:(WRITE) flags 0x8800 phys_seg 26 prio class 2
[349280.299534] sd 9:0:0:0: [sdh] Synchronizing SCSI cache
[349280.523475] sd 9:0:0:0: [sdh] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVE...
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1749961
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.