amd_iommu conflict with Marvell 88SE9230 SATA Controller
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Linux | Fix Released | Medium | | |
linux (Debian) | Fix Released | Unknown | | |
linux (Fedora) | Unknown | Unknown | | |
linux (Ubuntu) | Incomplete | Low | Unassigned | |
Bug Description
Booting with kernel 4.18.0-
https:/
WORKAROUND: Use kernel boot parameter:
amd_iommu=off
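One way to confirm after a reboot that the workaround is actually in effect is to inspect the running kernel's command line. The following is a minimal sketch, assuming a Linux system that exposes `/proc/cmdline`; the helper name `has_boot_param` and the sample command line are illustrative and do not come from the report.

```python
def has_boot_param(cmdline: str, param: str) -> bool:
    """Return True if `param` appears as a whole token on the command line."""
    return param in cmdline.split()


# Hypothetical command line, for illustration only.
sample = "BOOT_IMAGE=/boot/vmlinuz root=/dev/sda1 ro quiet amd_iommu=off"
print(has_boot_param(sample, "amd_iommu=off"))

# On a live Linux system the real value can be read like this.
try:
    with open("/proc/cmdline") as f:
        print(has_boot_param(f.read(), "amd_iommu=off"))
except OSError:
    pass  # not Linux, or /proc is unavailable
```

Matching whole tokens avoids a false positive on a longer parameter that merely starts with the same text.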
---
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.5
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/
CurrentDesktop: XFCE
DistroRelease: Ubuntu 18.04
HibernationDevice: RESUME=
InstallationDate: Installed on 2016-07-02 (913 days ago)
InstallationMedia: Mythbuntu 16.04 LTS "Xenial Xerus" - Release amd64 (20160420.1)
IwConfig:
lo no wireless extensions.
enp9s0 no wireless extensions.
MachineType: Gigabyte Technology Co., Ltd. B450M S2H
Package: linux (not installed)
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware 1.173.2
RfKill:
Tags: bionic
Uname: Linux 4.18.0-13-generic x86_64
UpgradeStatus: Upgraded to bionic on 2018-07-27 (158 days ago)
UserGroups: adm cdrom dip lpadmin mythtv nopasswdlogin plugdev sambashare sudo video
_MarkForUpload: True
dmi.bios.date: 12/04/2018
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: F2c
dmi.board.
dmi.board.name: B450M S2H
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.
dmi.modalias: dmi:bvnAmerican
dmi.product.family: Default string
dmi.product.name: B450M S2H
dmi.product.sku: Default string
dmi.product.
dmi.sys.vendor: Gigabyte Technology Co., Ltd.
In Linux Kernel Bug Tracker #42679, pawel.zaq (pawel.zaq-linux-kernel-bugs) wrote : | #17 |
In Linux Kernel Bug Tracker #42679, pawel.zaq (pawel.zaq-linux-kernel-bugs) wrote : | #18 |
Created attachment 72218
Output of `lspci -knnv' command
In Linux Kernel Bug Tracker #42679, pawel.zaq (pawel.zaq-linux-kernel-bugs) wrote : | #19 |
Created attachment 72219
Kernel config
In Linux Kernel Bug Tracker #42679, public (public-linux-kernel-bugs) wrote : | #20 |
The same problem occurs on a Z68A-GD65 MSI G3 system Marvell 88SE91xx.
grep DMAR:
ACPI: DMAR beaff508 000B0 (v01 ALASKA A M I 00000001 INTL 00000001)
DMAR: Host address width 36
DMAR: DRHD base: 0x000000fed91000 flags: 0x1
DMAR: RMRR base: 0x000000bf4cc000 end: 0x000000bf4eefff
DMAR: No ATSR found
DMAR:[DMA Read] Request device [03:00.1] fault addr fffc0000
DMAR:[fault reason 02] Present bit in context entry is clear
grep IOMMU:
Intel-IOMMU: enabled
IOMMU 0: reg_base_addr fed91000 ver 1:0 cap c9008020660262 ecap f0105a
IOMMU 0 0xfed91000: using Queued invalidation
IOMMU: Setting RMRR:
IOMMU: Setting identity map for device 0000:00:1d.0 [0xbf4cc000 - 0xbf4eefff]
IOMMU: Setting identity map for device 0000:00:1a.0 [0xbf4cc000 - 0xbf4eefff]
IOMMU: Prepare 0-16MiB unity mapping for LPC
IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
Intel-IOMMU: enabled
IOMMU 0: reg_base_addr fed91000 ver 1:0 cap c9008020660262 ecap f0105a
IOMMU 0 0xfed91000: using Queued invalidation
IOMMU: Setting RMRR:
IOMMU: Setting identity map for device 0000:00:1d.0 [0xbf4cc000 - 0xbf4eefff]
IOMMU: Setting identity map for device 0000:00:1a.0 [0xbf4cc000 - 0xbf4eefff]
IOMMU: Prepare 0-16MiB unity mapping for LPC
IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
grep ata8:
ata8: SATA max UDMA/133 abar m2048@0xfa310000 port 0xfa310180 irq 48
ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata8.00: qc timeout (cmd 0xec)
ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata8.00: qc timeout (cmd 0xec)
ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata8: limiting SATA link speed to 3.0 Gbps
ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
ata8.00: qc timeout (cmd 0xec)
ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
In Linux Kernel Bug Tracker #42679, public (public-linux-kernel-bugs) wrote : | #21 |
Created attachment 72419
config file / kernel 3.2.2
Kernel config
In Linux Kernel Bug Tracker #42679, public (public-linux-kernel-bugs) wrote : | #22 |
Created attachment 72420
Output of `lspci -knnv' command
In Linux Kernel Bug Tracker #42679, listenmitglied (listenmitglied-linux-kernel-bugs) wrote : | #23 |
I confirm this bug with kernel 3.2.6: same error with VT-d enabled in bios.
With mainboard "Asus Rampage III Gene", Z68, onboard Marvell; CPU Xeon L5520; 3x4GB Ram. Logs/Printouts follow this evening.
In Linux Kernel Bug Tracker #42679, listenmitglied (listenmitglied-linux-kernel-bugs) wrote : | #24 |
Created attachment 72733
kernel config
above bug confirmed with 3.2.13
In Linux Kernel Bug Tracker #42679, listenmitglied (listenmitglied-linux-kernel-bugs) wrote : | #25 |
Created attachment 72734
dmesg intel z68, asus rampage III gene, vt-d enable
In Linux Kernel Bug Tracker #42679, listenmitglied (listenmitglied-linux-kernel-bugs) wrote : | #26 |
Created attachment 72735
lspci, asus rampage III gene, z68, vt-d enable, 3.2.13
In Linux Kernel Bug Tracker #42679, listenmitglied (listenmitglied-linux-kernel-bugs) wrote : | #27 |
(In reply to comment #6)
> I confirm this bug with kernel 3.2.6: same error with VT-d enabled in bios.
>
> With mainboard "Asus Rampage III Gene", Z68, onboard Marvell; CPU Xeon L5520;
> 3x4GB Ram. Logs/Printouts follow this evening.
Also confirmed for current kernel 3.2.13.
In Linux Kernel Bug Tracker #42679, acooks (acooks-linux-kernel-bugs) wrote : | #28 |
From a pdf file by Intel with title "Intel® Virtualization Technology for Directed I/O Architecture Specification":
--snip--
3.6.1.4 PCI Express Devices Using Phantom Functions
To increase the maximum possible number of outstanding requests requiring completion, PCI Express allows a device to use function numbers not assigned to implemented functions to logically extend the Tag identifier. Unclaimed function numbers are referred to as Phantom Function Numbers (PhFN). A device reports its support for phantom functions through the Device Capability configuration register, and requires software to explicitly enable use of phantom functions through the Device Control configuration register.
Since the function number is part of the requester-id used to locate the context-entry for processing a DMA request, when assigning PCI Express devices with phantom functions enabled, software must program multiple context entries, each corresponding to the PhFN enabled for use by the device function. Each of these context-entries must be programmed identically to ensure the DMA requests with any of these requester-ids are processed identically.
--snip--
grep -ri phant says pci_regs.h knows about the capability, but it doesn't appear anywhere else in the kernel as far as I can see. Look for PCI_EXP_
Unfortunately, lspci indicates that the Marvell chip is not using phantom functions (lspci upload to follow), so at this point I can't tell if I'm on the right trail.
Caveat lector: I don't have any previous experience with low-level PCI stuff.
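To make the quoted passage concrete: a PCI Express requester ID is a 16-bit value that packs the bus number (8 bits), device number (5 bits) and function number (3 bits). The sketch below uses made-up helper names (`requester_id`, `split_requester_id`), not kernel functions, and only illustrates why a request tagged with a different function number selects a different context entry.

```python
from typing import Tuple


def requester_id(bus: int, dev: int, fn: int) -> int:
    """Pack bus/device/function into a 16-bit PCIe requester ID."""
    if not (0 <= bus <= 0xFF and 0 <= dev <= 0x1F and 0 <= fn <= 0x7):
        raise ValueError("bus/device/function out of range")
    return (bus << 8) | (dev << 3) | fn


def split_requester_id(rid: int) -> Tuple[int, int, int]:
    """Unpack a 16-bit requester ID into (bus, device, function)."""
    return (rid >> 8) & 0xFF, (rid >> 3) & 0x1F, rid & 0x7


# 03:00.0 is the function the kernel knows about; 03:00.1 is the
# requester seen in the fault messages earlier in this thread.
known = requester_id(0x03, 0x00, 0)
stray = requester_id(0x03, 0x00, 1)
print(hex(known), hex(stray))
```

Because the two IDs differ, an IOMMU that indexes its context entries by requester ID treats them as two unrelated devices unless software programs both entries.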
In Linux Kernel Bug Tracker #42679, acooks (acooks-linux-kernel-bugs) wrote : | #29 |
Created attachment 73265
lspci output including device capabilities
In Linux Kernel Bug Tracker #42679, grythumn (grythumn-linux-kernel-bugs) wrote : | #30 |
I'm seeing similar errors with AMD-Vi (AMD's IOMMU implementation) and a couple of Marvell 88SE9128-based cards, and can confirm that it is still present in 3.7.0 builds.
https:/
In Linux Kernel Bug Tracker #42679, stijn+bugs (stijn+bugs-linux-kernel-bugs) wrote : | #31 |
This problem happens here as well. Asus P9X79 WS, BIOS 3306, X79, i7-3930K. Running kernel 3.7.3. In addition to being unable to use the Marvell SATA controller ports, this causes a ~40s hang during boot.
I tried contacting Asus about this, as I think this could be fixed by a BIOS update, but they replied, in poor English, that they do not support Linux. I'll think twice before buying Asus again in the future, but it would be nice if a workaround could be implemented in the kernel.
In Linux Kernel Bug Tracker #42679, stijn+bugs (stijn+bugs-linux-kernel-bugs) wrote : | #32 |
Created attachment 91521
dmesg on Asus P9X79 WS, kernel 3.7.3
In Linux Kernel Bug Tracker #42679, stijn+bugs (stijn+bugs-linux-kernel-bugs) wrote : | #33 |
Created attachment 91531
lspci -knvv on Asus P9X79 WS, kernel 3.7.3
In Linux Kernel Bug Tracker #42679, stijn+bugs (stijn+bugs-linux-kernel-bugs) wrote : | #34 |
FWIW, I still have this issue with 3.7.8 and 3.8-rc7. BIOS update 3401 for the P9X79 WS didn't help. Additionally, the hang during boot becomes worse (up to ~65 seconds) when a hard drive is connected. Since the drive is unusable anyway, I hacked the AHCI driver to ignore the Marvell controller. While this is no solution to the problem, at least my boot time is back to normal (<30s).
In Linux Kernel Bug Tracker #42679, tradofox (tradofox-linux-kernel-bugs) wrote : | #35 |
Same problem with Marvell 88SE9172 SATA Controller.
I have a Gigabyte GA-Z77X-UD5H with two Marvell 88SE9172 SATA controllers and an Intel E3-1245v2 CPU. VT-d is enabled. When running normal Debian 7 or >Ubuntu 12.04 I can see the HDDs and SSDs connected to the Marvell ports. After installing XenServer 6.1 and Xen Cloud Platform 1.6, the HDDs and SSDs are not detected, although lspci shows that the Marvell 88SE9172 controllers are detected.
In Linux Kernel Bug Tracker #42679, lizhenhua (lizhenhua-linux-kernel-bugs) wrote : | #36 |
The root cause of this bug seems to be that the device illegally accessed memory that should be reserved for the IOMMU module, and this changed the IOMMU registers.
In Linux Kernel Bug Tracker #42679, bhelgaas (bhelgaas-linux-kernel-bugs) wrote : | #37 |
ZhenHua, can you elaborate on this? Do you mean a device accessed the MMIO space used to program the IOMMU itself? If so, how did you conclude that? I doubt the IOMMU space is at address 0xfff00000.
Based on the following data:
Paweł:
DMAR:[DMA Read] Request device [0b:00.1] fault addr fff00000
DMAR:[fault reason 02] Present bit in context entry is clear
0b:00.0 [0106]: Marvell [1b4b:9123]
Korneliusz:
DMAR:[DMA Read] Request device [03:00.1] fault addr fffc0000
DMAR:[fault reason 02] Present bit in context entry is clear
03:00.0 [0106]: Marvell 88SE9123 SATA [1b4b:9123]
Daniel:
IOMMU identity map errors (assuming unrelated for now)
DMAR:[DMA Read] Request device [01:00.1] fault addr fff00000
DMAR:[fault reason 02] Present bit in context entry is clear
01:00.0 [0106]: Marvell 88SE9123 SATA [1b4b:9123]
Stijn:
dmar: DMAR:[DMA Read] Request device [07:00.1] fault addr fff00000
DMAR:[fault reason 02] Present bit in context entry is clear
07:00.0 0106: 1b4b:9130 (rev 11) (prog-if 01 [AHCI 1.0])
in each case the IOMMU saw a DMA read to an address that wasn't mapped for the requesting device. In each case, the requester is function .1, the kernel doesn't know about a .1 function, and there is a Marvell 912x SATA controller at the corresponding .0 function.
Andrew's Phantom Function theory seems like a good direction to explore. Maybe these devices incorrectly report Phantom Function support in the Device Capability & Control, and we just need some sort of quirk to work around that.
It would be interesting to know whether the .0 Marvell function has valid IOMMU mappings for the fault addresses (0xfff00000 or 0xfffc0000), or whether there is really anything at those addresses. They seem like dubious targets for DMA.
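The pattern described above can be checked mechanically against a dmesg capture. The following is a minimal sketch that parses the fault lines quoted in this comment and reports the function .0 sibling of each faulting requester; the regular expression and helper names are assumptions written for this message format, not part of any kernel tool.

```python
import re

# Matches lines such as:
#   DMAR:[DMA Read] Request device [03:00.1] fault addr fffc0000
FAULT_RE = re.compile(
    r"\[DMA (?P<op>Read|Write)\] Request device "
    r"\[(?P<bus>[0-9a-f]{2}):(?P<dev>[0-9a-f]{2})\.(?P<fn>[0-7])\] "
    r"fault addr (?P<addr>[0-9a-f]+)"
)


def parse_fault(line):
    """Return a dict describing the fault, or None if the line is not one."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    return {
        "op": m.group("op"),
        "bus": int(m.group("bus"), 16),
        "dev": int(m.group("dev"), 16),
        "fn": int(m.group("fn")),
        "addr": int(m.group("addr"), 16),
    }


def sibling_function_zero(fault):
    """Format the .0 function that shares the requester's bus and device."""
    return "{:02x}:{:02x}.0".format(fault["bus"], fault["dev"])


lines = [
    "DMAR:[DMA Read] Request device [0b:00.1] fault addr fff00000",
    "DMAR:[DMA Read] Request device [03:00.1] fault addr fffc0000",
    "DMAR:[DMA Read] Request device [01:00.1] fault addr fff00000",
    "dmar: DMAR:[DMA Read] Request device [07:00.1] fault addr fff00000",
]
for line in lines:
    fault = parse_fault(line)
    print("requester fn", fault["fn"], "-> look at", sibling_function_zero(fault))
```

Cross-referencing each reported sibling against `lspci -nn` output then shows whether a Marvell controller sits at that address.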
In Linux Kernel Bug Tracker #42679, zhen-hual (zhen-hual-linux-kernel-bugs) wrote : | #38 |
Hi guys,
1. Since the only lspci output so far was captured with "intel_iommu=on", could you paste the output of lspci -vvv, lspci -t and lspci -n when intel_iommu is not set to on?
Thanks
ZhenHua
In Linux Kernel Bug Tracker #42679, acooks (acooks-linux-kernel-bugs) wrote : | #39 |
Created attachment 109981
Patch with quirk for incorrect PCI requester IDs
Here's a patch that provides a quirk for what I believe to be the root cause: devices that use incorrect PCI requester IDs, including Marvell 91xx controllers.
Various revisions have been sent to LKML and IOMMU-list in the past and a number of people have reported that it solved their problem and I've been running this on two boxes for months. I'm not sure why it hasn't been accepted.
Note that there are several devices that suffer from the same affliction, i.e., using incorrect PCI requester IDs in their transactions. The Marvell devices use both xx:yy.0 and xx:yy.1, possibly related to the SATA port number. Other devices, like Ricoh's R5C832 PCIe IEEE 1394 Controller commonly found in T410 and T420 Thinkpads, use a single incorrect requester ID.
Please try this patch and let me know if it works for you.
In Linux Kernel Bug Tracker #42679, zhen-hual (zhen-hual-linux-kernel-bugs) wrote : | #40 |
Each context_entry has a present bit. If a context entry is used for a device, but its present bit is not set to 1, an error with fault number 2 will occur.
I tested on my PC, comment a line "context_
In Linux Kernel Bug Tracker #42679, zhen-hual (zhen-hual-linux-kernel-bugs) wrote : | #41 |
See this line in file drivers/
context_
It is used to set the present bit of the context entry. Comment out this line and you will get the error.
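The relationship between the present bit and fault reason 2 can be shown with a toy model. This is only a sketch: the dictionary-based table and the names `rid` and `check_dma` are invented for illustration and do not reflect the real VT-d context-entry layout or the intel-iommu driver code.

```python
FAULT_REASON_CONTEXT_NOT_PRESENT = 2  # "Present bit in context entry is clear"


def rid(bus, dev, fn):
    """Pack bus/device/function into a 16-bit requester ID."""
    return (bus << 8) | (dev << 3) | fn


def check_dma(context_table, requester):
    """Return 0 if the request can proceed, else a fault reason number."""
    entry = context_table.get(requester)
    if entry is None or not entry["present"]:
        return FAULT_REASON_CONTEXT_NOT_PRESENT
    return 0


# Only function 0 of the controller has a programmed context entry.
table = {rid(0x03, 0, 0): {"present": True}}
print(check_dma(table, rid(0x03, 0, 0)))  # request tagged 03:00.0
print(check_dma(table, rid(0x03, 0, 1)))  # request tagged 03:00.1

# Quirk-style fix: give the extra requester ID an identical entry.
table[rid(0x03, 0, 1)] = dict(table[rid(0x03, 0, 0)])
print(check_dma(table, rid(0x03, 0, 1)))
```

In this model a request from 03:00.1 faults for the same reason as a request whose entry exists but has its present bit cleared, which is why both situations produce the same message.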
In Linux Kernel Bug Tracker #42679, pawel.zaq (pawel.zaq-linux-kernel-bugs) wrote : | #42 |
(In reply to Andrew Cooks from comment #22)
> Please try this patch and let me know if it works for you.
It does remove the lag and there are no longer error entries in system log. I also tried to connect a drive to this controller and it worked too. It appears then that this patch did the job.
In Linux Kernel Bug Tracker #42679, lk (lk-linux-kernel-bugs) wrote : | #43 |
(In reply to Andrew Cooks from comment #22)
> Created attachment 109981 [details]
> Patch with quirk for incorrect PCI requester IDs
>
I'm trying to enable the iommu on a gigabyte z87x-ud5h board and a VIA firewire controller and a Marvell 88SE9230 SATA controller is misbehaving. I'll add a couple of entries to your lists when I'm done testing.
I've found one thing that looks a bit strange in your patch. The pci_requester() function is using the devfn member to break the for loop. Take a close look at the last entry for "Mellanox 26428" in the pci_dev_
In Linux Kernel Bug Tracker #42679, acooks (acooks-linux-kernel-bugs) wrote : | #44 |
(In reply to Martin Öhrling from comment #26)
> (In reply to Andrew Cooks from comment #22)
> > Created attachment 109981 [details]
> > Patch with quirk for incorrect PCI requester IDs
> >
>
> I'm trying to enable the iommu on a gigabyte z87x-ud5h board and a VIA
> firewire controller and a Marvell 88SE9230 SATA controller is misbehaving.
> I'll add a couple of entries to your lists when I'm done testing.
Did you have any success? Could you provide more information, please?
> I've found one thing that looks a bit strange in your patch. The
> pci_requester() function is using the devfn member to break the for loop.
> Take a close look at the last entry for "Mellanox 26428" in the
> pci_dev_
> entry, and any entry later appended to the list, will not be evaluated.
Yes, it's definitely broken, but trivial to fix. Thanks for reporting it.
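The class of bug being discussed is a table walk that stops on a field which can legitimately be zero. The sketch below is not the patch's code: the table contents, the placeholder vendor ID 0xAAAA and both function names are hypothetical, and terminating on an all-zero vendor/device pair is just one plausible fix, similar in spirit to how kernel ID tables end with a zeroed entry.

```python
# Hypothetical quirk table of (vendor, device, devfn) tuples.
QUIRKS = [
    (0x1B4B, 0x9123, 1),
    (0xAAAA, 0x0001, 0),  # placeholder entry whose devfn is legitimately 0
    (0x1B4B, 0x9130, 1),
    (0, 0, 0),            # terminator
]


def find_devfn_buggy(table, vendor, device):
    """Stops at the first devfn == 0, so later entries are never seen."""
    for v, d, devfn in table:
        if devfn == 0:
            break
        if (v, d) == (vendor, device):
            return devfn
    return None


def find_devfn_fixed(table, vendor, device):
    """Stops only at the all-zero terminator entry."""
    for v, d, devfn in table:
        if v == 0 and d == 0:
            break
        if (v, d) == (vendor, device):
            return devfn
    return None


print(find_devfn_buggy(QUIRKS, 0x1B4B, 0x9130))
print(find_devfn_fixed(QUIRKS, 0x1B4B, 0x9130))
```

With the buggy walk, neither the zero-devfn entry nor anything appended after it can ever match, which is the symptom reported above.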
In Linux Kernel Bug Tracker #42679, lk (lk-linux-kernel-bugs) wrote : | #45 |
>
> Did you have any success? Could you provide more information, please?
>
I've had some success but I also ran into bug 44881 (pcie to pci bridge shown
as a pci to pci bridge causing pci_find_
The 9230 controller from marvell sent dma requests from function 0 and 1. I got rid of all "Present bit in context entry is clear"-errors when I inserted this entry into pci_dev_
{ PCI_VENDOR_
I'm not considering this to be fully verified since I still have problems booting the system. Next step is to apply the suggested patches for bug 44881. This turned out to be more work than I expected...
In Linux Kernel Bug Tracker #42679, lk (lk-linux-kernel-bugs) wrote : | #46 |
I'm still getting one error reported by the iommu. My best guess is that this is a bios bug:
[ 0.675780] dmar: DRHD: handling fault status reg 2
[ 0.676191] dmar: DMAR:[DMA Read] Request device [06:00.0] fault addr ac0a7000
[ 0.676191] DMAR:[fault reason 06] PTE Read access is not set
It's unlikely that the ahci driver fails to map memory for dma. This is the memory map entry reported by bios (from dmesg):
[ 0.000000] e820: BIOS-provided physical RAM map:
...
[ 0.000000] BIOS-e820: [mem 0x00000000a77e1
...
RMRR ranges:
[ 0.047843] dmar: Host address width 39
[ 0.047845] dmar: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.047851] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c0000020660462 ecap f0101a
[ 0.047852] dmar: DRHD base: 0x000000fed91000 flags: 0x1
[ 0.047856] dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap d2008020660462 ecap f010da
[ 0.047857] dmar: RMRR base: 0x000000ba063000 end: 0x000000ba06ffff
[ 0.047858] dmar: RMRR base: 0x000000bb800000 end: 0x000000bf9fffff
...
[ 0.470870] IOMMU 0 0xfed90000: using Queued invalidation
[ 0.470871] IOMMU 1 0xfed91000: using Queued invalidation
[ 0.470873] IOMMU: Setting RMRR:
[ 0.470881] IOMMU: Setting identity map for device 0000:00:02.0 [0xbb800000 - 0xbf9fffff]
[ 0.471198] IOMMU: Setting identity map for device 0000:00:1d.0 [0xba063000 - 0xba06ffff]
[ 0.471219] IOMMU: Setting identity map for device 0000:00:1a.0 [0xba063000 - 0xba06ffff]
[ 0.471236] IOMMU: Setting identity map for device 0000:00:14.0 [0xba063000 - 0xba06ffff]
[ 0.471249] IOMMU: Prepare 0-16MiB unity mapping for LPC
[ 0.471255] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
I can't find anything in the dmesg output that would indicate that the kernel has been made aware of a reserved memory range that includes the offending address.
As far as I can tell, the patch resolved my first issue, I no longer get any present bit errors. My current issue can't be a kernel bug.
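The claim that no reported range covers the faulting address can be verified with simple interval arithmetic. This is a minimal sketch using the identity-map ranges and the fault address from the dmesg excerpt above; it deliberately ignores which device each mapping belongs to, and the function name `covering_ranges` is made up for illustration.

```python
# Identity-map ranges from the "Setting identity map" lines (inclusive).
identity_maps = [
    (0xBB800000, 0xBF9FFFFF),  # 0000:00:02.0
    (0xBA063000, 0xBA06FFFF),  # 0000:00:1d.0, 00:1a.0, 00:14.0
    (0x00000000, 0x00FFFFFF),  # 0000:00:1f.0 (LPC unity mapping)
]

fault_addr = 0xAC0A7000  # from "Request device [06:00.0] fault addr ac0a7000"


def covering_ranges(addr, ranges):
    """Return every (start, end) range that contains addr."""
    return [(lo, hi) for lo, hi in ranges if lo <= addr <= hi]


hits = covering_ranges(fault_addr, identity_maps)
print("covered by:", [(hex(lo), hex(hi)) for lo, hi in hits])
```

An empty result is consistent with the observation that the kernel was never told about a reserved region containing this address.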
In Linux Kernel Bug Tracker #42679, acooks (acooks-linux-kernel-bugs) wrote : | #47 |
Created attachment 124001
Patch with quirk for incorrect PCIe requester IDs.
Changes:
* Include various bit shifting and masking fixes by George Spelvin
* Fix pci_requester() using wrong loop condition, reported by Martin Öhrling
* Expand list of quirked device ids
* Attempt to include support for AMD (needs testing)
If you'd like your name included as Reported-by or Tested-by, let me know.
In Linux Kernel Bug Tracker #42679, tom (tom-linux-kernel-bugs) wrote : | #48 |
I can confirm that Andrew's patch from #30 fixes the issue on my AMD based Gigabyte 990FXA-UD5 board. Both Marvell 88SE9172 controllers (internal and eSATA) are working now. Thanks!
In Linux Kernel Bug Tracker #42679, linux (linux-linux-kernel-bugs) wrote : | #49 |
Can anyone please direct me on how to apply this patch to my kernel?
(3.11.0-15-generic #25-Ubuntu SMP Thu Jan 30 17:22:01 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux)
Possibly using a deb package?
Excuse me for being a complete noob here. ;)
In Linux Kernel Bug Tracker #42679, bugs (bugs-linux-kernel-bugs) wrote : | #50 |
(In reply to MvW from comment #32)
> Can anyone please direct me on how to apply this patch to my kernel?
> (3.11.0-15-generic #25-Ubuntu SMP Thu Jan 30 17:22:01 UTC 2014 x86_64 x86_64
> x86_64 GNU/Linux)
> Possibly using a deb package?
>
> Excuse me for being a complete noob here. ;)
The kernelnewbies wiki has articles on this stuff, e.g. http://
(In reply to Andrew Cooks from comment #30)
> Created attachment 124001 [details]
> Patch with quirk for incorrect PCIe requester IDs.
>
> Changes:
> * Include various bit shifting and masking fixes by George Spelvin
> * Fix pci_requester() using wrong loop condition, reported by Martin Öhrling
>
> * Expand list of quirked device ids
> * Attempt to include support for AMD (needs testing)
>
> If you'd like your name included as Reported-by or Tested-by, let me know.
I'd like to report that this kind of works for me with kernel 3.13.2 on my ASRock 990FX Extreme4 board, which (sometimes) has a 88SE91A0 controller. I added its PCI-ID (1b4b:91a0) like this: "{ PCI_VENDOR_
This controller really confuses me, when it works it shows up as "02:00.0 IDE interface [0101]: Marvell Technology Group Ltd. 88SE91A0 SATA 6Gb/s Controller [1b4b:91a0] (rev 12)", but sometimes it's "02:00.0 SATA controller [0106]: Marvell Technology Group Ltd. Device [1b4b:9122] (rev 12)
02:00.1 IDE interface [0101]: Marvell Technology Group Ltd. 88SE912x IDE Controller [1b4b:91a4] (rev 12)" and doesn't work. I think (hope) changing some options in my UEFI like disabling fast boot, the boot failure guard and activating the "make devices on the second sata controller bootable" ROM made it so that it always boots as a 91a0, but I'm not really convinced yet.
When it works, eSata and passthrough with vfio-pci both work. I don't know if this is the right place for this, but Linux doesn't recognize this controller at all, so I'm doing "echo 1b4b 91a0 > /sys/bus/
In Linux Kernel Bug Tracker #42679, lizhenhua (lizhenhua-linux-kernel-bugs) wrote : | #51 |
(In reply to MvW from comment #32)
> Can anyone please direct me on how to apply this patch to my kernel?
> (3.11.0-15-generic #25-Ubuntu SMP Thu Jan 30 17:22:01 UTC 2014 x86_64 x86_64
> x86_64 GNU/Linux)
> Possibly using a deb package?
>
> Excuse me for being a complete noob here. ;)
If you want to apply a kernel patch, you need to checkout the kernel source with git and send mails to kernel.org community. You can google it to get more information.
In Linux Kernel Bug Tracker #42679, linux (linux-linux-kernel-bugs) wrote : | #52 |
(In reply to Andreas Schrägle from comment #33)
> The kernelnewbies wiki has articles on this stuff, e.g.
> http://
(In reply to Li, ZhenHua from comment #34)
> If you want to apply a kernel patch, you need to checkout the kernel source
> with git and send mails to kernel.org community. You can google it to get
> more information.
Thanks for the pointers, I'll be looking into it shortly.
Will this fix be included in the 3.14 release, will it be backported to 3.11?
In Linux Kernel Bug Tracker #42679, acooks (acooks-linux-kernel-bugs) wrote : | #53 |
(In reply to MvW from comment #35)
> Will this fix be included in the 3.14 release,
No.
> will it be backported to 3.11?
No.
The patch I created needs work.
* The AMD-specific code doesn't really cope with one-to-many mappings (despite what the current patch suggests), because other parts of the AMD IOMMU driver need work to support this.
* It needs to be restructured to provide a common interface for finding all requester ids and calling a callback for each.
It's unlikely that I'll be able to do the mentioned work soon and there are probably more changes required that I don't know about. I'll keep posting improvements when I can.
Of course there might be other people working on this and a better patch could appear and be included at any time.
In Linux Kernel Bug Tracker #42679, linux (linux-linux-kernel-bugs) wrote : | #54 |
(In reply to Andrew Cooks from comment #36)
> (In reply to MvW from comment #35)
> > Will this fix be included in the 3.14 release,
>
> No.
>
> > will it be backported to 3.11?
>
> No.
>
> The patch I created needs work.
>
> * The AMD-specific code doesn't really cope with one-to-many mappings
> (despite what the current patch suggests), because other parts of the AMD
> IOMMU driver need work to support this.
>
> * It needs to be restructured to provide a common interface for finding all
> requester ids and calling a callback for each.
>
> It's unlikely that I'll be able to do the mentioned work soon and there are
> probably more changes required that I don't know about. I'll keep posting
> improvements when I can.
>
> Of course there might be other people working on this and a better patch
> could appear and be included at any time.
Does this also apply to the Intel IOMMU implementation which this report is about? I'm having this issue with an Asus P9X79 Deluxe using the same SATA chip (88SE9128) as the bug reporter.
I'm unfortunately unable to fix this myself, so perhaps instead of patching my kernel until this issue has been fixed upstream, I should consider buying another SATA controller since this won't get fixed for a while to come right?
In Linux Kernel Bug Tracker #42679, acooks (acooks-linux-kernel-bugs) wrote : | #55 |
(In reply to MvW from comment #37)
> Does this also apply to the Intel IOMMU implementation which this report is
> about?
Yes, in my opinion.
> I'm unfortunately unable to fix this myself, so perhaps instead of patching
> my kernel until this issue has been fixed upstream, I should consider buying
> another SATA controller since this won't get fixed for a while to come right?
I encourage you to apply the patch. It may need improvement to be acceptable to the mainline developers, but it does work and will be maintained until it is either acceptable for the mainline kernel or until someone else provides an acceptable patch.
If there are other reasons why you can't use the patch, let's try to address those.
In Linux Kernel Bug Tracker #42679, linux (linux-linux-kernel-bugs) wrote : | #56 |
(In reply to Andrew Cooks from comment #38)
> Yes, in my opinion.
What I meant to ask was whether these AMD-specific issues would warrant a different patch or an extension to this one, making the current implementation without the AMD parts eligible for inclusion in the mainline kernel to help all of us Intel-based users.
> I encourage you to apply the patch. It may need improvement to be acceptable
> to the mainline developers, but it does work and will be maintained until it
> is either acceptable for the mainline kernel or until someone else provides
> an acceptable patch.
>
> If there are other reasons why you can't use the patch, let's try to address
> those.
I'm inclined to do so, but I also have to consider that this particular machine has to run 24/7 and be upgraded with the necessary security updates as they are released. Re-patching the kernel on every kernel update for the unforeseeable future would imply too much effort compared to buying another simple SATA controller to avoid this issue altogether.
I do however, want to be of help, even though it only consists of testing the solution at hand, but there are some concerns that need to be addressed.
Is there any (extra) risk of data loss with this patch?
Is there an easy way to apply this patch across kernel updates automatically?
In Linux Kernel Bug Tracker #42679, FlyingShawn (flyingshawn-linux-kernel-bugs) wrote : | #57 |
I can unfortunately confirm that Marvell 88SE92xx series chips also suffer from this issue. My motherboard had a 9120 that didn't work because of this and I needed the ports, so I purchased a 9230-based PCIe add-on card and am experiencing the same symptoms.
(I'm actually not a member of your community: I'm an ESXi user who was pointed to this thread from here: http://
In Linux Kernel Bug Tracker #42679, michael (michael-linux-kernel-bugs-1) wrote : | #58 |
I can confirm that the patch makes the 88SE9172 usable, attached HD seems to work fine. However there still seems to be a problem with accessing the option ROM:
# lspci | grep Marvell
08:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller (rev 11)
# cd /sys/devices/
# echo 1 > rom
# dd if=rom of=/tmp/rom_dump
dd: error reading ‘rom’: Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0,000123135 s, 0,0 kB/s
# dmesg
[...]
[ 998.737809] ahci 0000:08:00.0: Invalid ROM contents
I stumbled on this issue when trying to passthrough the Marvell controller to a virtual machine using qemu/kvm. When trying to use pci-assign (or the newer vfio-pci) qemu complains about missing optionrom and will not find any connected drive.
In Linux Kernel Bug Tracker #42679, bugs (bugs-linux-kernel-bugs) wrote : | #59 |
(In reply to michael from comment #41)
> I can confirm that the patch makes the 88SE9172 usable, attached HD seems to
> work fine. However there still seems to be a problem with accessing the
> option ROM:
>
> # lspci | grep Marvell
> 08:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s
> Controller (rev 11)
>
> # cd /sys/devices/
> # echo 1 > rom
> # dd if=rom of=/tmp/rom_dump
> dd: error reading ‘rom’: Input/output error
> 0+0 records in
> 0+0 records out
> 0 bytes (0 B) copied, 0,000123135 s, 0,0 kB/s
> # dmesg
> [...]
> [ 998.737809] ahci 0000:08:00.0: Invalid ROM contents
>
> I stumbled on this issue when trying to passthrough the Marvell controller
> to a virtual machine using qemu/kvm. When trying to use pci-assign (or the
> newer vfio-pci) qemu complains about missing optionrom and will not find any
> connected drive.
I can't access the option rom on my controller either:
02:00.0 IDE interface: Marvell Technology Group Ltd. 88SE91A0 SATA 6Gb/s Controller (rev 12)
However, the device works as well when passed through with vfio-pci as it does on the host (the linux ahci driver works but doesn't recognize it and windows needs a special driver).
In Linux Kernel Bug Tracker #42679, TomWij (tomwij-linux-kernel-bugs) wrote : | #60 |
Similar bug downstream: https:/
In Linux Kernel Bug Tracker #42679, Theoretically.x64 (theoretically.x64-linux-kernel-bugs) wrote : | #61 |
Created attachment 134661
Logs and assorted information with respect to IOMMU issues
I have had this issue with my Asus Z9 PE-D8 WS since I bought it back in 2012.
I have a Marvell 88SE9230 PCIe SATA 6Gb/s Controller and am running Gentoo
Linux AMD64 with kernel 3.10.25 (later versions of the kernel do not agree with
my GTX670 and the screen is blank on boot) and the patch provided in this PR.
Unfortunately, it does not work and I am still getting the same error
as before:
dmar: DRHD: handling fault status reg 2
DMAR: [DMA Read] Request device [0b:00.0] fault addr 0
DMAR: [fault reason 6] PTE read access is not set
....
ata[6-9] COMRESET Failed (errno=-16)
This error leads to my marvell controller not being initialized and the disks
on it fall out of the software raid 6 I'm running (four of the seven disks were
always active on the intel controller so no data loss occurred at any point
during this).
So I decided to go all out and try several different combinations of options in
the BIOS and kernel options with this patch applied. The options that I tried
different combinations of were:
1) Intel VT-d [BIOS]
2) Address Translation Services [BIOS - Sub option if Intel VT-d is enabled]
3) Coherency Support [BIOS - Sub option if Intel VT-d is enabled]
4) Native Command Queueing [Kernel cmd line, disabled with libata.force=noncq]
NOTE that if VT-d is disabled then I have no issues so I am only showing the
default configuration at the top of the following table. Interpret the binary
digits as having the above features turned on or off [read it from left to
right]:
0001 : Works [Software IOMMU is used]
1111 : Fails [ata6 and ata7 had COMRESET failures]
1110 : Fails [ata7 has COMRESET failures (ata6 is absent)]
1011 : Fails [ata7 and ata8 have COMRESET failures]
1010 : Fails [ata7 and ata8 have COMRESET failures]
1101 : Fails [ata8 and ata7 (order changed) have COMRESET failures, faster]
1100 : Fails [ata9 and ata7 (different device) have COMRESET failures, faster]
1001 : Fails [ata7 and ata8 have COMRESET failures]
1000 : Fails [ata7 and ata8 have COMRESET failures]
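The table above is easier to query once each bit string is decoded into named options. The following is a small sketch that transcribes the results from this comment and checks which options are enabled in every failing run; the names `OPTIONS`, `RESULTS` and `decode` are chosen here for illustration.

```python
# Option order matches the numbered list above, read left to right.
OPTIONS = (
    "Intel VT-d",
    "Address Translation Services",
    "Coherency Support",
    "Native Command Queueing",
)

# Bit string -> True if the configuration worked, as reported above.
RESULTS = {
    "0001": True,
    "1111": False, "1110": False, "1011": False, "1010": False,
    "1101": False, "1100": False, "1001": False, "1000": False,
}


def decode(bits):
    """Map a 4-character bit string to {option name: enabled}."""
    if len(bits) != len(OPTIONS) or set(bits) - {"0", "1"}:
        raise ValueError("expected a 4-character string of 0s and 1s")
    return {name: bit == "1" for name, bit in zip(OPTIONS, bits)}


failing = [decode(bits) for bits, ok in RESULTS.items() if not ok]
always_on = [name for name in OPTIONS if all(cfg[name] for cfg in failing)]
print("enabled in every failing run:", always_on)
```

Since the table only lists one configuration with VT-d disabled, this confirms a correlation with VT-d being enabled rather than isolating the effect of the other three options.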
I have also attached a file containing the contents of /var/log/messages for
different runs. I have annotated each run in the file so that it can be referred
to somewhat easily. I have also attached the results of lspci, I will warn you
that it is quite large as I have 150 PCI devices on my bus!
While this may be irrelevant, I noticed that the problematic device (ata14)
actually does exist and is registered in Windows 7 Professional 64-bit as a
"Marvell Console" device which is on port 14. I see references to it not being
identifiable in the annotated runs document I have attached.
In Linux Kernel Bug Tracker #42679, alex.williamson (alex.williamson-linux-kernel-bugs) wrote : | #62 |
I've proposed a patch series that should resolve this:
https:/
The patches applied to v3.15-rc3 can also be found here:
git://github.
Please test and provide feedback here or to the list. If your controller is not one of the ones listed in patch 04/13, please add a new entry for it and report it here. Thanks
In Linux Kernel Bug Tracker #42679, bugs (bugs-linux-kernel-bugs) wrote : | #63 |
@Alex Williamson: Your patches seem to work fine for me, with this addition for my controller.
diff --git a/drivers/
index ea55b0f..f0d8b11 100644
--- a/drivers/
+++ b/drivers/
@@ -3366,6 +3366,8 @@ DECLARE_
/* https:/
DECLARE_
+DECLARE_
+ quirk_dma_
/* https:/
DECLARE_
In Linux Kernel Bug Tracker #42679, qemu (qemu-linux-kernel-bugs) wrote : | #64 |
@Alex Williamson: Your Kernel from git is working on Gigabyte GA-X79-UP4 Rev1.1 BIOS F7 in Debian Testing Jessie.
05:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller [1b4b:9172] (rev 11)
06:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller [1b4b:9172] (rev 11)
07:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller [1b4b:9172] (rev 11)
I added DECLARE_
The hard disk is accessible with VT-d active, and it's possible to use vfio passthrough with x-vga in qemu 1.7.
Thank you for your work.
In Linux Kernel Bug Tracker #42679, qemu (qemu-linux-kernel-bugs) wrote : | #65 |
Created attachment 134751
lspcis for Gigabyte GA-X79-UP4 Rev. 1.1
In Linux Kernel Bug Tracker #42679, Theoretically.x64 (theoretically.x64-linux-kernel-bugs) wrote : | #66 |
Created attachment 134861
lspci and logs from tycho (asus z9pe-d8 ws, X79 + C600 chipset) running 3.15-rc3
@Alex Williamson: I appreciate the work you've done, unfortunately the patches
do not affect the DMAR error I'm getting with the Marvell 88SE9230 even after
I added an entry to drivers/
diff --git a/drivers/
index ea55b0f..bfe9c8d 100644
--- a/drivers/
+++ b/drivers/
@@ -3366,6 +3366,8 @@ DECLARE_
/* https:/
DECLARE_
quirk_
+DECLARE_
+ quirk_dma_
/* https:/
DECLARE_
PCI_
I have attached the output lspci and verbose output from the failed run to this
comment. I hope this helps. Once again, thanks for all the work.
In Linux Kernel Bug Tracker #42679, alex.williamson (alex.williamson-linux-kernel-bugs) wrote : | #67 |
(In reply to Joshua Scoggins from comment #49)
> Created attachment 134861 [details]
> lspci and logs from tycho (asus z9pe-d8 ws, X79 + C600 chipset) running
> 3.15-rc3
>
> @Alex Williamson: I appreciate the work you've done, unfortunately the patches
> do not affect the DMAR error I'm getting with the Marvell 88SE9230 even after
> I added an entry to drivers/
>
> diff --git a/drivers/
> index ea55b0f..bfe9c8d 100644
> --- a/drivers/
> +++ b/drivers/
> @@ -3366,6 +3366,8 @@ DECLARE_
> 0x9123,
> /* https:/
> DECLARE_
> quirk_dma_
> +DECLARE_
> + quirk_dma_
> /* https:/
> DECLARE_
> PCI_DEVICE_
>
> I have attached the output lspci and verbose output from the failed run to
> this
> comment. I hope this helps. Once again, thanks for all the work.
Hi Joshua,
I don't actually see any IOMMU faults in any of your logs, either the original or the updated ones. Only your update in comment 44 shows a DMAR fault. Could you try to record a log that includes the IOMMU faults you're seeing and not just the SATA port probing failure? Thanks
In Linux Kernel Bug Tracker #42679, Theoretically.x64 (theoretically.x64-linux-kernel-bugs) wrote : | #68 |
Created attachment 134871
More Verbose DMESG output for tycho (z9pe-d8 WS, x79 + C600)
Sorry about that, I thought I saw the DMAR error in those logs.....
I enabled heavy debug mode this time and confirmed I saw the DMAR errors and have attached that log.
In Linux Kernel Bug Tracker #42679, alex.williamson (alex.williamson-linux-kernel-bugs) wrote : | #69 |
(In reply to Joshua Scoggins from comment #51)
> Created attachment 134871 [details]
> More Verbose DMESG output for tycho (z9pe-d8 WS, x79 + C600)
>
> Sorry about that, I thought I saw the DMAR error in those logs.....
> I enabled heavy debug mode this time and confirmed I saw the DMAR errors and
> have attached that log.
Your log doesn't seem to match the issues others are having. There's only a single DMAR fault:
[ 1.887994] dmar: DRHD: handling fault status reg 2
[ 1.888349] dmar: DMAR:[DMA Read] Request device [0b:00.0] fault addr 0
DMAR:[fault reason 06] PTE Read access is not set
This is a read access to physical address 0x0 from function 0. It seems valid for the IOMMU to block this, the driver can possibly have mapped a buffer at 0x0. Later the log shows problems probing these channels:
[ 7.193298] ata8.00: qc timeout (cmd 0xec)
[ 7.194320] ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 7.197283] ata14.00: qc timeout (cmd 0xa1)
[ 7.198266] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 7.237279] ata7: link is slow to respond, please be patient (ready=0)
[ 7.238300] ata9: link is slow to respond, please be patient (ready=0)
[ 7.513247] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 7.517218] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 11.884142] ata9: COMRESET failed (errno=-16)
[ 11.885130] ata7: COMRESET failed (errno=-16)
[ 17.242784] ata9: link is slow to respond, please be patient (ready=0)
[ 17.243720] ata7: link is slow to respond, please be patient (ready=0)
[ 17.510746] ata8.00: qc timeout (cmd 0xec)
[ 17.511561] ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 17.512361] ata8: limiting SATA link speed to 1.5 Gbps
[ 17.514750] ata14.00: qc timeout (cmd 0xa1)
[ 17.515498] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 17.516231] ata14: limiting SATA link speed to 1.5 Gbps
[ 17.830691] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 310)
[ 17.834674] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 21.889672] ata7: COMRESET failed (errno=-16)
[ 21.890480] ata9: COMRESET failed (errno=-16)
[ 27.248351] ata9: link is slow to respond, please be patient (ready=0)
[ 27.249142] ata7: link is slow to respond, please be patient (ready=0)
[ 47.823276] ata8.00: qc timeout (cmd 0xec)
[ 47.823988] ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 47.827246] ata14.00: qc timeout (cmd 0xa1)
[ 47.827960] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 48.143219] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 310)
[ 48.147218] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 56.921034] ata7: COMRESET failed (errno=-16)
[ 56.921773] ata7: limiting SATA link speed to 1.5 Gbps
[ 56.922548] ata9: COMRESET failed (errno=-16)
[ 56.923265] ata9: limiting SATA link speed to 1.5 Gbps
[ 61.943795] ata9: COMRESET failed (errno=-16)
[ 61.944518] ata9: reset failed, giving up
[ 61.945297] ata7: COMRESET failed (errno=-16)
[ 61.946022] ata7: reset failed, giving up
But, there are no further DMAR faults. Can you confirm whether adding the 0x9230 ID for your devic...
In Linux Kernel Bug Tracker #42679, Theoretically.x64 (theoretically.x64-linux-kernel-bugs) wrote : | #70 |
(In reply to Alex Williamson from comment #52)
> (In reply to Joshua Scoggins from comment #51)
> > Created attachment 134871 [details]
> > More Verbose DMESG output for tycho (z9pe-d8 WS, x79 + C600)
> >
> > Sorry about that, I thought I saw the DMAR error in those logs.....
> > I enabled heavy debug mode this time and confirmed I saw the DMAR errors
> and
> > have attached that log.
>
> Your log doesn't seem to match the issues others are having. There's only a
> single DMAR fault:
>
> [ 1.887994] dmar: DRHD: handling fault status reg 2
> [ 1.888349] dmar: DMAR:[DMA Read] Request device [0b:00.0] fault addr 0
> DMAR:[fault reason 06] PTE Read access is not set
>
> This is a read access to physical address 0x0 from function 0. It seems
> valid for the IOMMU to block this, the driver can possibly have mapped a
> buffer at 0x0. Later the log shows problems probing these channels:
>
> [ 7.193298] ata8.00: qc timeout (cmd 0xec)
> [ 7.194320] ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 7.197283] ata14.00: qc timeout (cmd 0xa1)
> [ 7.198266] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 7.237279] ata7: link is slow to respond, please be patient (ready=0)
> [ 7.238300] ata9: link is slow to respond, please be patient (ready=0)
> [ 7.513247] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [ 7.517218] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 11.884142] ata9: COMRESET failed (errno=-16)
> [ 11.885130] ata7: COMRESET failed (errno=-16)
> [ 17.242784] ata9: link is slow to respond, please be patient (ready=0)
> [ 17.243720] ata7: link is slow to respond, please be patient (ready=0)
> [ 17.510746] ata8.00: qc timeout (cmd 0xec)
> [ 17.511561] ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 17.512361] ata8: limiting SATA link speed to 1.5 Gbps
> [ 17.514750] ata14.00: qc timeout (cmd 0xa1)
> [ 17.515498] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 17.516231] ata14: limiting SATA link speed to 1.5 Gbps
> [ 17.830691] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 310)
> [ 17.834674] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [ 21.889672] ata7: COMRESET failed (errno=-16)
> [ 21.890480] ata9: COMRESET failed (errno=-16)
> [ 27.248351] ata9: link is slow to respond, please be patient (ready=0)
> [ 27.249142] ata7: link is slow to respond, please be patient (ready=0)
> [ 47.823276] ata8.00: qc timeout (cmd 0xec)
> [ 47.823988] ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 47.827246] ata14.00: qc timeout (cmd 0xa1)
> [ 47.827960] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 48.143219] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 310)
> [ 48.147218] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [ 56.921034] ata7: COMRESET failed (errno=-16)
> [ 56.921773] ata7: limiting SATA link speed to 1.5 Gbps
> [ 56.922548] ata9: COMRESET failed (errno=-16)
> [ 56.923265] ata9: limiting SATA link speed to 1.5 Gbps
> [ 61.943795] ata9: COMRESET failed (errno=-16)
> [ 61.944518] ata9: reset failed, giving up
> [ 61.945297] ata7: COMRESET fai...
In Linux Kernel Bug Tracker #42679, alex.williamson (alex.williamson-linux-kernel-bugs) wrote : | #71 |
(In reply to Joshua Scoggins from comment #53)
> (In reply to Alex Williamson from comment #52)
> > (In reply to Joshua Scoggins from comment #51)
> > > Created attachment 134871 [details]
> > > More Verbose DMESG output for tycho (z9pe-d8 WS, x79 + C600)
> > >
> > > Sorry about that, I thought I saw the DMAR error in those logs.....
> > > I enabled heavy debug mode this time and confirmed I saw the DMAR errors
> and
> > > have attached that log.
> >
> > Your log doesn't seem to match the issues others are having. There's only
> a
> > single DMAR fault:
> >
> > [ 1.887994] dmar: DRHD: handling fault status reg 2
> > [ 1.888349] dmar: DMAR:[DMA Read] Request device [0b:00.0] fault addr 0
> > DMAR:[fault reason 06] PTE Read access is not set
> >
> > This is a read access to physical address 0x0 from function 0. It seems
> > valid for the IOMMU to block this, the driver can possibly have mapped a
> > buffer at 0x0. Later the log shows problems probing these channels:
Correction, the device attempted to access I/O virtual address 0x0, not physical address 0x0, but I think the conclusion is the same, the hardware is doing a stray DMA access.
[...]
> By adding 0x9230 ID for my device do you mean to quirks.c as others have
> done? If so then I already did that with no change.
Yes, if adding 0x9230 made no change then I don't think this device suffers from the same problem as the other Marvell controllers reported here. The expected failure mode is that during channel probing the device generates several DMAR faults to non-zero addresses where the requests are being generated using function 1 rather than the correct requester ID. When quirked, the IOMMU maps both function 0 and function 1 through the IOMMU, allowing these accesses and the device works.
Your report in comment 44 indicates that the DMAR error from your device was always from the correct function, just to an unmapped address. This suggests the hardware might be using a DMA read as a way to flush previous transactions, effectively a bus synchronization.
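The distinction drawn here (faults from the wrong function to non-zero addresses versus a fault from the correct function to address 0) can be checked against a log with a short script. The following Python sketch is only an illustration: the regular expression is modeled on the `dmar:` lines pasted in this thread, the category labels are made up, and it assumes function 0 is the real device.

```python
import re

# Pattern modeled on the "dmar: DMAR:[DMA Read] Request device [...]" lines
# quoted in this thread; other kernel versions may format faults differently.
FAULT_RE = re.compile(
    r"DMAR:\[DMA (?P<op>Read|Write)\] Request device "
    r"\[(?P<bus>[0-9a-f]{2}):(?P<dev>[0-9a-f]{2})\.(?P<fn>[0-7])\] "
    r"fault addr (?P<addr>[0-9a-f]+)"
)


def classify_fault(line):
    """Return a rough label for one DMAR fault line, or None if it is not one.

    Labels are hypothetical names for this sketch:
      "alias-like": request came from a non-zero function to a non-zero address
      "stray-zero": request came from function 0 to address 0
      "other":      anything else
    """
    m = FAULT_RE.search(line)
    if m is None:
        return None
    fn = int(m.group("fn"))
    addr = int(m.group("addr"), 16)
    if fn != 0 and addr != 0:
        return "alias-like"
    if fn == 0 and addr == 0:
        return "stray-zero"
    return "other"


if __name__ == "__main__":
    sample = "[ 1.888349] dmar: DMAR:[DMA Read] Request device [0b:00.0] fault addr 0"
    print(classify_fault(sample))
```

A log dominated by "alias-like" entries would match the expected failure mode for the quirk; a single "stray-zero" entry matches the case discussed in this comment.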
> Adding the iommu=pt option to the kernel command line does fix the dmar
> error but is that all that is necessary? When this option is added do all
> devices get passed through or just those incompatible with the MMU? The
> kernel command line options documentation is sparse on what this does.
>
> And if adding iommu=pt is all I need to do then I appreciate the work (and I
> apologize if I sent you on a wild goose chase trying to fix this) as my
> system boots much faster and feels generally more responsive.
The passthrough option is probably intentionally vague because it depends a little on how the IOMMU driver interprets it. On VT-d, an IOMMU domain is created that identity maps all memory. With the exception of devices that can only do 32bit DMA, all devices will be attached to this domain. This means you lose the isolation capabilities of the IOMMU for most of your host devices, but you can still use the IOMMU for device assignment.
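To see which IOMMU-related options a running system was actually booted with, one can inspect the kernel command line. This is a minimal Python sketch, assuming a Linux system that exposes `/proc/cmdline`; the function name `iommu_options` and the key list are illustrative and cover only the parameters mentioned in this bug report (`iommu=`, `intel_iommu=`, `amd_iommu=`).

```python
# Parameters mentioned in this bug report; the list is not exhaustive.
IOMMU_KEYS = ("iommu", "intel_iommu", "amd_iommu")


def iommu_options(cmdline):
    """Collect IOMMU-related key=value options from a kernel command line string."""
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in IOMMU_KEYS:
            found.setdefault(key, []).append(value)
    return found


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as handle:
            current = handle.read()
    except OSError:
        # Not on Linux, or /proc is unavailable: fall back to a made-up example.
        current = "BOOT_IMAGE=/vmlinuz ro quiet intel_iommu=on iommu=pt"
    print(iommu_options(current))
```

This only reports what was requested on the command line; how the IOMMU driver interprets the option is, as noted above, driver dependent.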
Another possible solution to this problem that would maintain the most usefulness of the IOMMU would be for the driver to map a scratch DMA page at this address for the har...
In Linux Kernel Bug Tracker #42679, Theoretically.x64 (theoretically.x64-linux-kernel-bugs) wrote : | #72 |
Created attachment 134951
Comparison of kernel 3.10.25 and 3.15.0 w/ iommu=pt without quirks entries
Well I went back to 3.10.25 thinking that the solution was to put iommu=pt and
everything should be okay but doing that with 3.10.25 causes quite a large
number of DMAR errors to show up during boot which I have a dmesg output log
of.
This got me thinking that perhaps your patches actually fix the issue so I
commented out my quirk entry, recompiled, rebooted, and got the similar (if not
identical) DMAR errors (log attached as well). So it seems that:
1) My marvell 9230 controller is not compatible with the IOMMU and iommu=pt
needs to be added to the kernel command line
2) Your patches do solve the issue after I add an entry for my controller to
drivers/
In Linux Kernel Bug Tracker #42679, alex.williamson (alex.williamson-linux-kernel-bugs) wrote : | #73 |
Ok, I'll keep the quirk for 0x9230, and then there are issues beyond that that we need to tackle since this device is extra broken. Thanks
In Linux Kernel Bug Tracker #42679, qemu (qemu-linux-kernel-bugs) wrote : | #74 |
Created attachment 134991
without and with patch on Gigabyte GA-X79-UP4 Marvell 88SE9172
Update of Comment 47
@Alex Williamson: Your Kernel from git is working on Gigabyte GA-X79-UP4 Rev1.1 BIOS F7 in Debian Testing Jessie.
diff --git a/drivers/
index ea55b0f..f0d8b11 100644
--- a/drivers/
+++ b/drivers/
@@ -3366,6 +3366,8 @@ DECLARE_
/* https:/
DECLARE_
+DECLARE_
+ quirk_dma_
/* https:/
DECLARE_
The hard disk is accessible with VT-d active, and it's possible to use vfio passthrough with x-vga in qemu 1.7.
Attached are the dmesg and lspci logs without and with the patch. If you need more information, just send me a message.
Thank you for your work.
In Linux Kernel Bug Tracker #42679, Theoretically.x64 (theoretically.x64-linux-kernel-bugs) wrote : | #75 |
(In reply to Alex Williamson from comment #56)
> Ok, I'll keep the quirk for 0x9230, and then there are issues beyond that
> that we need to tackle since this device is extra broken. Thanks
This device is very much extra broken. If the disks are under heavy load from something like an mdadm RAID recovery, then running smartctl on the last disk on the controller will trigger a SMART command failure or an IDENTIFY DEVICE command failure, which will cause the drive's link to be reset. Fortunately, I have to poke it to get it to do it. Here is the output from dmesg:
[Sat May 3 13:40:38 2014] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[Sat May 3 13:40:38 2014] ata8.00: failed command: IDENTIFY DEVICE
[Sat May 3 13:40:38 2014] ata8.00: cmd ec/00:01:
res 40/00:00:
[Sat May 3 13:40:38 2014] ata8.00: status: { DRDY }
[Sat May 3 13:40:38 2014] ata8: hard resetting link
[Sat May 3 13:40:39 2014] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[Sat May 3 13:40:39 2014] ata8.00: configured for UDMA/133
[Sat May 3 13:40:39 2014] ata8: EH complete
[Sat May 3 13:41:13 2014] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[Sat May 3 13:41:13 2014] ata8.00: failed command: IDENTIFY DEVICE
[Sat May 3 13:41:13 2014] ata8.00: cmd ec/00:01:
res 40/00:00:
[Sat May 3 13:41:13 2014] ata8.00: status: { DRDY }
[Sat May 3 13:41:13 2014] ata8: hard resetting link
[Sat May 3 13:41:14 2014] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[Sat May 3 13:41:14 2014] ata8.00: configured for UDMA/133
[Sat May 3 13:41:14 2014] ata8: EH complete
[Sat May 3 13:42:18 2014] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[Sat May 3 13:42:18 2014] ata8.00: failed command: SMART
[Sat May 3 13:42:18 2014] ata8.00: cmd b0/d1:01:
res 40/00:00:
[Sat May 3 13:42:18 2014] ata8.00: status: { DRDY }
[Sat May 3 13:42:18 2014] ata8: hard resetting link
[Sat May 3 13:42:19 2014] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[Sat May 3 13:42:19 2014] ata8.00: configured for UDMA/133
[Sat May 3 13:42:19 2014] ata8: EH complete
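When comparing runs, it can help to tally which commands failed on which port instead of reading the log by eye. The Python sketch below does that for lines shaped like the `failed command:` entries above; the regular expression is modeled on this excerpt only, and `count_failed_commands` is a hypothetical helper name.

```python
import re
from collections import Counter

# Modeled on lines such as "ata8.00: failed command: IDENTIFY DEVICE".
FAILED_RE = re.compile(
    r"(?P<port>ata\d+(?:\.\d+)?): failed command: (?P<cmd>.+?)\s*$"
)


def count_failed_commands(lines):
    """Count (port, command) pairs for every 'failed command' line."""
    counts = Counter()
    for line in lines:
        m = FAILED_RE.search(line)
        if m:
            counts[(m.group("port"), m.group("cmd"))] += 1
    return counts


if __name__ == "__main__":
    log = [
        "[Sat May 3 13:40:38 2014] ata8.00: failed command: IDENTIFY DEVICE",
        "[Sat May 3 13:41:13 2014] ata8.00: failed command: IDENTIFY DEVICE",
        "[Sat May 3 13:42:18 2014] ata8.00: failed command: SMART",
    ]
    for (port, cmd), n in sorted(count_failed_commands(log).items()):
        print(port, cmd, n)
```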
If you need more information, send me a message. For the sake of system stability, I'm going to disable the IOMMU for the time being.
Once again, I really appreciate all of the work.
In Linux Kernel Bug Tracker #42679, alex.williamson (alex.williamson-linux-kernel-bugs) wrote : | #76 |
Found another one right under my nose:
04:00.0 IDE interface [0101]: Marvell Technology Group Ltd. 88SE9172 SATA III 6Gb/s RAID Controller [1b4b:917a] (rev 11)
This is one where function 1 doesn't exist, causing some trouble on AMD-Vi. Will be fixed in v2 of the patches.
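For context on why a missing function 1 matters: the IOMMU identifies the source of a DMA request by its 16-bit requester ID, which packs the bus number (8 bits), device number (5 bits) and function number (3 bits). The Python sketch below shows that encoding for the slot above; the helper names are illustrative and nothing here is taken from the patch series itself.

```python
def parse_slot(slot):
    """Split an lspci-style 'bb:dd.f' slot string into (bus, device, function)."""
    bus, rest = slot.split(":")
    device, function = rest.split(".")
    return int(bus, 16), int(device, 16), int(function, 16)


def requester_id(bus, device, function):
    """Pack bus/device/function into the 16-bit PCI requester ID."""
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 0x7):
        raise ValueError("bus/device/function out of range")
    return (bus << 8) | (device << 3) | function


if __name__ == "__main__":
    for slot in ("04:00.0", "04:00.1"):
        print(slot, hex(requester_id(*parse_slot(slot))))
```

The two IDs differ only in the low function bits, so a request tagged with function 1 looks to the IOMMU like it came from a different device, even when no such function is enumerated.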
In Linux Kernel Bug Tracker #42679, lt-83 (lt-83-linux-kernel-bugs) wrote : | #77 |
I stumbled upon this while attempting to install ESXi 5.5 Update 1 on an older Gigabyte GA-X58A-UD3R (rev. 2.0) motherboard (BIOS Fh1) with Vt-d enabled that has a Marvell 9128 controller with 2 ports. It seems the solution would be to switch controllers and deal with SATA2 speeds, as I intend to try VT-d pass through.
If I do re-purpose this machine towards a Linux install, I will take a look at the patch, however. Here's a picture of the vmkernel log (Alt+F12 when installing): http://
In Linux Kernel Bug Tracker #42679, daxcore (daxcore-linux-kernel-bugs) wrote : | #78 |
Patch is working for me (ASUS P7P55D-E PRO). Thanks!
Will this patch be released in an upcoming kernel tag, and if so, in which version?
In Linux Kernel Bug Tracker #42679, yourpadremb (yourpadremb-linux-kernel-bugs) wrote : | #79 |
Using 3.15-rc6 on my Gigabyte z77x-ud5h, I still have the problem.
The current patch does not work anymore.
In Linux Kernel Bug Tracker #42679, alex.williamson (alex.williamson-linux-kernel-bugs) wrote : | #80 |
(In reply to yourpadremb from comment #62)
> Using 3.15-rc6 in my Gigabyte z77x-ud5h still have the problem
>
> http://
>
> The current patch does not work anymore
Which current patch doesn't work anymore?
Does this work for you https:/
?
In Linux Kernel Bug Tracker #42679, acooks (acooks-linux-kernel-bugs) wrote : | #81 |
The 'current' patch set is the one posted by Alex to lkml, not the one attached to this bug report. Alex's patch set really is the way forward (even if it's missing some device IDs) and I'm hopeful that it will be picked up by mainline soon.
The patch I attached here will not apply to 3.15, because of other changes in the intel iommu driver. It's easy to fix, but you really should be using Alex's patch set instead.
I saw the pastebin log title is: "linux 3.15-rc6 and Marvell SATA 88SE9128". Please be more specific about the device id (9128) in your bugzilla comments in future.
Alex: the 9128 device ID is missing in v4 of your patch set.
In Linux Kernel Bug Tracker #42679, yourpadremb (yourpadremb-linux-kernel-bugs) wrote : | #82 |
Sorry, I made a mistake. I confused the name of the Marvell card with the one currently discussed here.
lspci -nn |grep SATA
00:1f.2 SATA controller [0106]: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] [8086:1e02] (rev 04)
03:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller [1b4b:9172] (rev 11)
08:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller [1b4b:9172] (rev 11)
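Since the quirk entries are keyed on the PCI vendor and device IDs, it is useful to pull those IDs out of `lspci -nn` output rather than quoting the marketing name. A minimal Python sketch, with a regular expression modeled on the lines pasted above; `parse_lspci_nn` is a hypothetical helper name.

```python
import re

# Matches the slot at the start of the line and the last [vvvv:dddd] pair.
# The class code such as [0106] has no colon, so it is not mistaken for an ID.
LSPCI_RE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) .*"
    r"\[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
)


def parse_lspci_nn(line):
    """Return (slot, vendor_id, device_id) from one 'lspci -nn' line, or None."""
    m = LSPCI_RE.match(line.strip())
    if m is None:
        return None
    return m.group("slot"), m.group("vendor"), m.group("device")


if __name__ == "__main__":
    line = (
        "03:00.0 SATA controller [0106]: Marvell Technology Group Ltd. "
        "88SE9172 SATA 6Gb/s Controller [1b4b:9172] (rev 11)"
    )
    print(parse_lspci_nn(line))
```

Reporting the `1b4b:9172` pair directly avoids the kind of mix-up between chip names described in this comment.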
In Linux Kernel Bug Tracker #42679, alex.williamson (alex.williamson-linux-kernel-bugs) wrote : | #83 |
(In reply to Andrew Cooks from comment #64)
> The 'current' patch set is the one posted by Alex to lkml, not the one
> attached to this bug report. Alex's patch set really is the way forward
> (even if it's missing some device IDs) and I'm hopeful that it will be
> picked up by mainline soon.
>
> The patch I attached here will not apply to 3.15, because of other changes
> in the intel iommu driver. It's easy to fix, but you really should be using
> Alex's patch set instead.
>
> I saw the pastebin log title is: "linux 3.15-rc6 and Marvell SATA 88SE9128".
> Please be more specific about the device id (9128) in your bugzilla comments
> in future.
>
> Alex: the 9128 device ID is missing in v4 of your patch set.
Can you reference a bug comment that identifies this device ID? I can pick it up if there's another rev of the series, otherwise I'd prefer to add it as a follow-on patch so we don't overload upstream.
In Linux Kernel Bug Tracker #42679, cJ-kernel (cj-kernel-linux-kernel-bugs) wrote : | #84 |
Here with a HighPoint RocketRaid 642L which has the 88SE9235 chip and can be supported by the ahci driver (patch submitted).
Alex, I tried to rebase your dma-alias branch on top of Linus' master, but I can't boot with the quirk when the card is in the computer (I've added an entry for the VID/PID for this card 0x1103/0x0642).
Using AMD 990FX (ASUS Sabertooth 990FX, rev1 I think) and some IOMMU-related kernel command-line parameters: iommu=1 ivrs_ioapic[
Side note 1: Just for fun I went through HighPoint's support since they provide a driver, and even a web GUI which needs to be used to configure the extensive features of the board... which I have never used and never plan to use.
"Dear customer, The driver doesn't support IOMMU. You need to disable it from system BIOS."
Side note 2: With IOMMU disabled, I observed the same kind of issues as Joshua, but not with every device plugged on it though. But that's out of scope for this bug.
In Linux Kernel Bug Tracker #42679, alex.williamson (alex.williamson-linux-kernel-bugs) wrote : | #85 |
(In reply to *cJ* from comment #67)
> Here with a HighPoint RocketRaid 642L which has the 88SE9235 chip and can be
> supported by the ahci driver (patch submitted).
>
> Alex, I tried to rebase your dma-alias branch on top of Linus' master, but I
> can't boot with the quirk when the card is in the computer (I've added an
> entry for the VID/PID for this card 0x1103/0x0642).
> Using AMD 990FX (ASUS Sabertooth 990FX, rev1 I think) and some IOMMU-related
> kernel command-line parameters: iommu=1 ivrs_ioapic[
> ivrs_ioapic[
> borkedness.
What errors do you get on stock upstream? How does the problem change with the dma-alias-v4 tag (without additional VID/DID)? With additional VID/DID?
In Linux Kernel Bug Tracker #42679, cJ-kernel (cj-kernel-linux-kernel-bugs) wrote : | #86 |
Can boot with the card on master linux, and get the same AMD-Vi IOMMU PAGE_FAULT errors with the HighPoint official driver (once it's patched to compile...).
I'll check later today with dma-alias-v4.
In Linux Kernel Bug Tracker #42679, cJ-kernel (cj-kernel-linux-kernel-bugs) wrote : | #87 |
(sorry for the delay) dma-alias-v4 is awesome (unlike this hardware)!
I see no more IOMMU issues with the quirk.
Tested-by: Jérôme Carretero <email address hidden>
In Linux Kernel Bug Tracker #42679, r1ch4rd.thompson (r1ch4rd.thompson-linux-kernel-bugs) wrote : | #88 |
Created attachment 144451
Trouble passing through my Highpoint RocketRAID 640L PCIE Storage controller to a domU in my XenCenter home server.
In Linux Kernel Bug Tracker #42679, r1ch4rd.thompson (r1ch4rd.thompson-linux-kernel-bugs) wrote : | #89 |
Hi,
I am having trouble passing through my Highpoint RocketRAID 640L PCIE Storage controller to a domU in my XenCenter home server.
http://
It works fine on bare metal OSes, but seems to give behavior suspiciously similar to this bug, when I attempt to pass it through to a XenServer domU:
https:/
Server hardware: Asus Sabertooth 990FX motherboard with an AMD FX8350 8 core processor and 32GB RAM.
https:/
Could I fix this with the "dma-alias-v4" patch?
This does not seem to exist, as far as I can tell: git://github.
Has this patch been committed to the kernel, and is the reason I'm having problems that XenServer uses 3.10?
Any help would be appreciated.
In Linux Kernel Bug Tracker #42679, cJ-kernel (cj-kernel-linux-kernel-bugs) wrote : | #90 |
Rich, I added the PCI ids for the 642L but not 640L, in the regular AHCI driver, and for the DMA quirk. You'd need to add the PCI IDs for the 640L (see commits c2e0fb966ad8ab3
Yes I think the issue would be fixed with the patch (and the added PCI ids).
Note that the patches are in mainline 3.15 now, maybe it's an option for you to upgrade.
BTW, should this bug be marked as resolved?
In Linux Kernel Bug Tracker #42679, r1ch4rd.thompson (r1ch4rd.thompson-linux-kernel-bugs) wrote : | #91 |
Hi.
Don't mark as resolved just yet; this may be a deeper problem in XenServer:
As I said, the attached storage shows up and works just fine in the XenServer dom0 (3.10 kernel), but does not pass through to an Ubuntu Server domU (3.13 kernel). This seems weird to me.
I'll try to upgrade to the 3.15 kernel in the domU (ownCloud VM) and see what happens, but XenServer is under heavy, active development ATM (though still based on the *old* CentOS5 build) and is giving me severe problems doing *anything* on its dom0. Also, this card is unsupported on XenServer, so I don't anticipate any help from them to this end.
If a kernel upgrade to 3.15 (on the domU in question) solves this problem, then it may be possible to put this one to bed.
Thanks for the *very* prompt reply,
Rich.
In Linux Kernel Bug Tracker #42679, alex.williamson (alex.williamson-linux-kernel-bugs) wrote : | #92 |
This bug is not yet resolved, the full set of IOMMU changes won't be upstream until 3.17. Unfortunately Rich, XenServer is an entirely different beast and needs a bug filed somewhere else. The kernel changes here are not going to magically fix the Xen hypervisor. Xen folks may choose a similar solution, but that's for them to decide.
In Linux Kernel Bug Tracker #42679, r1ch4rd.thompson (r1ch4rd.thompson-linux-kernel-bugs) wrote : | #93 |
IMHO, Citrix / XenServer developers will not waste their time with this one; they have enough on their plate just getting XS up to date and adding long overdue features and improvements, without being asked to deal with *unsupported* hardware etc.
Xen devs, upstream, might be more receptive, but they're having enough trouble ATM in just getting their basic wiki articles fit for use (I'm right now looking to see if I can help them in a big documentation drive this week).
I'll Google around this issue again and see what I can see, from the Xen perspective.
Thanks for your responses,
Rich.
In Linux Kernel Bug Tracker #42679, is (is-linux-kernel-bugs) wrote : | #94 |
I've compiled a RPM for those who are using CentOS 6.5.
http://
http://
http://
Compiled with Alex Williamson's dma-alias-v4 source.
In Linux Kernel Bug Tracker #42679, cjtuckerjr (cjtuckerjr-linux-kernel-bugs) wrote : | #95 |
I don't know if the DMAR errors I am receiving have anything to do with this thread. But I have been researching this problem & keeping abreast of the comments here, & just wanted to offer more insight & any support required.
These DMAR errors are occurring on my ASUS X79 Rampage IV Black Edition. It happens for the same reasons stated here (intel_iommu is set) but with a slightly different configuration. It happens with my Plextor SSD running on a Sonnet PCIe Adapter Card:
[ 1.504803] dmar: DRHD: handling fault status reg 2
[ 1.504806] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr fffe0000
DMAR:[fault reason 02] Present bit in context entry is clear
[ 1.844208] dmar: DRHD: handling fault status reg 102
[ 1.844488] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr fffe0000
DMAR:[fault reason 02] Present bit in context entry is clear
[ 1.997325] dmar: DRHD: handling fault status reg 202
[ 1.997327] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr fffe0000
DMAR:[fault reason 02] Present bit in context entry is clear
[ 7.003663] dmar: DRHD: handling fault status reg 302
[ 7.003949] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr fffe0000
DMAR:[fault reason 02] Present bit in context entry is clear
[ 7.322245] dmar: DRHD: handling fault status reg 402
[ 7.322515] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr fffe0000
DMAR:[fault reason 02] Present bit in context entry is clear
[ 7.475646] dmar: DRHD: handling fault status reg 502
[ 7.475917] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr fffe0000
DMAR:[fault reason 02] Present bit in context entry is clear
[ 12.481676] dmar: DRHD: handling fault status reg 602
[ 12.481962] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr fffe0000
DMAR:[fault reason 02] Present bit in context entry is clear
[ 12.800264] dmar: DRHD: handling fault status reg 702
[ 12.800535] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr fffe0000
DMAR:[fault reason 02] Present bit in context entry is clear
[ 12.953665] dmar: DRHD: handling fault status reg 2
[ 12.953935] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr fffe0000
DMAR:[fault reason 02] Present bit in context entry is clear
[ 17.959657] dmar: DRHD: handling fault status reg 102
[ 17.959944] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr fffe0000
DMAR:[fault reason 02] Present bit in context entry is clear
[ 18.278287] dmar: DRHD: handling fault status reg 202
[ 18.278558] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr fffe0000
DMAR:[fault reason 02] Present bit in context entry is clear
There is no 0b:00.1 device. However, when I query pci devices for "0b", I get the following:
root [ ~ ]# lspci | grep -i 0b
0b:00.0 SATA controller: Marvell Technology Group Ltd. Device 9182 (rev 11)
ff:0b.0 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 UBOX Registers (rev 04)
ff:0b.3 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 UBOX Registers (rev 04)
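The observation above (faults attributed to 0b:00.1 while only 0b:00.0 is enumerated) can be automated by cross-checking the requester in each DMAR fault against the slots that lspci reports. This Python sketch is a rough illustration under the assumption that both inputs look like the excerpts in this comment; `phantom_requesters` is a made-up helper name.

```python
import re

REQUESTER_RE = re.compile(r"Request device \[([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\]")
SLOT_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")


def phantom_requesters(dmesg_lines, lspci_lines):
    """Map each faulting requester that lspci does not list to its function 0.

    The value is the function-0 slot of the same bus/device if lspci lists it,
    otherwise None.
    """
    present = set()
    for line in lspci_lines:
        m = SLOT_RE.match(line.strip())
        if m:
            present.add(m.group(1))

    phantoms = {}
    for line in dmesg_lines:
        m = REQUESTER_RE.search(line)
        if m and m.group(1) not in present:
            slot = m.group(1)
            sibling = slot[:-1] + "0"  # same bus and device, function 0
            phantoms[slot] = sibling if sibling in present else None
    return phantoms


if __name__ == "__main__":
    dmesg = [
        "[ 1.504806] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr fffe0000",
    ]
    lspci = [
        "0b:00.0 SATA controller: Marvell Technology Group Ltd. Device 9182 (rev 11)",
        "ff:0b.0 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 UBOX Registers (rev 04)",
    ]
    print(phantom_requesters(dmesg, lspci))
```

A non-enumerated requester whose function 0 is a Marvell SATA controller is the pattern the other reports in this thread describe.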
I read somewhere that just removing iommu from the config, things should work fine. And, so t...
In Linux Kernel Bug Tracker #42679, cjtuckerjr (cjtuckerjr-linux-kernel-bugs) wrote : | #96 |
(In reply to CJ from comment #78)
> I don't know if the DMAR errors I am receiving have anything to do with this
> thread. But I have been researching this problem & keeping abreast of the
> comments here, & just wanted to offer more insight & any support required.
>
> These DMAR errors are occurring on my ASUS X79 Rampage IV Black Edition. It
> happens for the same reasons stated here (intel_iommu is set) but with a
> slightly different configuration. It happens with my Plextor SSD running on
> a Sonnet PCIe Adapter Card:
>
>
> [ 1.504803] dmar: DRHD: handling fault status reg 2
> [ 1.504806] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr
> fffe0000
> DMAR:[fault reason 02] Present bit in context entry is clear
> [ 1.844208] dmar: DRHD: handling fault status reg 102
> [ 1.844488] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr
> fffe0000
> DMAR:[fault reason 02] Present bit in context entry is clear
> [ 1.997325] dmar: DRHD: handling fault status reg 202
> [ 1.997327] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr
> fffe0000
> DMAR:[fault reason 02] Present bit in context entry is clear
> [ 7.003663] dmar: DRHD: handling fault status reg 302
> [ 7.003949] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr
> fffe0000
> DMAR:[fault reason 02] Present bit in context entry is clear
> [ 7.322245] dmar: DRHD: handling fault status reg 402
> [ 7.322515] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr
> fffe0000
> DMAR:[fault reason 02] Present bit in context entry is clear
> [ 7.475646] dmar: DRHD: handling fault status reg 502
> [ 7.475917] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr
> fffe0000
> DMAR:[fault reason 02] Present bit in context entry is clear
> [ 12.481676] dmar: DRHD: handling fault status reg 602
> [ 12.481962] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr
> fffe0000
> DMAR:[fault reason 02] Present bit in context entry is clear
> [ 12.800264] dmar: DRHD: handling fault status reg 702
> [ 12.800535] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr
> fffe0000
> DMAR:[fault reason 02] Present bit in context entry is clear
> [ 12.953665] dmar: DRHD: handling fault status reg 2
> [ 12.953935] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr
> fffe0000
> DMAR:[fault reason 02] Present bit in context entry is clear
> [ 17.959657] dmar: DRHD: handling fault status reg 102
> [ 17.959944] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr
> fffe0000
> DMAR:[fault reason 02] Present bit in context entry is clear
> [ 18.278287] dmar: DRHD: handling fault status reg 202
> [ 18.278558] dmar: DMAR:[DMA Write] Request device [0b:00.1] fault addr
> fffe0000
> DMAR:[fault reason 02] Present bit in context entry is clear
>
>
> There is no 0b:00.1 device. However, when I query pci devices for "0b", I
> get the following:
>
> root [ ~ ]# lspci | grep -i 0b
> 0b:00.0 SATA controller: Marvell Technology Group Ltd. Device 9182 (rev 11)
> ff:0b.0 System peripheral: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7
> UBOX Registers (rev 04)
> ff:0b.3 System peripheral: Intel...
In Linux Kernel Bug Tracker #42679, alex.williamson (alex.williamson-linux-kernel-bugs) wrote : | #97 |
CJ, what kernel are you adding your quirk to? If it's not a kernel patched with the v4 patch set or the current pre-3.17-rc1 tree, then the IOMMU code is not yet in place to make use of the quirk.
In Linux Kernel Bug Tracker #42679, cjtuckerjr (cjtuckerjr-linux-kernel-bugs) wrote : | #98 |
(In reply to Alex Williamson from comment #80)
> CJ, what kernel are you adding your quirk to? If it's not a kernel patched
> with the v4 patch set or the current pre-3.17-rc1 tree, then the IOMMU code
> is not yet in place to make use of the quirk.
Hi Alex,
I am using v3.16.
In Linux Kernel Bug Tracker #42679, alex.williamson (alex.williamson-linux-kernel-bugs) wrote : | #99 |
Then the quirk doesn't actually make any difference yet. Use the current development tree or wait for 3.17-rc1.
In Linux Kernel Bug Tracker #42679, cjtuckerjr (cjtuckerjr-linux-kernel-bugs) wrote : | #100 |
(In reply to Alex Williamson from comment #82)
> Then the quirk doesn't actually make any difference yet. Use the current
> development tree or wait for 3.17-rc1.
Got it! Thank you! Much appreciation!!
In Linux Kernel Bug Tracker #42679, is (is-linux-kernel-bugs) wrote : | #101 |
This is weird.
/var/log/messages is flooded with:
Aug 9 16:42:26 Hypervisor kernel: dmar: DRHD: handling fault status reg 3
Aug 9 16:42:26 Hypervisor kernel: dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 100000000
Aug 9 16:42:26 Hypervisor kernel: DMAR:[fault reason 06] PTE Read access is not set
02:00.0 is a PCI (not PCIe) Intel NIC
# lspci
02:00.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)
# lspci -n
02:00.0 0200: 8086:107c (rev 05)
Any chances that this is related?
In Linux Kernel Bug Tracker #42679, cjtuckerjr (cjtuckerjr-linux-kernel-bugs) wrote : | #102 |
(In reply to Jerry Chen from comment #84)
> This is weird.
>
> /var/log/messages is flooded with:
> Aug 9 16:42:26 Hypervisor kernel: dmar: DRHD: handling fault status reg 3
> Aug 9 16:42:26 Hypervisor kernel: dmar: DMAR:[DMA Read] Request device
> [02:00.0] fault addr 100000000
> Aug 9 16:42:26 Hypervisor kernel: DMAR:[fault reason 06] PTE Read access is
> not set
>
> 02:00.0 is a PCI (not PCIe) Intel NIC
>
> # lspci
> 02:00.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet
> Controller (rev 05)
>
> # lspci -n
> 02:00.0 0200: 8086:107c (rev 05)
>
> Any chances that this is related?
Well, something happened between v3.12.26 & v3.16. Referring to "Comment 79" where these errors were produced on v3.16 with intel_iommu set as default (INTEL_
In Linux Kernel Bug Tracker #42679, is (is-linux-kernel-bugs) wrote : | #103 |
(In reply to CJ from comment #85)
> (In reply to Jerry Chen from comment #84)
> > This is weird.
> >
> > /var/log/messages is flooded with:
> > Aug 9 16:42:26 Hypervisor kernel: dmar: DRHD: handling fault status reg 3
> > Aug 9 16:42:26 Hypervisor kernel: dmar: DMAR:[DMA Read] Request device
> > [02:00.0] fault addr 100000000
> > Aug 9 16:42:26 Hypervisor kernel: DMAR:[fault reason 06] PTE Read access
> is
> > not set
> >
> > 02:00.0 is a PCI (not PCIe) Intel NIC
> >
> > # lspci
> > 02:00.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet
> > Controller (rev 05)
> >
> > # lspci -n
> > 02:00.0 0200: 8086:107c (rev 05)
> >
> > Any chances that this is related?
>
> Well, something happened between v3.12.26 & v3.16. Referring to "Comment
> 79" where these errors were produced on v3.16 with intel_iommu set as
> default (INTEL_
> default set. I know this will be fixed in 3.17. But, it looks like
> v3.12.26 somehow got looked over as it relates to this problem. This
> version, IMO, appears to be really solid.
You are right, 3.12.26 seems to be the perfect kernel for this issue.
In Linux Kernel Bug Tracker #42679, is (is-linux-kernel-bugs) wrote : | #104 |
(In reply to CJ from comment #85)
> (In reply to Jerry Chen from comment #84)
> > This is weird.
> >
> > /var/log/messages is flooded with:
> > Aug 9 16:42:26 Hypervisor kernel: dmar: DRHD: handling fault status reg 3
> > Aug 9 16:42:26 Hypervisor kernel: dmar: DMAR:[DMA Read] Request device
> > [02:00.0] fault addr 100000000
> > Aug 9 16:42:26 Hypervisor kernel: DMAR:[fault reason 06] PTE Read access
> is
> > not set
> >
> > 02:00.0 is a PCI (not PCIe) Intel NIC
> >
> > # lspci
> > 02:00.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet
> > Controller (rev 05)
> >
> > # lspci -n
> > 02:00.0 0200: 8086:107c (rev 05)
> >
> > Any chances that this is related?
>
> Well, something happened between v3.12.26 & v3.16. Referring to "Comment
> 79" where these errors were produced on v3.16 with intel_iommu set as
> default (INTEL_
> default set. I know this will be fixed in 3.17. But, it looks like
> v3.12.26 somehow got looked over as it relates to this problem. This
> version, IMO, appears to be really solid.
Can you share your .config for 3.12.26? It seems like I messed up something.
In Linux Kernel Bug Tracker #42679, acooks (acooks-linux-kernel-bugs) wrote : | #105 |
Jerry,
DMAR:[fault reason 06] is not the same as
DMAR:[fault reason 02] and the device does not use a Marvell chipset.
Therefore it is very unlikely that the problem you are experiencing is related.
Please file a separate bug report using the instructions in the 'REPORTING-BUGS' file in your kernel source tree.
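For anyone trying to tell these cases apart in their own logs, here is a minimal Python sketch that pairs each DMAR "Request device" line with the "fault reason" line that follows it. It assumes the message format quoted in this thread; the function name `summarize_dmar_faults` is made up for illustration and is not part of any kernel tool.

```python
import re

# Line naming the requesting device as bus:device.function (format as quoted in this thread).
FAULT_DEVICE = re.compile(
    r"DMAR:\[DMA (Read|Write)\] Request device \[([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\]"
)
# Continuation line carrying the numeric fault reason.
FAULT_REASON = re.compile(r"DMAR:\[fault reason (\d+)\]")


def summarize_dmar_faults(log_text):
    """Return a set of (device, access, reason_code) tuples found in log_text.

    Hypothetical helper: it assumes each device line is followed by its
    reason line, as in the dmesg excerpts posted in this bug.
    """
    faults = set()
    pending = None  # (device, access) waiting for its reason line
    for line in log_text.splitlines():
        device = FAULT_DEVICE.search(line)
        if device:
            pending = (device.group(2), device.group(1))
            continue
        reason = FAULT_REASON.search(line)
        if reason and pending:
            faults.add((pending[0], pending[1], int(reason.group(1))))
            pending = None
    return faults
```

Run against the excerpts above, the Marvell-related reports and the 82541PI report end up with different reason codes (02 versus 06), which is the distinction being drawn here.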
In Linux Kernel Bug Tracker #42679, javid.kayvan+kernel (javid.kayvan+kernel-linux-kernel-bugs) wrote : | #106 |
Marvell 88SE9230 chip here, still seeing problems with the Ubuntu 3.17 mainline kernel:
http://
In Linux Kernel Bug Tracker #42679, alex.williamson (alex.williamson-linux-kernel-bugs) wrote : | #107 |
(In reply to Kayvan Javid from comment #89)
> Marvell 88SE923 chip here, still problems using Ubuntu 3.17 mainline kernel:
> http://
Do we have your device ID included?
https:/
In Linux Kernel Bug Tracker #42679, javid.kayvan+kernel (javid.kayvan+kernel-linux-kernel-bugs) wrote : | #108 |
lspci -nn
01:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller [1b4b:9230] (rev 11)
It is in the quirks.c cgit link line 3479:
/* https:/
DECLARE_
quirk_
Still getting output as per comment 58.
In Linux Kernel Bug Tracker #42679, alex.williamson (alex.williamson-linux-kernel-bugs) wrote : | #109 |
(In reply to Kayvan Javid from comment #91)
> lspci -nn
> 01:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9230 PCIe
> SATA 6Gb/s Controller [1b4b:9230] (rev 11)
>
> It is in the quirks.c cgit link line 3479:
> /* https:/
> DECLARE_
> quirk_dma_
>
> Still getting output as per comment 58.
Comment 58 isn't even a DMA fault. Please file a new bug; we can't track every broken aspect of Marvell controllers in a single bug.
In Linux Kernel Bug Tracker #42679, javid.kayvan+kernel (javid.kayvan+kernel-linux-kernel-bugs) wrote : | #110 |
You are quite right, I am *not* seeing any DMA problems with the Marvell 9230.
I misread the bug, and a lot of the dmesg output people have posted includes the same ATA errors I am seeing.
In Linux Kernel Bug Tracker #42679, elliott (elliott-linux-kernel-bugs) wrote : | #111 |
Add the Marvell 9183 to the list:
with intel_iommu=on:
dmesg |grep dmar
[ 0.026475] dmar: Host address width 39
[ 0.026476] dmar: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.026482] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c0000020660462 ecap f0101a
[ 0.026483] dmar: DRHD base: 0x000000fed91000 flags: 0x1
[ 0.026487] dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap d2008c20660462 ecap f010da
[ 0.026487] dmar: RMRR base: 0x000000d8ce3000 end: 0x000000d8ceffff
[ 0.026488] dmar: RMRR base: 0x000000db000000 end: 0x000000df1fffff
[ 0.719857] dmar: DRHD: handling fault status reg 2
[ 0.719878] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
[ 1.034557] dmar: DRHD: handling fault status reg 3
[ 1.034576] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
[ 6.042308] dmar: DRHD: handling fault status reg 2
[ 6.042339] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
[ 6.350034] dmar: DRHD: handling fault status reg 3
[ 6.350053] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
[ 11.357836] dmar: DRHD: handling fault status reg 2
[ 11.357864] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
[ 11.665512] dmar: DRHD: handling fault status reg 3
[ 11.665532] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
[ 16.673282] dmar: DRHD: handling fault status reg 2
[ 16.673311] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
lspci -nnv
02:00.0 SATA controller [0106]: Device [1c28:0122] (rev 14) (prog-if 01 [AHCI 1.0])
Subsystem: Marvell Technology Group Ltd. Device [1b4b:9183]
Note, for google: this is the controller emb...
In Linux Kernel Bug Tracker #42679, alex.williamson (alex.williamson-linux-kernel-bugs) wrote : | #112 |
(In reply to Elliott from comment #94)
> Add the Marvell 9183 to the list:
>
> with intel_iommu=on:
>
> dmesg |grep dmar
...
> [ 0.719878] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr
> fffe0000
> [ 1.034557] dmar: DRHD: handling fault status reg 3
...
>
> lspci -nnv
> 02:00.0 SATA controller [0106]: Device [1c28:0122] (rev 14) (prog-if 01
> [AHCI 1.0])
> Subsystem: Marvell Technology Group Ltd. Device [1b4b:9183]
>
> Note, for google: this is the controller embedded in the Plextor m6e M.2 SSD
FWIW, the PCI vendor ID is actually Lite-on. Can you confirm this patch resolves the problem:
--- a/drivers/
+++ b/drivers/
@@ -3484,6 +3484,8 @@ DECLARE_
DECLARE_
+/* https:/
+DECLARE_
/*
* A few PCIe-to-PCI bridges fail to expose a PCIe capability, resulting in
In Linux Kernel Bug Tracker #42679, elliott (elliott-linux-kernel-bugs) wrote : | #113 |
(In reply to Alex Williamson from comment #95)
No luck. I tried both the patch you posted, and I also tried:
DECLARE_
Just for good measure. In both cases, I get the same errors as posted in comment 94 when intel_iommu=on.
Let me know if there's anything else I can do to try and help debug.
Thanks,
Elliott
In Linux Kernel Bug Tracker #42679, alex.williamson (alex.williamson-linux-kernel-bugs) wrote : | #114 |
(In reply to Elliott from comment #96)
> (In reply to Alex Williamson from comment #95)
> No luck. I tried both the patch you posted, and I also tried:
Do any of these kernel boot options make a difference:
pci=nomsi
intremap=off
intremap=nosid
Given your fault address, I expect they might all work. The last option is the least invasive.
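When comparing several boots, it can help to confirm which of these options actually reached the kernel. The following is a rough Python sketch that parses a kernel command line string (for example the contents of /proc/cmdline) and reports which of the three options suggested above are present. The helper names are hypothetical, and the parsing is deliberately simple: it does not handle quoted values, and if a parameter is repeated only the last occurrence is kept.

```python
# Options suggested in this thread, as (parameter, value) pairs.
SUGGESTED = [("pci", "nomsi"), ("intremap", "off"), ("intremap", "nosid")]


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def active_workarounds(cmdline):
    """Return the suggested options that appear on the given command line."""
    params = parse_cmdline(cmdline)
    active = []
    for name, wanted in SUGGESTED:
        value = params.get(name)
        # Parameters such as pci= accept a comma-separated list of options.
        if value is not None and wanted in value.split(","):
            active.append(f"{name}={wanted}")
    return active
```

On a running system the input would typically come from `open("/proc/cmdline").read()`; the sketch takes a plain string so it can be checked against saved dmesg attachments as well.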
In Linux Kernel Bug Tracker #42679, elliott (elliott-linux-kernel-bugs) wrote : | #115 |
Created attachment 154931
dmesg intremap=nosid
In Linux Kernel Bug Tracker #42679, elliott (elliott-linux-kernel-bugs) wrote : | #116 |
Created attachment 154941
dmesg intremap=off
In Linux Kernel Bug Tracker #42679, elliott (elliott-linux-kernel-bugs) wrote : | #117 |
Created attachment 154951
dmesg intel_iommu=off
In Linux Kernel Bug Tracker #42679, elliott (elliott-linux-kernel-bugs) wrote : | #118 |
Created attachment 154961
dmesg intel_iommu=on
In Linux Kernel Bug Tracker #42679, elliott (elliott-linux-kernel-bugs) wrote : | #119 |
Created attachment 154971
dmesg pci=nomsi
In Linux Kernel Bug Tracker #42679, elliott (elliott-linux-kernel-bugs) wrote : | #120 |
I tried all of the kernel options you recommended, and also included dmesg output with intel_iommu=on and intel_iommu=off for comparison. Regardless of the other kernel options, the SSD is not visible when intel_iommu=on.
In Linux Kernel Bug Tracker #42679, elliott (elliott-linux-kernel-bugs) wrote : | #121 |
Created attachment 155441
dmesg linux 3.17
Arch just pushed the 3.17 kernel, bug is still present. Would you like me to re-run the intremap and nomsi kernel options?
In Linux Kernel Bug Tracker #42679, lk (lk-linux-kernel-bugs) wrote : | #122 |
Created attachment 156481
attachment-
I can't see any complaint about present bit being cleared in comment 94. There are likely entries for both function 0 and 1. It seems like you have another problem...
Did you use the controller to boot the kernel? I noticed issues when using the Marvell controller as boot device. My best guess is that the BIOS assigned memory to the controller that it is still accessing. Problem is that the kernel wasn't informed about it.
Could your problem be the same?
2014-10-27 16:22 GMT+01:00 <email address hidden>:
> https:/
>
> --- Comment #104 from Elliott <email address hidden> ---
> Created attachment 155441
> --> https:/
> dmesg linux 3.17
>
> Arch just pushed the 3.17 kernel, bug is still present. Would you like me
> to
> re-run the intremap and nomsi kernel options?
>
In Linux Kernel Bug Tracker #42679, lk (lk-linux-kernel-bugs) wrote : | #123 |
Accidentally created an attachment. Can't seem to find any way to remove it.
Sorry about that... Please feel free to remove it if possible.
In Linux Kernel Bug Tracker #42679, elliott (elliott-linux-kernel-bugs) wrote : | #124 |
My kernel lives on another disk drive. /dev/sda1 is my EFI system partition, /dev/sda2 is the MSR, /dev/sda3 is NTFS Windows 7, /dev/sda4 is my / partition. The Marvell controller is in the SSD on /dev/sdb. I don't know what you mean by "present bit" (sorry, I'm not so fluent in C).
I'm using the SSD with an embedded Marvell controller as a caching device (enhanceio when I posted to this bug, but I just switched to bcache) for a slower hard drive. I did briefly consider enhanceio might be the problem, so I disabled it completely to test. This didn't make a difference; with intel_iommu, the kernel throws the dmar errors, and I can't access /dev/sdb.
In Linux Kernel Bug Tracker #42679, lk (lk-linux-kernel-bugs) wrote : | #125 |
The quirk installs entries for both function numbers. If function 1 had been unknown, you would have seen warnings about the present bit being clear (see comment 78 for an example). The lack of those messages indicates that you successfully installed entries for both function 0 and 1, and hence that the patch is working.
You can still run into problems if the chip tries to read or write memory that isn't allocated by the driver module. The problems I saw were related to the controller being initialized and used by the BIOS during boot. It tried to read memory that didn't belong to it (as far as the Linux kernel was concerned). The controller stopped working when the DMA read failed (blocked by the IOMMU).
It is not necessarily an error that the controller is assigned memory during boot, although these memory regions must be presented to the operating system. This is where VT-d support seems to fail on many consumer boards.
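To check whether a log shows the "unknown function" pattern described here, the sketch below compares the requester IDs in DMAR fault lines against the devices that lspci lists, and reports faulting functions that are not enumerated although function 0 of the same device is. It is a Python illustration only: `phantom_requesters` is a made-up name, and it assumes plain `lspci` output and the dmesg format quoted earlier in this bug.

```python
import re

# bus:device.function at the start of an lspci line, e.g. "0b:00.0 SATA controller: ..."
LSPCI_BDF = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s", re.MULTILINE)
# Requester ID reported in a DMAR fault line.
FAULT_BDF = re.compile(r"Request device \[([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\]")


def phantom_requesters(dmesg_text, lspci_text):
    """Return faulting functions that lspci does not list but whose function 0 exists."""
    listed = set(LSPCI_BDF.findall(lspci_text))
    faulting = set(FAULT_BDF.findall(dmesg_text))
    return sorted(
        bdf
        for bdf in faulting
        # bdf[:-1] + "0" is the same bus:device with function 0.
        if bdf not in listed and bdf[:-1] + "0" in listed
    )
```

A non-empty result matches the situation from comment 78, where faults name 0b:00.1 while lspci only shows 0b:00.0. An empty result does not prove the quirk is in effect; it only means this particular pattern was not found in the text supplied.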
In Linux Kernel Bug Tracker #42679, frollic (frollic-linux-kernel-bugs) wrote : | #126 |
Is there any progress ?
I'm hitting this error on Fedora 3.17.8-200.fc20 kernel, which makes my system pretty much unusable :(
07:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller [1b4b:9230] (rev 10) (prog-if 01 [AHCI 1.0])
DeviceName: Marvell 9230 AHCI controller
Subsystem: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller [1b4b:9230]
Flags: bus master, fast devsel, latency 0, IRQ 92
I/O ports at b050 [size=8]
I/O ports at b040 [size=4]
I/O ports at b030 [size=8]
I/O ports at b020 [size=4]
I/O ports at b000 [size=32]
Memory at 90610000 (32-bit, non-prefetchable) [size=2K]
Expansion ROM at 90600000 [disabled] [size=64K]
Kernel driver in use: ahci
Motherboard is Supermicro X10SBA - http://
In Linux Kernel Bug Tracker #42679, alex.williamson (alex.williamson-linux-kernel-bugs) wrote : | #127 |
(In reply to frollic from comment #109)
> Is there any progress ?
>
> I'm hitting this error on Fedora 3.17.8-200.fc20 kernel, which makes my
> system pretty much unusable :(
>
> 07:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9230 PCIe
> SATA 6Gb/s Controller [1b4b:9230] (rev 10) (prog-if 01 [AHCI 1.0])
It should have been fixed in v3.16 by cc346a4714 for this device. Are you sure you're seeing the same error? What are the symptoms?
In Linux Kernel Bug Tracker #42679, alex.williamson (alex.williamson-linux-kernel-bugs) wrote : | #128 |
Actually, refreshing my memory in the comments here, others are also reporting that issues for 1b4b:9230 persist, but they're different than the problem we're trying to fix here and suggest either broken hardware or broken driver (or both). As suggested previously, if you're not getting DMAR faults, file a new bug.
In Linux Kernel Bug Tracker #42679, frollic (frollic-linux-kernel-bugs) wrote : | #129 |
Indeed, I don't have DMAR errors in my syslog.
Drives are 3 * WDC WD20EFRX-68EUZN0, 82.00A82, max UDMA/133 running soft-RAID5.
One SAMSUNG SSD SM841 mSATA 128GB, DXM43D0Q, max UDMA/133 in an mSATA->SATA case/converter.
Feb 4 19:09:43 atlantis kernel: [ 464.228813] ata3: failed to read log page 10h (errno=-5)
Feb 4 19:09:43 atlantis kernel: [ 464.231988] ata3.00: exception Emask 0x1 SAct 0xc000 SErr 0x0 action 0x0
Feb 4 19:09:43 atlantis kernel: [ 464.235233] ata3.00: irq_stat 0x40000008
Feb 4 19:09:43 atlantis kernel: ata3: failed to read log page 10h (errno=-5)
Feb 4 19:09:43 atlantis kernel: ata3.00: exception Emask 0x1 SAct 0xc000 SErr 0x0 action 0x0
Feb 4 19:09:43 atlantis kernel: ata3.00: irq_stat 0x40000008
Feb 4 19:09:43 atlantis kernel: [ 464.238596] ata3.00: failed command: READ FPDMA QUEUED
Feb 4 19:09:43 atlantis kernel: [ 464.242000] ata3.00: cmd 60/00:70:
Feb 4 19:09:43 atlantis kernel: [ 464.242000] res 50/00:00:
Feb 4 19:09:43 atlantis kernel: ata3.00: failed command: READ FPDMA QUEUED
Feb 4 19:09:43 atlantis kernel: ata3.00: cmd 60/00:70:
res 50/00:00:
Feb 4 19:09:43 atlantis kernel: [ 464.248733] ata3.00: status: { DRDY }
Feb 4 19:09:43 atlantis kernel: ata3.00: status: { DRDY }
Feb 4 19:09:43 atlantis kernel: [ 464.252192] ata3.00: failed command: READ FPDMA QUEUED
Feb 4 19:09:43 atlantis kernel: [ 464.255558] ata3.00: cmd 60/00:78:
Feb 4 19:09:43 atlantis kernel: [ 464.255558] res 50/00:00:
Feb 4 19:09:43 atlantis kernel: ata3.00: failed command: READ FPDMA QUEUED
Feb 4 19:09:43 atlantis kernel: ata3.00: cmd 60/00:78:
res 50/00:00:
Feb 4 19:09:43 atlantis kernel: [ 464.262523] ata3.00: status: { DRDY }
Feb 4 19:09:43 atlantis kernel: ata3.00: status: { DRDY }
Feb 4 19:09:43 atlantis kernel: [ 464.272877] ata3.00: revalidation failed (errno=-2)
Feb 4 19:09:43 atlantis kernel: [ 464.276284] ata3: hard resetting link
Feb 4 19:09:43 atlantis kernel: ata3.00: revalidation failed (errno=-2)
Feb 4 19:09:43 atlantis kernel: ata3: hard resetting link
Feb 4 19:09:44 atlantis kernel: [ 464.586712] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 4 19:09:44 atlantis kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 4 19:09:44 atlantis kernel: [ 464.593370] ata3.00: configured for UDMA/133
Feb 4 19:09:44 atlantis kernel: [ 464.596855] ata3: EH complete
Feb 4 19:09:44 atlantis kernel: ata3.00: configured for UDMA/133
Feb 4 19:09:44 atlantis kernel: ata3: EH complete
Feb 4 19:10:03 atlantis kernel: [ 484.234979] ata3: failed to read log page 10h (errno=-5)
Feb 4 19:10:03 atlantis kernel: [ 484.238484] ata3.00: exception Emask 0x1 SAct 0xc000000 SErr 0x0 action 0x0
Feb 4 19:10:03 atlantis kernel: [ 484.242039] ata3.00: irq_stat 0x40000008
Fe...
In Linux Kernel Bug Tracker #42679, frollic (frollic-linux-kernel-bugs) wrote : | #130 |
In addition, mobo is brand new (doesn't mean it can't be faulty), WDC drives are 2 months old (installed just before X-mas last year). The SSD was purchased used, so I can't tell you how old that is.
All of the hardware, except for the Samsung SSD, ran just fine on my Supermicro X7SPA-H, before I swapped mobo just two days ago.
In Linux Kernel Bug Tracker #42679, kernel (kernel-linux-kernel-bugs) wrote : | #131 |
(In reply to Alex Williamson from comment #95)
I encountered the same problem on a PX-G128M6e (Plextor M6e series SSD) and resolved it with the patch.
(actually, I used the 4.0.5 kernel patched with the code described in https:/
Booting with the SSD and passing the SSD through to a guest OS both work correctly.
My system is an Asus H97M-PLUS with BIOS 2501 and a PX-G128M6e with firmware revision 1.06.
The kernel .config is from Arch's linux 4.0.5-1 package.
In Linux Kernel Bug Tracker #42679, kernel (kernel-linux-kernel-bugs) wrote : | #132 |
Created attachment 179951
dmesg of 4.0.5 vanilla kernel with iommu=on
`grep -i -e dmar -e iommu` is below
[ 0.000000] Command line: BOOT_IMAGE=
[ 0.000000] ACPI: DMAR 0x00000000DAC6CED0 0000B8 (v01 INTEL BDW 00000001 INTL 00000001)
[ 0.000000] Kernel command line: BOOT_IMAGE=
[ 0.000000] Intel-IOMMU: enabled
[ 0.107086] dmar: Host address width 39
[ 0.107098] dmar: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.107123] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c0000020660462 ecap f0101a
[ 0.107138] dmar: DRHD base: 0x000000fed91000 flags: 0x1
[ 0.107154] dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap d2008c20660462 ecap f010da
[ 0.107169] dmar: RMRR base: 0x000000dbe7b000 end: 0x000000dbe89fff
[ 0.107179] dmar: RMRR base: 0x000000dd000000 end: 0x000000df1fffff
[ 0.107191] IOAPIC id 8 under DRHD base 0xfed91000 IOMMU 1
[ 0.685402] DMAR: No ATSR found
[ 0.685642] IOMMU: dmar0 using Queued invalidation
[ 0.685651] IOMMU: dmar1 using Queued invalidation
[ 0.685662] IOMMU: Setting RMRR:
[ 0.685694] IOMMU: Setting identity map for device 0000:00:02.0 [0xdd000000 - 0xdf1fffff]
[ 0.686154] IOMMU: Setting identity map for device 0000:00:14.0 [0xdbe7b000 - 0xdbe89fff]
[ 0.686215] IOMMU: Setting identity map for device 0000:00:1a.0 [0xdbe7b000 - 0xdbe89fff]
[ 0.686268] IOMMU: Setting identity map for device 0000:00:1d.0 [0xdbe7b000 - 0xdbe89fff]
[ 0.686308] IOMMU: Prepare 0-16MiB unity mapping for LPC
[ 0.686329] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[ 0.847930] dmar: DRHD: handling fault status reg 2
[ 0.848264] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
[ 1.161006] dmar: DRHD: handling fault status reg 3
[ 1.161963] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
[ 6.159656] dmar: DRHD: handling fault status reg 2
[ 6.160750] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
[ 6.472980] dmar: DRHD: handling fault status reg 3
[ 6.473513] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
[ 11.471329] dmar: DRHD: handling fault status reg 2
[ 11.471661] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
[ 11.784476] dmar: DRHD: handling fault status reg 3
[ 11.785472] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
[ 16.783038] dmar: DRHD: handling fault status reg 2
[ 16.783646] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
In Linux Kernel Bug Tracker #42679, kernel (kernel-linux-kernel-bugs) wrote : | #133 |
Created attachment 179961
dmesg of 4.0.5 patched kernel with iommu=on
`grep -i -e dmar -e iommu` is below
[ 0.000000] Command line: BOOT_IMAGE=
[ 0.000000] ACPI: DMAR 0x00000000DAC6CED0 0000B8 (v01 INTEL BDW 00000001 INTL 00000001)
[ 0.000000] Kernel command line: BOOT_IMAGE=
[ 0.000000] Intel-IOMMU: enabled
[ 0.107025] dmar: Host address width 39
[ 0.107037] dmar: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.107060] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c0000020660462 ecap f0101a
[ 0.107075] dmar: DRHD base: 0x000000fed91000 flags: 0x1
[ 0.107092] dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap d2008c20660462 ecap f010da
[ 0.107107] dmar: RMRR base: 0x000000dbe7b000 end: 0x000000dbe89fff
[ 0.107117] dmar: RMRR base: 0x000000dd000000 end: 0x000000df1fffff
[ 0.107129] IOAPIC id 8 under DRHD base 0xfed91000 IOMMU 1
[ 0.688999] DMAR: No ATSR found
[ 0.689240] IOMMU: dmar0 using Queued invalidation
[ 0.689249] IOMMU: dmar1 using Queued invalidation
[ 0.689259] IOMMU: Setting RMRR:
[ 0.689292] IOMMU: Setting identity map for device 0000:00:02.0 [0xdd000000 - 0xdf1fffff]
[ 0.689754] IOMMU: Setting identity map for device 0000:00:14.0 [0xdbe7b000 - 0xdbe89fff]
[ 0.689816] IOMMU: Setting identity map for device 0000:00:1a.0 [0xdbe7b000 - 0xdbe89fff]
[ 0.689868] IOMMU: Setting identity map for device 0000:00:1d.0 [0xdbe7b000 - 0xdbe89fff]
[ 0.689908] IOMMU: Prepare 0-16MiB unity mapping for LPC
[ 0.689930] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[ 66.222474] [drm] DMAR active, disabling use of stolen memory
In Linux Kernel Bug Tracker #42679, kernel (kernel-linux-kernel-bugs) wrote : | #134 |
`lspci -nnvv`
02:00.0 SATA controller [0106]: Lite-On IT Corp. / Plextor M6e PCI Express SSD [Marvell 88SS9183] [1c28:0122] (rev 14) (prog-if 01 [AHCI 1.0])
Subsystem: Marvell Technology Group Ltd. Device [1b4b:9183]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 30
Region 0: I/O ports at e050 [size=8]
Region 1: I/O ports at e040 [size=4]
Region 2: I/O ports at e030 [size=8]
Region 3: I/O ports at e020 [size=4]
Region 4: I/O ports at e000 [size=32]
Region 5: Memory at f7c20000 (32-bit, non-prefetchable) [size=512]
Expansion ROM at f7c00000 [disabled] [size=128K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fee00378 Data: 0000
Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x2, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCo
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationCom
Equalizatio
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP+ Rollover- Timeout+ NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Kernel driver in use: ahci
Kernel modules: ahci
----
`lspci -nnvv` on the host while passing the SSD through to a guest OS
02:00.0 SATA controller [0106]: Lite-On IT Corp. / Plextor M6e PCI Express SSD [Marvell 88SS9183] [1c28:0122] (rev 14) (prog-if 01 [AHCI 1.0])
Subsystem: Marvell Technology Group Ltd. Device [1b...
In Linux Kernel Bug Tracker #42679, tasos (tasos-linux-kernel-bugs) wrote : | #135 |
I believe I am affected by the same bug with the Marvell 88SE9120 controller on an ASRock 990FX Extreme 4 motherboard.
Although there are no DMAR errors in dmesg, when AMD's IOMMU is enabled in the BIOS I get the following a couple of times before it gives up:
[ 117.616423] ata9: hard resetting link
[ 117.632972] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.1 domain=0x0000 address=
[ 117.632982] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.1 domain=0x0000 address=
[ 118.340472] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.1 domain=0x0000 address=
[ 122.616621] ata9: softreset failed (1st FIS failed)
[ 122.616632] ata9: reset failed, giving up
[ 122.616640] ata9: EH complete
Once the controller's dev ID was added to drivers/
[ 1520.100391] ata9: hard resetting link
[ 1526.038156] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 330)
[ 1526.044554] ata9.00: ATA-7: SAMSUNG HD502IJ, 1AA01112, max UDMA7
[ 1526.044559] ata9.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 1526.050996] ata9.00: configured for UDMA/133
[ 1526.051007] ata9: EH complete
And here is the patch
--- a/drivers/
+++ b/drivers/
@@ -3589,6 +3589,8 @@ DECLARE_
/* https:/
DECLARE_
quirk_
+DECLARE_
+ quirk_dma_
DECLARE_
quirk_
/* https:/
Could this device id be added to the list of affected devices?
In Linux Kernel Bug Tracker #42679, alex.williamson (alex.williamson-linux-kernel-bugs) wrote : | #136 |
(In reply to Tasos Sahanidis from comment #118)
>
> Could this device id be added to the list of affected devices?
It's already queued in the pull request for v4.2:
In Linux Kernel Bug Tracker #42679, tasos (tasos-linux-kernel-bugs) wrote : | #137 |
(In reply to Alex Williamson from comment #119)
> It's already queued in the pull request for v4.2:
>
> http://
> pci/quirks.
Apologies for that, did not see it.
Thank you for your time!
In Linux Kernel Bug Tracker #42679, bill.hudacek (bill.hudacek-linux-kernel-bugs) wrote : | #138 |
Hi. Old Newbie to kernel things here. I see from Alex's (initial?) patch at https:/
However, exploring at https:/
So - I'm probably looking in all the wrong places.
I've just set up Fedora 22 4.1.3-200.
ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.00: failed command: WRITE DMA
ata10.00: cmd ca/00:01:
res 40/00:00:
ata10.00: status: { DRDY }
ata10: hard resetting link
ata10: link is slow to respond, please be patient (ready=0)
ata10: COMRESET failed (errno=-16)
ata10: hard resetting link
ata10: link is slow to respond, please be patient (ready=0)
ata10: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.00: qc timeout (cmd 0xec)
ata10.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata10.00: revalidation failed (errno=-5)
ata10: hard resetting link
ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata11.00: failed command: READ DMA EXT
ata11.00: cmd 25/00:10:
res 40/00:00:
ata11.00: status: { DRDY }
ata11: hard resetting link
ata10: link is slow to respond, please be patient (ready=0)
ata11: link is slow to respond, please be patient (ready=0)
ata10: COMRESET failed (errno=-16)
ata10: hard resetting link
ata11: COMRESET failed (errno=-16)
ata11: hard resetting link
This is a StarTech PEXSAT31E1 add-on so it's not booting the system. It's connected to a external cabinet, and I'm using mdadm for RAID-5. All drives report the same issues (logging not included here) which is what had me looking at the controller.
I am really hoping it's not included yet - which would both explain the issue and the fact that 'the fix is in'.
I've not built a kernel since - well, a long time ago - Ubuntu 6.10 or so. Now I might get a chance to try it on Fedora.
Please let me know if it would help if I provided more info. Sure looks like I'm just like most others here...
Can anyone Help?
Many Thanks :-)
/Bill
In Linux Kernel Bug Tracker #42679, bill.hudacek (bill.hudacek-linux-kernel-bugs) wrote : | #139 |
*bump*
I'm down here. I'm contemplating getting a 3ware and going the hardware route. I've had pretty horrid experience with Highpoint support (non-existent) and the Marvell controllers seem to be dysfunctional. Vendor who sold me the card could not provide any drivers or firmware updates, so this is my only possible path to a solution using this type of controller - the kernel patch(es).
Thanks.
In Linux Kernel Bug Tracker #42679, frollic (frollic-linux-kernel-bugs) wrote : | #140 |
For the 9230 you might want to check the updated BIOS we've discussed at:
http://
In Linux Kernel Bug Tracker #42679, oh-itsme (oh-itsme-linux-kernel-bugs) wrote : | #141 |
(In reply to frollic from comment #123)
> For the 9230 you might want to check the updated BIOS we've discussed at:
> http://
> updates-and-such/
I had found that thread in a websearch as I have encountered similar issues as you had, also using a Supermicro X10SBA. I had contacted Supermicro about this, but support did not really seem to be aware of this issue, and no update for the controller was sent to me. The thread you refer to does not state the outcome of applying the firmware to the X10SBA, does it solve the issue?
In Linux Kernel Bug Tracker #42679, frollic (frollic-linux-kernel-bugs) wrote : | #142 |
(In reply to oh-itsme from comment #124)
> I had found that thread in a websearch as I have encountered similar issues
> as you had, also using a Supermicro X10SBA. I had contacted Supermicro about
> this, but support did not really seem to be aware of this issue, and no
> update for the controller was sent to me.
I was in touch with Supermicro's Dutch support. They were very helpful; it took them about 10 days to obtain the update from Marvell.
The person I was in contact with wrote that the update would be posted along with the next BIOS update for the motherboard, but I don't think it actually happened :(
> The thread you refer to does not state the outcome of applying the firmware
> to the X10SBA, does it solve the issue?
Yes, it helped me. The soft-RAID is running fine now, even though I occasionally get "mismatch_cnt is not 0 on /dev/mdXXX" when running raid-check.
In Linux Kernel Bug Tracker #42679, tasos (tasos-linux-kernel-bugs) wrote : | #143 |
There seems to have been a regression sometime after the 4.3 tag (6a13feb9c82803
This results in the drives attached to the controller becoming inaccessible.
Please note that this time the quirk for my device is present in drivers/
In Linux Kernel Bug Tracker #42679, kevosev23194 (kevosev23194-linux-kernel-bugs) wrote : | #144 |
Hi There
Just want to address a problem with Asrock Extreme 9 X79 with BIOS P4.00 platform and its Marvell 88SE9220 controller.
I experience the same DMAR faults as described above when this controller is enabled.
However the problem appears to be resolved by adding a new entry in quirks.c
DECLARE_
Let me know if you need me to attach any logs of faults, at the moment I'm using a custom compiled kernel with the above fix on Arch Linux but can switch to a standard kernel.
Kind Regards,
In Linux Kernel Bug Tracker #42679, alan (alan-linux-kernel-bugs) wrote : | #145 |
If you've got the quirk fix and done the testing then I would see Documentation/
Send it to <email address hidden> and it should get reviewed and merged
Alan
In Linux Kernel Bug Tracker #42679, microsoftenator (microsoftenator-linux-kernel-bugs) wrote : | #146 |
I can confirm that this issue occurs with the Marvell 88SE9128 controller on my Gigabyte GA-X59A-UD7 (rev2.0) motherboard. As with Kevin Hunt above, adding a new entry in quirks.c appears to resolve the issue.
Given the name of this bug, I was surprised that the 9128 wasn't in there.
In Linux Kernel Bug Tracker #42679, microsoftenator (microsoftenator-linux-kernel-bugs) wrote : | #147 |
Addendum to the above:
The 9128 *does* appear to be in the quirks file for mainline, but not in the kernel provided by Arch Linux (4.15.15). It seems it was either added in 4.16 or Arch's patches removed it for some reason.
In Linux Kernel Bug Tracker #42679, bhelgaas (bhelgaas-linux-kernel-bugs) wrote : | #148 |
http://
http://
Are there any devices that are still broken in v4.17-rc1? If not, maybe we can close this bug?
In Linux Kernel Bug Tracker #42679, k8wtaylnuuz7 (k8wtaylnuuz7-linux-kernel-bugs) wrote : | #149 |
(In reply to Bjorn Helgaas from comment #131)
> http://
> ?id=aa0082066343 for Marvell 9128 appeared in v4.16-rc1.
>
> http://
> ?id=832e4e1f76b8 for Marvell 88SE9220 appeared in v4.17-rc1.
>
> Are there any devices that are still broken in v4.17-rc1? If not, maybe we
> can close this bug?
I still have this issue with a Marvell 88SE9230 and kernel v4.16.8 under Arch Linux. It's probably worth checking all their SATA Controllers before closing this bug: https:/
In Linux Kernel Bug Tracker #42679, bhelgaas (bhelgaas-linux-kernel-bugs) wrote : | #150 |
v4.16 already contains a quirk for the Marvell 88SE9230 (added by cc346a4714a5 ("PCI: Add function 1 DMA alias quirk for Marvell devices") way back in v3.16).
But from comment #44 and comments #49-#58, it sounds like the 9230 has other problems in addition to this one, so I suspect you're seeing those other problems. If so, can you open a new bug for that and copy Joshua and Alex? I took a quick look and didn't see a definitive resolution for the problems Joshua reported.
I'm going to close this one and if people see more problems that are resolved by quirk_dma_
In Linux Kernel Bug Tracker #42679, f.bluethner (f.bluethner-linux-kernel-bugs) wrote : | #151 |
I have this issue with "Marvell Technology Group Ltd. 88SS9183 PCIe SSD Controller" in my "Asus Rog Strix Z370-F Gaming" and solved it by adding "DECLARE_
quirk_dma_
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs. | #1 |
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1810239
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.
Steven Ellis (steven-openmedia) wrote : Re: amd_iommu conflict with Marvell Sata controller | #2 |
root@mythfe-amd:~# lspci -knnv -s 01:00.0
01:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller [1b4b:9230] (rev 11) (prog-if 01 [AHCI 1.0])
Subsystem: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller [1b4b:9230]
Flags: bus master, fast devsel, latency 0, IRQ 56
I/O ports at f050 [size=8]
I/O ports at f040 [size=4]
I/O ports at f030 [size=8]
I/O ports at f020 [size=4]
I/O ports at f000 [size=32]
Memory at f7d10000 (32-bit, non-prefetchable) [size=2K]
Expansion ROM at f7d00000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [70] Express Legacy Endpoint, MSI 00
Capabilities: [e0] SATA HBA v0.0
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: ahci
Kernel modules: ahci
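For anyone comparing their own hardware against this thread, here is a minimal sketch that pulls the [vendor:device] pair out of an `lspci -nn` style line. The function names and the `IDS_SHOWN_IN_THREAD` set are hypothetical helpers for illustration. The set holds only the two IDs that appear in bracket form in this thread (1b4b:9230 and 1b4b:9125); it is not the kernel's list of quirked devices.

```python
import re

# Only the IDs that appear literally as [1b4b:xxxx] in this thread.
# This is NOT the kernel's authoritative quirk list (assumption for illustration).
IDS_SHOWN_IN_THREAD = {(0x1B4B, 0x9230), (0x1B4B, 0x9125)}

# Matches "[vvvv:dddd]"; class codes such as "[0106]" have no colon and are skipped.
_ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.IGNORECASE)


def pci_ids(lspci_line):
    """Return (vendor, device) from the first [vvvv:dddd] pair, or None."""
    match = _ID_RE.search(lspci_line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def shown_in_thread(lspci_line):
    """True if the line's vendor:device pair is one quoted in this thread."""
    return pci_ids(lspci_line) in IDS_SHOWN_IN_THREAD
```

Feeding it the first line of the lspci output above should give the pair 1b4b:9230. A device such as the Plextor M6e, which reports 1c28:0122 despite using a Marvell chip, would not match, so the marketing name alone is not enough to tell whether an ID-based quirk applies.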
Changed in linux (Ubuntu): | |
status: | New → Incomplete |
tags: | added: cosmic |
Steven Ellis (steven-openmedia) wrote : AlsaInfo.txt | #3 |
tags: | added: apport-collected bionic |
description: | updated |
Steven Ellis (steven-openmedia) wrote : CRDA.txt | #4 |
Steven Ellis (steven-openmedia) wrote : CurrentDmesg.txt | #5 |
Steven Ellis (steven-openmedia) wrote : Lspci.txt | #6 |
Steven Ellis (steven-openmedia) wrote : Lsusb.txt | #7 |
Steven Ellis (steven-openmedia) wrote : ProcCpuinfo.txt | #8 |
Steven Ellis (steven-openmedia) wrote : ProcCpuinfoMinimal.txt | #9 |
Steven Ellis (steven-openmedia) wrote : ProcEnviron.txt | #10 |
Steven Ellis (steven-openmedia) wrote : ProcInterrupts.txt | #11 |
Steven Ellis (steven-openmedia) wrote : ProcModules.txt | #12 |
Steven Ellis (steven-openmedia) wrote : PulseList.txt | #13 |
Steven Ellis (steven-openmedia) wrote : UdevDb.txt | #14 |
Steven Ellis (steven-openmedia) wrote : WifiSyslog.txt | #15 |
Kai-Heng Feng (kaihengfeng) wrote : Re: amd_iommu conflict with Marvell Sata controller | #16 |
Would it be possible for you to test the latest upstream kernel? Refer
to https:/
v4.20 kernel[0].
If this bug is fixed in the mainline kernel, please add the following
tag 'kernel-
If the mainline kernel does not fix this bug, please add the tag:
'kernel-
Once testing of the upstream kernel is complete, please mark this bug as
"Confirmed".
Thanks in advance.
Changed in linux: | |
importance: | Unknown → Medium |
status: | Unknown → Fix Released |
Steven Ellis (steven-openmedia) wrote : Re: amd_iommu conflict with Marvell Sata controller | #152 |
Looks like there is a new upstream issue with
- https:/
Steven Ellis (steven-openmedia) wrote : | #153 |
I attempted a boot with the following upstream kernel packages
linux-
linux-
On boot I see the following errors
Jan 02 22:09:23 mythfe-amd kernel: ata4.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: qc timeout (cmd 0xef)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: failed to set xfermode (err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata4.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata4: limiting SATA link speed to 1.5 Gbps
Jan 02 22:09:23 mythfe-amd kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 310)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: qc timeout (cmd 0xa1)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: revalidation failed (errno=-5)
Jan 02 22:09:23 mythfe-amd kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: qc timeout (cmd 0xa1)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: revalidation failed (errno=-5)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: disabled
Jan 02 22:09:23 mythfe-amd kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata4.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 310)
Jan 02 22:09:23 mythfe-amd kernel: ata10: SATA link down (SStatus 0 SControl 330)
Jan 02 22:09:23 mythfe-amd kernel: ata13: SATA link down (SStatus 0 SControl 330)
Jan 02 22:09:23 mythfe-amd kernel: ata14: SATA link down (SStatus 0 SControl 330)
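A note for readers of the log above: the SStatus value libata prints is the SATA status register in hex, with device detection in bits 3:0, negotiated speed in bits 7:4 and power state in bits 11:8. The sketch below decodes it under that layout; `decode_sstatus` and `link_speed` are hypothetical helper names, not part of any tool.

```python
# Negotiated-speed field (bits 7:4) of the SATA SStatus register.
SPEEDS = {0: None, 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}


def decode_sstatus(hex_text):
    """Split an SStatus value, given as printed in dmesg (hex), into fields."""
    value = int(hex_text, 16)
    return {
        "det": value & 0xF,         # 3 = device present and PHY link established
        "spd": (value >> 4) & 0xF,  # 1 = Gen1, 2 = Gen2, 3 = Gen3
        "ipm": (value >> 8) & 0xF,  # 1 = interface active
    }


def link_speed(hex_text):
    """Return the negotiated speed as text, or None if no speed was negotiated."""
    return SPEEDS.get(decode_sstatus(hex_text)["spd"])
```

Read this way, "SStatus 123" and "SStatus 113" both describe an established, active link (at 3.0 and 1.5 Gbps respectively). That fits the picture in this bug: the PHY link comes up, and it is the subsequent IDENTIFY command that times out.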
Steven Ellis (steven-openmedia) wrote : | #154 |
Rebooted with the 4.20.0-
Jan 02 22:10:52 mythfe-amd kernel: ata8.00: ATAPI: MARVELL VIRTUAL, 1.09, max UDMA/66
Jan 02 22:10:52 mythfe-amd kernel: ata8.00: configured for UDMA/66
Jan 02 22:10:52 mythfe-amd kernel: ata4.00: ATA-8: ST3500418AS, CC46, max UDMA/133
Jan 02 22:10:52 mythfe-amd kernel: ata4.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 32)
Jan 02 22:10:52 mythfe-amd kernel: ata4.00: configured for UDMA/133
Jan 02 22:10:52 mythfe-amd kernel: ata2.00: ATA-7: ST3250820AS, 3.AAE, max UDMA/133
Jan 02 22:10:52 mythfe-amd kernel: ata2.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 32)
Jan 02 22:10:52 mythfe-amd kernel: ata2.00: configured for UDMA/133
Jan 02 22:10:52 mythfe-amd kernel: scsi 1:0:0:0: Direct-Access ATA ST3250820AS E PQ: 0 ANSI: 5
Jan 02 22:10:52 mythfe-amd kernel: sd 1:0:0:0: Attached scsi generic sg0 type 0
Jan 02 22:10:52 mythfe-amd kernel: sd 1:0:0:0: [sda] 488397168 512-byte logical blocks: (250 GB/233 GiB)
Jan 02 22:10:52 mythfe-amd kernel: scsi 3:0:0:0: Direct-Access ATA ST3500418AS CC46 PQ: 0 ANSI: 5
Jan 02 22:10:52 mythfe-amd kernel: sd 1:0:0:0: [sda] Write Protect is off
Changed in linux (Debian): | |
status: | Unknown → New |
description: | updated |
tags: | added: kernel-bug-exists-upstream-4.20 latest-bios-f2 |
summary: |
- amd_iommu conflict with Marvell Sata controller
+ amd_iommu conflict with Marvell 88SE9230 SATA Controller |
penalvch (penalvch) wrote : | #155 |
Steven Ellis, for you personally:
1) Did this problem not occur in a prior Ubuntu or kernel release, and if so which?
2) If this issue has always occurred, could you please advise the earliest kernel you tested?
3) To keep this relevant to upstream, one will want to test the latest mainline kernel as it is released (now 5.0-rc2). Could you please advise?
Changed in linux (Ubuntu): | |
importance: | Undecided → Low |
Steven Ellis (steven-openmedia) wrote : | #156 |
I've only recently traced the issue to the iommu kernel option. This device has been unstable since I bought it and I pull it out occasionally to see if the driver issues have been addressed.
I'm afraid that the test system I'm using is currently unavailable. I'll post an update when I have a chance for fresh testing.
piktogramm (piktogramm) wrote : | #157 |
Hi,
I had similar problems with my Marvell 88SE9230. I was able to improve the situation considerably by updating the firmware of the controller itself. In general, all firmware versions beyond 2.3.xxx improved things. The remaining problem is that I get failures on ata6, which is the only port not connected to any drive. Every drive connected to the Marvell controller itself is perfectly stable (24/7 for 400+ days).
Source for Firmwares: https:/
May 05 03:16:10 doomsdaydevice kernel: ata6.00: exception Emask 0x0 SAct 0x6 SErr 0x0 action 0x6 frozen
May 05 03:16:11 doomsdaydevice kernel: ata6.00: failed command: WRITE FPDMA QUEUED
May 05 03:16:11 doomsdaydevice kernel: ata6.00: cmd 61/10:08:
May 05 03:16:11 doomsdaydevice kernel: ata6.00: status: { DRDY }
May 05 03:16:11 doomsdaydevice kernel: ata6.00: failed command: WRITE FPDMA QUEUED
May 05 03:16:11 doomsdaydevice kernel: ata6.00: cmd 61/10:10:
May 05 03:16:11 doomsdaydevice kernel: ata6.00: status: { DRDY }
May 05 03:16:11 doomsdaydevice kernel: ata6.00: supports DRM functions and may not be fully accessible
May 05 03:16:11 doomsdaydevice kernel: ata6.00: supports DRM functions and may not be fully accessible
May 05 06:01:33 doomsdaydevice kernel: ata6.00: exception Emask 0x0 SAct 0x60 SErr 0x0 action 0x6 frozen
May 05 06:01:33 doomsdaydevice kernel: ata6.00: failed command: WRITE FPDMA QUEUED
May 05 06:01:33 doomsdaydevice kernel: ata6.00: cmd 61/08:28:
May 05 06:01:33 doomsdaydevice kernel: ata6.00: status: { DRDY }
May 05 06:01:33 doomsdaydevice kernel: ata6.00: failed command: WRITE FPDMA QUEUED
May 05 06:01:33 doomsdaydevice kernel: ata6.00: cmd 61/08:30:
May 05 06:01:33 doomsdaydevice kernel: ata6.00: status: { DRDY }
May 05 06:01:33 doomsdaydevice kernel: ata6.00: supports DRM functions and may not be fully accessible
May 05 06:01:33 doomsdaydevice kernel: ata6.00: supports DRM functions and may not be fully accessible
May 05 06:37:03 doomsdaydevice kernel: ata6.00: exception Emask 0x0 SAct 0x30 SErr 0x0 action 0x6 frozen
May 05 06:37:03 doomsdaydevice kernel: ata6.00: failed command: WRITE FPDMA QUEUED
May 05 06:37:03 doomsdaydevice kernel: ata6.00: cmd 61/08:20:
May 05 06:37:03 doomsdaydevice kernel: ata6.00: status: { DRDY }
May 05 06:37:03 doomsdaydevice kernel: ata6.00: fail...
penalvch (penalvch) wrote : | #158 |
Johannes (piktogrammdd+
ubuntu-bug linux
Please feel free to subscribe me to it.
tags: |
added: bios-outdated-f.40 removed: latest-bios-f2 |
tags: | added: needs-upstream-testing |
piktogramm (piktogramm) wrote : | #159 |
Christopher, I filed the bug. However, I made a mistake: I took the output from lshw, where scsi@6 was not populated, and assumed that ata6 equals scsi@6, which isn't the case. The errors mentioned above are therefore on my boot drive.
https:/
In Linux Kernel Bug Tracker #42679, LK7S2ED64JHGLKj75shg9klejHWG49h5hk (lk7s2ed64jhglkj75shg9klejhwg49h5hk-linux-kernel-bugs) wrote : | #160 |
"Marvell Technology Group Ltd. 88SS9215 PCIe SSD Controller" have the same bug.
Fixed by:
DECLARE_
quirk_
Changed in linux (Debian): | |
status: | New → Fix Released |
In Linux Kernel Bug Tracker #42679, sam (sam-linux-kernel-bugs) wrote : | #161 |
Also "Marvell Technology Group Ltd. 88SE9125 PCIe SATA 6.0 Gb/s controller [1b4b:9125]" - fixed with:
DECLARE_
quirk_
Is this sufficient or should I open a new bug?
In Linux Kernel Bug Tracker #42679, alan (alan-linux-kernel-bugs) wrote : | #162 |
Even better would be to make a git diff of it and then submit it with explanation to
<email address hidden> and cc <email address hidden>
See:
https:/
In Linux Kernel Bug Tracker #42679, biergaizi2009 (biergaizi2009-linux-kernel-bugs) wrote : | #163 |
(In reply to sbingner from comment #136)
> Also "Marvell Technology Group Ltd. 88SE9125 PCIe SATA 6.0 Gb/s controller
> [1b4b:9125]" - fixed with:
>
> DECLARE_
> quirk_dma_
>
> Is this sufficient or should I open a new bug?
I have the same hardware and was able to test and confirm the bug. I just submitted the patch to the Linux kernel maintainers. Hopefully it'll be accepted soon.
https:/
In Linux Kernel Bug Tracker #42679, biergaizi2009 (biergaizi2009-linux-kernel-bugs) wrote : | #164 |
(In reply to Tom Li from comment #138)
> (In reply to sbingner from comment #136)
> > Also "Marvell Technology Group Ltd. 88SE9125 PCIe SATA 6.0 Gb/s controller
> > [...]
> > Is this sufficient or should I open a new bug?
>
> I have the same hardware and was able to test and confirm the bug. I just
> submitted the patch to the Linux kernel maintainers. Hopefully it'll be
> accepted soon.
>
> https:/
Patch for 88SE9125 has been merged into the upstream kernel since Linux v5.17-rc1.
Greg K.H. has also queued this patch for Linux 4.4, 4.9, 4.14, 5.4, 5.10, 5.15, 5.16. The patch should appear in the next stable kernel release in each branch.
In Linux Kernel Bug Tracker #42679, biergaizi2009 (biergaizi2009-linux-kernel-bugs) wrote : | #165 |
(In reply to Tom Li from comment #139)
> Patch for 88SE9125 has been merged into the upstream kernel since Linux
> v5.17-rc1.
>
> https:/
> ?id=e4453758828
>
> Greg K.H. has also queued this patch for Linux 4.4, 4.9, 4.14, 5.4, 5.10,
> 5.15, 5.16. The patch should appear in the next stable kernel release in
> each branch.
My patch has just been included in Linux 4.4.300, 4.9.298, 4.14.263, 4.19.226, 5.4.174, 5.10.94, 5.15.17, and 5.16.3.
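To check whether a given kernel.org release carries the 88SE9125 patch, the version list above can be turned into a small lookup. This is a sketch only: `has_9125_patch` and `parse_version` are hypothetical names, the table is copied from the comment above, and distribution kernels (which backport on their own schedule) are not covered.

```python
import re

# First stable release on each branch that includes the 88SE9125 patch,
# as listed in the comment above. Mainline has it from v5.17-rc1 onward.
FIRST_FIXED = {
    (4, 4): 300, (4, 9): 298, (4, 14): 263, (4, 19): 226,
    (5, 4): 174, (5, 10): 94, (5, 15): 17, (5, 16): 3,
}
MAINLINE_FIXED = (5, 17)


def parse_version(text):
    """Parse 'X.Y' or 'X.Y.Z' (suffixes like '-rc1' are ignored)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"not a kernel version: {text!r}")
    return int(match.group(1)), int(match.group(2)), int(match.group(3) or 0)


def has_9125_patch(version_text):
    """True if this kernel.org version is at or past the first fixed release."""
    major, minor, patch = parse_version(version_text)
    if (major, minor) >= MAINLINE_FIXED:
        return True
    first = FIRST_FIXED.get((major, minor))
    # Branches not in the table are treated as unpatched.
    return first is not None and patch >= first
```

For example, 5.10.94 would be reported as patched and 5.10.93 as unpatched. A string like Ubuntu's 4.18.0-13-generic falls on a branch that is not in the table and is reported as unpatched, which may or may not be true for a distribution kernel.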
Created attachment 72217
Output of `dmesg' command
I have an MSI Z68A-GD80 B3 motherboard, and when I try to enable Intel's IOMMU (kernel booted with intel_iommu=on), the integrated Marvell 88SE9128 SATA controller doesn't work.
To reproduce:
1. Compile and prepare kernel with Intel IOMMU support enabled (CONFIG_INTEL_IOMMU=y).
2. Reboot the computer.
3. Enter BIOS and enable VT-d.
4. Boot the kernel with intel_iommu=on parameter.
Right after boot, kernel reports the following errors (SATA controller is at 0b:00.0):
[ 2.639774] DRHD: handling fault status reg 3
[ 2.639782] DMAR:[DMA Read] Request device [0b:00.1] fault addr fff00000
[ 2.639783] DMAR:[fault reason 02] Present bit in context entry is clear
After a while these entries appear:
[ 7.625837] ata14.00: qc timeout (cmd 0xa1)
[ 7.628341] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 7.935483] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 17.908407] ata14.00: qc timeout (cmd 0xa1)
[ 17.910935] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 17.912276] ata14: limiting SATA link speed to 1.5 Gbps
[ 18.219077] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 48.134607] ata14.00: qc timeout (cmd 0xa1)
[ 48.137508] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 48.444646] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
When there is a disk connected to the controller, the disk does not work. When there is none, the computer starts normally, apart from the huge lag caused by, presumably, probing the device.
Since this is the secondary controller on these motherboards, you can eliminate the symptoms by plugging the disk into one of the available ports of the built-in Intel SATA controller and disabling the Marvell one in the BIOS. The other workaround, if you need the eSATA capabilities of the latter, is to disable VT-d, also in the BIOS.
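The pattern in the report above (controller at 0b:00.0, DMA faults from 0b:00.1) is what later comments call the function 1 DMA alias problem. The sketch below scans dmesg text for faulting requester IDs in the two formats quoted in this thread (Intel DMAR and AMD-Vi) and flags those at function 1 whose function 0 is a known device. All names are hypothetical, and the regular expressions are written against the log lines in this thread only; other kernel versions may format these messages differently.

```python
import re

# Formats as they appear in this thread; other kernels may print them differently.
DMAR_RE = re.compile(r"Request device \[([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])\]", re.I)
AMDVI_RE = re.compile(r"IO_PAGE_FAULT device=([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])", re.I)


def faulting_devices(dmesg_text):
    """Return the set of (bus, device, function) tuples named in IOMMU faults."""
    found = set()
    for pattern in (DMAR_RE, AMDVI_RE):
        for bus, dev, fn in pattern.findall(dmesg_text):
            found.add((int(bus, 16), int(dev, 16), int(fn)))
    return found


def phantom_function1(dmesg_text, known_functions):
    """Faults from function 1 where function 0 is enumerated but function 1 is not.

    known_functions is a set of (bus, device, function) tuples, e.g. built
    from lspci output.
    """
    return {
        (bus, dev, fn)
        for bus, dev, fn in faulting_devices(dmesg_text)
        if fn == 1
        and (bus, dev, 0) in known_functions
        and (bus, dev, 1) not in known_functions
    }
```

With the DMAR lines above and a known-device set containing only 0b:00.0, this would flag 0b:00.1. A hit suggests, but does not prove, that the controller is issuing DMA with a requester ID the IOMMU has no context entry for.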