Charm doesn't install necessary DPDK drivers for hinic (Huawei) NICs

Bug #1936850 reported by Vladimir Grevtsev
This bug affects 1 person
Affects                       Status         Importance   Assigned to   Milestone
charm-layer-ovn               Fix Released   Wishlist     Unassigned
charm-ovn-chassis             Fix Released   Wishlist     Unassigned
charm-ovn-dedicated-chassis   Fix Released   Wishlist     Unassigned

Bug Description

[Expected result]

Given the environment and config below, the DPDK bond should be initialized successfully.

[Actual result]

ovs-vsctl show has errors:

    Bridge br-dpdk
        fail_mode: standalone
        datapath_type: netdev
        Port dpdk-bond0
            Interface dpdk-769d67d
                type: dpdk
                options: {dpdk-devargs="0000:3e:00.0"}
                error: "Error attaching device '0000:3e:00.0' to DPDK"
            Interface dpdk-18f5dde
                type: dpdk
                options: {dpdk-devargs="0000:40:00.0"}
                error: "Error attaching device '0000:40:00.0' to DPDK"
        Port br-dpdk
            Interface br-dpdk
                type: internal

ovs-vswitchd.log:

2021-07-19T16:24:49.359Z|00062|dpdk|ERR|EAL: Driver cannot attach the device (0000:3e:00.0)
2021-07-19T16:24:49.359Z|00063|dpdk|ERR|EAL: Failed to attach device on primary process
2021-07-19T16:24:49.359Z|00064|netdev_dpdk|WARN|Error attaching device '0000:3e:00.0' to DPDK
2021-07-19T16:24:49.359Z|00065|netdev|WARN|dpdk-769d67d: could not set configuration (Invalid argument)
2021-07-19T16:24:49.359Z|00066|dpdk|ERR|Invalid port_id=32
2021-07-19T16:24:49.379Z|00067|dpdk|ERR|EAL: Driver cannot attach the device (0000:40:00.0)
2021-07-19T16:24:49.379Z|00068|dpdk|ERR|EAL: Failed to attach device on primary process
2021-07-19T16:24:49.379Z|00069|netdev_dpdk|WARN|Error attaching device '0000:40:00.0' to DPDK
2021-07-19T16:24:49.379Z|00070|netdev|WARN|dpdk-18f5dde: could not set configuration (Invalid argument)
2021-07-19T16:24:49.379Z|00071|dpdk|ERR|Invalid port_id=32
2021-07-19T16:24:49.379Z|00072|bridge|INFO|bridge br-dpdk: added interface br-dpdk on port 65534
2021-07-19T16:24:49.379Z|00073|bridge|INFO|bridge br-int: using datapath ID 00002ea26c9ff64d
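
When the log is long, it can help to pull out only the devices that failed to attach. The following is a minimal sketch that scans ovs-vswitchd.log text for the netdev_dpdk warning shown above; the function name and the sample text are illustrative assumptions, and the pattern relies only on the message format visible in this excerpt.

```python
import re

# Matches the netdev_dpdk warning format seen in the log excerpt above.
ATTACH_ERROR = re.compile(
    r"netdev_dpdk\|WARN\|Error attaching device '([0-9a-fA-F:.]+)' to DPDK"
)


def failed_dpdk_devices(log_text):
    """Return PCI addresses that failed to attach, in first-seen order."""
    seen = []
    for match in ATTACH_ERROR.finditer(log_text):
        address = match.group(1)
        if address not in seen:
            seen.append(address)
    return seen


# Sample lines copied from the log excerpt in this report.
SAMPLE_LOG = """\
2021-07-19T16:24:49.359Z|00062|dpdk|ERR|EAL: Driver cannot attach the device (0000:3e:00.0)
2021-07-19T16:24:49.359Z|00064|netdev_dpdk|WARN|Error attaching device '0000:3e:00.0' to DPDK
2021-07-19T16:24:49.379Z|00069|netdev_dpdk|WARN|Error attaching device '0000:40:00.0' to DPDK
"""

print(failed_dpdk_devices(SAMPLE_LOG))
```

Both failing addresses belong to the Huawei Hi1822 ports listed in the lspci output of this report.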

== Environment
focal/ussuri, hardware: Huawei CH121 V5 blade servers

cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.4.0-77-generic root=/dev/mapper/vg0-sda--os ro console=tty0 console=ttyS0,115200n8 intel_iommu=on iommu=pt hugepagesz=1G hugepages=1480 default_hugepagesz=1G transparent_hugepage=never isolcpus=0-21,24-45,48-69,72-93

# grep HugePages_ /proc/meminfo
HugePages_Total: 1480
HugePages_Free: 1472
HugePages_Rsvd: 0
HugePages_Surp: 0

# lspci | grep Ethernet
1b:00.0 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GbE backplane (rev 09)
1b:00.1 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GbE backplane (rev 09)
3d:00.0 Ethernet controller: Huawei Technologies Co., Ltd. Hi1822 Family (4*25GE) (rev 45)
3e:00.0 Ethernet controller: Huawei Technologies Co., Ltd. Hi1822 Family (4*25GE) (rev 45)
3f:00.0 Ethernet controller: Huawei Technologies Co., Ltd. Hi1822 Family (4*25GE) (rev 45)
40:00.0 Ethernet controller: Huawei Technologies Co., Ltd. Hi1822 Family (4*25GE) (rev 45)

# sudo dmesg | grep -e IOMMU
[ 5.744179] DMAR: IOMMU enabled
[ 12.605696] DMAR-IR: IOAPIC id 12 under DRHD base 0xc5ffc000 IOMMU 6
[ 12.612122] DMAR-IR: IOAPIC id 11 under DRHD base 0xb87fc000 IOMMU 5
[ 12.618549] DMAR-IR: IOAPIC id 10 under DRHD base 0xaaffc000 IOMMU 4
[ 12.624977] DMAR-IR: IOAPIC id 18 under DRHD base 0xfbffc000 IOMMU 3
[ 12.631404] DMAR-IR: IOAPIC id 17 under DRHD base 0xee7fc000 IOMMU 2
[ 12.637830] DMAR-IR: IOAPIC id 16 under DRHD base 0xe0ffc000 IOMMU 1
[ 12.644259] DMAR-IR: IOAPIC id 15 under DRHD base 0xd37fc000 IOMMU 0
[ 12.650687] DMAR-IR: IOAPIC id 8 under DRHD base 0x9d7fc000 IOMMU 7
[ 12.657027] DMAR-IR: IOAPIC id 9 under DRHD base 0x9d7fc000 IOMMU 7

"dpdk-devbind.py -s" : https://paste.ubuntu.com/p/g8TqRHkyHJ/
ovs-vsctl show: https://paste.ubuntu.com/p/fYWZ8hzvyq/
ovs-vsctl list Open_Vswitch: https://paste.ubuntu.com/p/gHKxdfxgv3/
overlay used for ovn-chassis deployment: https://pastebin.canonical.com/p/BwTKr3HWvd/
ovs-vswitchd.log: https://paste.ubuntu.com/p/nb9nTgwgK8/
extract from lspci -nnv: https://paste.ubuntu.com/p/NxKytM4m3M/

For the record, we've tried changing dpdk-driver from "vfio-pci" to "uio_pci_generic", but that didn't help.

Please advise: what could be our next steps in troubleshooting this?

Tags: field-high
Vladimir Grevtsev (vlgrevtsev) wrote :

Adding field-critical as there's no clear/known workaround yet and this issue affects an ongoing project.

description: updated
tags: added: field-critical
Nobuto Murata (nobuto) wrote :

I didn't read all of the info, but my gut feeling is that our charm assumes only Intel NICs for DPDK, and this time the NIC is by Huawei.

librte-pmd-hinic won't be pulled in as a dependency by default, so I'm curious whether installing librte-pmd-hinic by hand makes any difference.

$ apt-cache depends dpdk | egrep 'i40e|hinic'
  Recommends: librte-pmd-i40e20.0
  Suggests: librte-pmd-hinic20.0
  Suggests: librte-pmd-i40e20.0
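
The distinction above (Recommends versus Suggests) can be checked mechanically. This is a small sketch that parses `apt-cache depends` output and reports whether a package would be pulled in by default. It assumes apt's default behaviour of installing Recommends but not Suggests; the function names are hypothetical and the sample is the output quoted above.

```python
# Relationships that apt installs by default (assuming Install-Recommends is on).
DEFAULT_RELATIONS = ("PreDepends", "Depends", "Recommends")


def parse_depends(output):
    """Map each relationship name to the set of packages listed under it."""
    relations = {}
    for raw in output.splitlines():
        # Alternative dependencies are prefixed with "|" in apt-cache output.
        line = raw.strip().lstrip("|")
        relation, sep, package = line.partition(":")
        if not sep:
            continue  # e.g. the leading line that only names the package
        relations.setdefault(relation.strip(), set()).add(package.strip())
    return relations


def pulled_in_by_default(output, package):
    """True if the package is a (Pre)Depends or Recommends entry."""
    relations = parse_depends(output)
    return any(package in relations.get(name, set()) for name in DEFAULT_RELATIONS)


# Sample copied from the apt-cache output quoted in this comment.
SAMPLE_DEPENDS = """\
dpdk
  Recommends: librte-pmd-i40e20.0
  Suggests: librte-pmd-hinic20.0
  Suggests: librte-pmd-i40e20.0
"""

print(pulled_in_by_default(SAMPLE_DEPENDS, "librte-pmd-i40e20.0"))
print(pulled_in_by_default(SAMPLE_DEPENDS, "librte-pmd-hinic20.0"))
```

With the sample above, the i40e PMD counts as installed by default and the hinic PMD does not, which matches the behaviour seen on the Huawei hardware.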

Vladimir Grevtsev (vlgrevtsev) wrote :

After I installed librte-pmd-hinic20.0 by hand, the DPDK bond was assembled:

    Bridge br-dpdk
        fail_mode: standalone
        datapath_type: netdev
        Port dpdk-bond0
            Interface dpdk-769d67d
                type: dpdk
                options: {dpdk-devargs="0000:3e:00.0"}
            Interface dpdk-18f5dde
                type: dpdk
                options: {dpdk-devargs="0000:40:00.0"}
        Port br-dpdk
            Interface br-dpdk
                type: internal

2021-07-19T17:23:18.989Z|00101|dpdk|INFO|Device with port_id=0 already stopped
2021-07-19T17:23:19.075Z|00102|dpdk|INFO|net_hinic: Disable vlan filter succeed, device: hinic-0000:3e:00.0, port_id: 0
2021-07-19T17:23:19.076Z|00103|dpdk|INFO|net_hinic: Disable vlan strip succeed, device: hinic-0000:3e:00.0, port_id: 0
2021-07-19T17:23:19.082Z|00104|dpdk|INFO|net_hinic: Set port mtu, port_id: 0, mtu: 1500, max_pkt_len: 1518
2021-07-19T17:23:19.309Z|00105|dpdk|INFO|net_hinic: Set new mac address f4:a4:d6:f3:68:a2
2021-07-19T17:23:19.309Z|00106|dpdk|INFO|net_hinic: Disable promiscuous, nic_dev: hinic-0000:3e:00.0, port_id: 0, promisc: 0
2021-07-19T17:23:19.311Z|00107|dpdk|INFO|net_hinic: Disable allmulticast succeed, nic_dev: hinic-0000:3e:00.0, port_id: 0
2021-07-19T17:23:19.314Z|00108|dpdk|INFO|net_hinic: Enable promiscuous, nic_dev: hinic-0000:3e:00.0, port_id: 0, promisc: 0
2021-07-19T17:23:19.316Z|00109|dpdk|INFO|net_hinic: Enable allmulticast succeed, nic_dev: hinic-0000:3e:00.0, port_id: 0
2021-07-19T17:23:19.316Z|00110|netdev_dpdk|INFO|Port 0: f4:a4:d6:f3:68:a2
2021-07-19T17:23:19.318Z|00111|dpif_netdev|INFO|Core 21 on numa node 0 assigned port 'dpdk-769d67d' rx queue 0 (measured processing cycles 0).
2021-07-19T17:23:19.320Z|00112|bridge|INFO|bridge br-dpdk: added interface dpdk-769d67d on port 1
2021-07-19T17:23:19.338Z|00113|dpdk|INFO|Device with port_id=1 already stopped
2021-07-19T17:23:19.342Z|00114|dpdk|INFO|net_hinic: Disable vlan filter succeed, device: hinic-0000:40:00.0, port_id: 1
2021-07-19T17:23:19.343Z|00115|dpdk|INFO|net_hinic: Disable vlan strip succeed, device: hinic-0000:40:00.0, port_id: 1
2021-07-19T17:23:19.349Z|00116|dpdk|INFO|net_hinic: Set port mtu, port_id: 1, mtu: 1500, max_pkt_len: 1518
2021-07-19T17:23:19.578Z|00117|dpdk|INFO|net_hinic: Set new mac address f4:a4:d6:f3:68:a4
2021-07-19T17:23:19.578Z|00118|dpdk|INFO|net_hinic: Disable promiscuous, nic_dev: hinic-0000:40:00.0, port_id: 1, promisc: 0
2021-07-19T17:23:19.581Z|00119|dpdk|INFO|net_hinic: Disable allmulticast succeed, nic_dev: hinic-0000:40:00.0, port_id: 1
2021-07-19T17:23:19.584Z|00120|dpdk|INFO|net_hinic: Enable promiscuous, nic_dev: hinic-0000:40:00.0, port_id: 1, promisc: 0
2021-07-19T17:23:19.586Z|00121|dpdk|INFO|net_hinic: Enable allmulticast succeed, nic_dev: hinic-0000:40:00.0, port_id: 1
2021-07-19T17:23:19.586Z|00122|netdev_dpdk|INFO|Port 1: f4:a4:d6:f3:68:a4
2021-07-19T17:23:19.588Z|00123|dpif_netdev|INFO|Core 21 on numa node 0 assigned port 'dpdk-769d67d' rx queue 0 (measured processing cycles 0).
2021-07-19T17:23:19.588Z|00124|dpif_netdev|INFO|Core 21 on numa node 0 assigned port 'dpdk-18f5dde' rx queue 0 (measured processing cycles 0).
2021-...


Vladimir Grevtsev (vlgrevtsev) wrote :

Workaround: juju run --application ovn-chassis-dpdk "sudo apt install librte-pmd-hinic20.0 -y; sudo service ovs-vswitchd restart"

Downgrading to field-high since a workaround has been found.

summary: - Port cannot be added to DPDK-enabled bridge: "Error attaching device to
- DPDK": Invalid port_id=32
+ Charm doesn't install necessary DPDK drivers for hinic (Huawei) NICs
tags: added: field-high
removed: field-critical
Christian Ehrhardt (paelzer) wrote :

FYI,
I agree with Nobuto: only a small set of PMDs (those that can be considered the more common and better tested ones) are Recommends of the dpdk package and are thereby installed by default.
  https://git.launchpad.net/ubuntu/+source/dpdk/tree/debian/control?h=applied/ubuntu/focal-devel

All the rest are only Suggests, and users of that hardware need to install them by hand, as already suggested in the comments above.

As a side note, due to a lack of testability those non-recommended PMDs are also in universe.
I guess it would not make a huge difference if a bug came up, but since there has not been much testing of them (none on Ubuntu and very little upstream, AFAICS), there has been no basis for promoting them yet.

Corey Bryant (corey.bryant) wrote :

Looking at the charm, there's a dpdk-driver config option (defined in layer-ovn's config.yaml; also see DPDKDeviceContext in charm-helpers, where it gets processed), but there doesn't appear to be a hard-coded package or a config option for specifying the driver library package. There are a lot of DPDK drivers [1] and corresponding libraries defined in the dpdk package, so perhaps we want to handle this with a config option similar to dpdk-driver.

[1] https://doc.dpdk.org/guides/nics/index.html

e.g. dpdk-driver-library=librte-pmd-hinic20.0
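
To make the suggestion concrete, here is a rough sketch of how a charm might turn such an option value into a list of packages to install. The option name comes from the example above; everything else (accepting space- or comma-separated values, the helper names, the idea of comparing against already-installed packages) is an assumption for illustration and is not taken from the charm code.

```python
def driver_packages(config_value):
    """Split a space- or comma-separated option value into package names."""
    if not config_value:
        return []
    names = config_value.replace(",", " ").split()
    # dict.fromkeys keeps the first occurrence of each name, in order.
    return list(dict.fromkeys(names))


def missing_packages(wanted, installed):
    """Return the wanted packages that are not in the installed collection."""
    installed = set(installed)
    return [name for name in wanted if name not in installed]


# Hypothetical value of a dpdk-driver-library style option.
wanted = driver_packages("librte-pmd-hinic20.0, librte-pmd-i40e20.0")
print(missing_packages(wanted, installed=["librte-pmd-i40e20.0"]))
```

Keeping the parsing separate from the install step means an empty or unset option simply results in no extra packages, which would preserve the current behaviour for Intel NICs.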

Corey Bryant (corey.bryant) wrote :

I'm not entirely sure how best to triage this. For now I'm going to triage this as Wishlist based on my comment in #6, but I'm certainly open to other input.

Changed in charm-ovn-chassis:
status: New → Triaged
importance: Undecided → Wishlist
Billy Olsen (billy-olsen) wrote :

I think a better option would be to have a mapping of NICs to the DPDK driver libraries that should be installed. The charm can introspect which cards are installed and select the appropriate DPDK drivers based on that. This will of course require ongoing maintenance, and a dpdk-driver-library option would allow some additional flexibility, but it would just work.
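
A minimal sketch of that mapping idea, assuming the charm matches on substrings of the `lspci` description. The two entries use the NIC descriptions and package names that appear in this report; the table, the matching rule and the function name are illustrative assumptions, and a real implementation would more likely key on PCI vendor and device IDs.

```python
# Illustrative mapping only: substring of the lspci description -> PMD package.
PMD_BY_NIC = {
    "Intel Corporation Ethernet Connection X722": "librte-pmd-i40e20.0",
    "Huawei Technologies Co., Ltd. Hi1822": "librte-pmd-hinic20.0",
}


def pmd_packages_for(lspci_output):
    """Return PMD packages for the Ethernet controllers found, without duplicates."""
    packages = []
    for line in lspci_output.splitlines():
        if "Ethernet controller" not in line:
            continue
        for needle, package in PMD_BY_NIC.items():
            if needle in line and package not in packages:
                packages.append(package)
    return packages


# Sample lines copied from the lspci output in the bug description.
SAMPLE_LSPCI = """\
1b:00.0 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GbE backplane (rev 09)
3d:00.0 Ethernet controller: Huawei Technologies Co., Ltd. Hi1822 Family (4*25GE) (rev 45)
3e:00.0 Ethernet controller: Huawei Technologies Co., Ltd. Hi1822 Family (4*25GE) (rev 45)
"""

print(pmd_packages_for(SAMPLE_LSPCI))
```

The maintenance cost mentioned above shows up directly in the table: every newly supported card needs a new entry, and cards without an entry are silently skipped.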

Billy Olsen (billy-olsen) wrote :

Upon further inspection, I think that Corey's suggestion in #6 is probably the most flexible option. Not all of the DPDK drivers will depend on the specific card that's installed, and it's possible that the user will want to use a different DPDK driver.

Nobuto Murata (nobuto) wrote :

fwiw, there is a new meta package pulling all of the available PMDs for hirsute+.

$ rmadison librte-meta-allpmds -a amd64
 librte-meta-allpmds | 20.11.1-1 | hirsute/universe | amd64
 librte-meta-allpmds | 20.11.3-0ubuntu0.21.04.2 | hirsute-updates/universe | amd64
 librte-meta-allpmds | 20.11.3-0ubuntu1 | impish/universe | amd64

Billy Olsen (billy-olsen) wrote :
Changed in charm-layer-ovn:
status: New → Triaged
importance: Undecided → Wishlist
Changed in charm-layer-ovn:
status: Triaged → In Progress
James Page (james-page) wrote :

I've added my approval for the PR proposed to layer-ovn and requested that @fnordhal have a look as well.

James Page (james-page) wrote :

PR merged into the OVN base layer - charms will need a rebuild to pick this up.

Changed in charm-layer-ovn:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ovn-chassis (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/x/charm-ovn-chassis/+/829556

Changed in charm-ovn-chassis:
status: Triaged → In Progress
James Page (james-page)
Changed in charm-ovn-dedicated-chassis:
status: New → In Progress
importance: Undecided → Wishlist
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ovn-dedicated-chassis (master)
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ovn-dedicated-chassis (master)

Reviewed: https://review.opendev.org/c/x/charm-ovn-dedicated-chassis/+/829557
Committed: https://opendev.org/x/charm-ovn-dedicated-chassis/commit/ce189c45d552545faf85085ad8f47842e4fb1fd6
Submitter: "Zuul (22348)"
Branch: master

commit ce189c45d552545faf85085ad8f47842e4fb1fd6
Author: James Page <email address hidden>
Date: Wed Feb 16 14:59:00 2022 +0000

    Rebuild to pickup DPDK driver improvements

    Rebuild charm to pickup new feature in layer-ovn to support
    installation of additional DPDK network drivers for less
    well supports cards.

    Closes-Bug: 1936850
    Change-Id: I61c9b0b1b49b1604e070876ceed32965da193454

Changed in charm-ovn-dedicated-chassis:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-ovn-chassis (master)

Change abandoned by "James Page <email address hidden>" on branch: master
Review: https://review.opendev.org/c/x/charm-ovn-chassis/+/829556

Billy Olsen (billy-olsen) wrote :

The fix was released with another rebuild of the charm.

Changed in charm-ovn-chassis:
status: In Progress → Fix Released
Changed in charm-ovn-dedicated-chassis:
status: Fix Committed → Fix Released