Hi Joseph, thanks for your answer.
Currently I have two different hardware configurations:
- HPE ProLiant m710p Server Cartridge (does not have this problem)
- HPE ProLiant m710x Server Cartridge (has this problem)
> Did this issue start happening after an update/upgrade?
> Was there a prior kernel version where you were not having this particular problem?
Well, I use a debootstrap script to install all the needed software automatically and build an image with the base system.
After that I use this image to boot my nodes via PXE, so on each boot I have a system installed from scratch.
I've tested the following kernels:
- Ubuntu 16.04 with stock kernel: 4.4.0-116-generic
- Ubuntu 16.04 with hwe kernel: 4.13.0-36-generic
- Ubuntu 16.04 with pve kernel: 4.13.13-6-pve
- Debian 9 with pve kernel: 4.13.13-6-pve
- Debian 9 with stock kernel: 4.9.0-6-amd64
All of them have this problem, though on the stock kernels it may only show up after some time.
(The only case where I did not see this error was Debian with 4.9.0-6-amd64, but I presume it exists there too, because I have not tested it properly.)
Another thing: if I run these steps AFTER the system has booted up:
rmmod mlx4_en mlx4_ib mlx4_core
modprobe mlx4_core num_vfs=1 port_type_array=2,2 probe_vf=1
systemctl restart networking
Everything starts working fine.
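For repeatability, the three steps above could be wrapped in a small script. This is only a sketch: the DRY_RUN variable and the run and reload_mlx4 helpers are hypothetical names introduced here, not part of my setup, and by default the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: reload the mlx4 modules with explicit options, then restart networking.
# DRY_RUN, run and reload_mlx4 are hypothetical helper names used for illustration.
set -eu

# Default to dry-run so nothing is unloaded by accident.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# The same three steps as in the manual workaround, in the same order.
reload_mlx4() {
    run rmmod mlx4_en mlx4_ib mlx4_core
    run modprobe mlx4_core num_vfs=1 port_type_array=2,2 probe_vf=1
    run systemctl restart networking
}

reload_mlx4
```

Running it with DRY_RUN=0 as root would actually apply the steps; unloading the driver drops the network links, so it should be done from a local console rather than over SSH.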
> Please test the latest v4.16 kernel[0].
OK, I'll do this.