------- Comment From <email address hidden> 2015-03-11 19:15 EDT-------
This looks fixed with 3.19.0-8-generic #8-Ubuntu; the mlx4 adapter was able to recover from EEH.
[ 2694.622586] EEH: Notify device drivers to shutdown
[ 2694.622587] mlx4_core 0004:01:00.0: device was reset successfully
[ 2694.622589] mlx4_core 0004:01:00.0: mlx4_pci_err_detected was called
[ 2694.622594] mlx4_en 0004:01:00.0: Internal error detected, restarting device
[ 2694.622786] mlx4_en: eth14: Close port called
[ 2694.846830] mlx4_en 0004:01:00.0: removed PHC
[ 2694.874036] EEH: Collect temporary log
[ 2694.879101] EEH: of node=/pciex@3fffe42000000/pci@0/ethernet@0
[ 2694.879465] EEH: PCI device/vendor: 100715b3
[ 2694.879478] EEH: PCI cmd/status register: 00100142
[ 2694.879479] EEH: PCI-E capabilities and status follow:
[ 2694.879544] EEH: PCI-E 00: 00020010 10008e02 0020204e 0843f483
[ 2694.879597] EEH: PCI-E 10: 10830040 00000000 00000000 00000000
[ 2694.879598] EEH: PCI-E 20: 00000000
[ 2694.879599] EEH: PCI-E AER capability register set follows:
[ 2694.879666] EEH: PCI-E AER 00: 18c20001 00000000 00000000 00062010
[ 2694.879719] EEH: PCI-E AER 10: 00000000 00002000 000001e0 00000000
[ 2694.879772] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
[ 2694.879785] EEH: PCI-E AER 30: 00000000 00000000
[ 2694.879787] PHB3 PHB#4 Diag-data (Version: 1)
[ 2694.879789] brdgCtl: 00000002
[ 2694.879790] UtlSts: 00200000 00000000 00000000
[ 2694.879791] RootSts: 00000040 00400000 f0830048 00100147 00000000
[ 2694.879792] PhbSts: 0000001c00000000 0000001c00000000
[ 2694.879793] Lem: 0000000000100000 42498e327f502eae 0000000000000000
[ 2694.879795] InAErr: 8000000000000000 8000000000000000 0402008000000000 0000000000000000
[ 2694.879796] PE[ 1] A/B: 8480002b00000000 8000000000000000
[ 2694.879797] PE[ 2] A/B: 8000000000000000 8000000000000000
[ 2694.879798] PE[ 3] A/B: 8000000000000000 8000000000000000
[ 2694.879799] PE[ 4] A/B: 8000000000000000 8000000000000000
[ 2694.879800] PE[ 5] A/B: 8000000000000000 8000000000000000
[ 2694.879801] EEH: Reset without hotplug activity
[ 2698.898176] EEH: Notify device drivers the completion of reset
[ 2698.898181] mlx4_core 0004:01:00.0: mlx4_pci_slot_reset was called
[ 2698.898218] mlx4_core 0004:01:00.0: enabling device (0140 -> 0142)
[ 2705.396286] mlx4_core 0004:01:00.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s
[ 2705.396288] mlx4_core 0004:01:00.0: PCIe link width is x8, device supports x8
[ 2706.143789] mlx4_en 0004:01:00.0: registered PHC clock
[ 2706.143864] mlx4_en 0004:01:00.0: Activating port:1
[ 2706.159496] mlx4_en: eth11: Using 256 TX rings
[ 2706.159504] mlx4_en: eth11: Using 8 RX rings
[ 2706.159506] mlx4_en: eth11: frag:0 - size:1518 prefix:0 stride:1536
[ 2706.159722] mlx4_en: eth11: Initializing port
[ 2706.160022] mlx4_en 0004:01:00.0: Activating port:2
[ 2706.165214] mlx4_core 0004:01:00.0 eth14: renamed from eth11
[ 2706.188419] mlx4_en: eth11: Using 256 TX rings
[ 2706.188427] mlx4_en: eth11: Using 8 RX rings
[ 2706.188430] mlx4_en: eth11: frag:0 - size:1518 prefix:0 stride:1536
[ 2706.188660] mlx4_en: eth11: Initializing port
[ 2706.197316] EEH: Notify device driver to resume
[ 2706.525987] mlx4_core 0004:01:00.0 eth16: renamed from eth11
[ 2707.487156] mlx4_en: eth14: Link Up
[ 2707.542052] mlx4_en: eth16: Link Up
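To re-check the recovery on another run, one option is to scan the dmesg output for the same ordered sequence of messages seen in the log above, ending with the interfaces coming back up. This is a minimal sketch, assuming the messages are worded exactly as in this 3.19.0-8 log (other kernels or drivers may phrase them differently); `eeh_recovered` and `EXPECTED_SEQUENCE` are hypothetical names, not part of any kernel or distro tool.

```python
from __future__ import annotations

import re

# Hypothetical: messages copied verbatim from the log in this report, in the
# order they appeared. Wording may differ on other kernels/drivers.
EXPECTED_SEQUENCE = [
    "EEH: Notify device drivers to shutdown",
    "mlx4_pci_err_detected was called",
    "EEH: Reset without hotplug activity",
    "EEH: Notify device drivers the completion of reset",
    "mlx4_pci_slot_reset was called",
    "EEH: Notify device driver to resume",
]

# Matches lines like "mlx4_en: eth14: Link Up" and captures the interface name.
LINK_UP = re.compile(r"mlx4_en: (\S+): Link Up")


def eeh_recovered(log: str) -> tuple[bool, list[str]]:
    """Return (True, interfaces) if every expected message appears in order
    and at least one mlx4_en interface reports Link Up after the resume."""
    pos = 0
    for msg in EXPECTED_SEQUENCE:
        idx = log.find(msg, pos)
        if idx < 0:
            return False, []  # a step is missing or out of order
        pos = idx + len(msg)
    # Only count Link Up messages logged after the resume notification.
    links = LINK_UP.findall(log, pos)
    return bool(links), links


sample = """\
[ 2694.622586] EEH: Notify device drivers to shutdown
[ 2694.622589] mlx4_core 0004:01:00.0: mlx4_pci_err_detected was called
[ 2694.879801] EEH: Reset without hotplug activity
[ 2698.898176] EEH: Notify device drivers the completion of reset
[ 2698.898181] mlx4_core 0004:01:00.0: mlx4_pci_slot_reset was called
[ 2706.197316] EEH: Notify device driver to resume
[ 2707.487156] mlx4_en: eth14: Link Up
[ 2707.542052] mlx4_en: eth16: Link Up
"""

ok, ifaces = eeh_recovered(sample)
print(ok, ifaces)  # True ['eth14', 'eth16']
```

Matching substrings in order (rather than requiring adjacent lines) tolerates the interleaved diagnostic output, such as the PHB3 diag-data dump, that EEH prints between the steps.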
Thanks.