Ethernet connection - damaged packets

Bug #1129251 reported by Jakub Budiský
This bug affects 9 people
Affects: network-manager (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

I have serious trouble with the Ethernet connection on a fresh install of Raring. After some time, anywhere from minutes to days, it just stops responding. There are no messages about a disconnect at all, but according to ifconfig the RX packets become damaged (the dropped, overruns and frame counters fill up).

I know this type of "random" error is hard to debug, but I will do almost anything within my capabilities to help solve this really annoying bug.

To sum it up:

* I have a laptop with an Atheros Communications Inc. AR8151 v2.0 Gigabit Ethernet (rev c0) and the atl1c driver module loaded and associated with it
* Using Wicd instead of network-manager resolved the problem, but I hate it, so I have reverted back
* The problem was not present in an up-to-date Precise installation
* The problem persists after purging and reinstalling network-manager
* The problem persists when using the up-to-date network-manager* packages from the Precise stable release
* Neither network-manager nor atl1c provides any debug details in syslog when the problem occurs
* I also tried compiling and using other versions of atl1c, and also alx, from "linux backports" with no luck (alx, as I found out, periodically drops support for older hardware, so it did not help at all)
* The mainline kernel did not help; the bug is still present with "linux-image-extra" installed.
* To temporarily resolve the issue, the driver must be reloaded, the connection reset by unplugging and reconnecting the cable, or the whole system restarted
* The problem occurs after a complete re-installation of Raring, and also in a live session
* The wireless connection works flawlessly
* The problem is not hardware related (Windows OK, multiple hardware => same behaviour)

Many thanks, and feel free to direct me in collecting any necessary information.

ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: linux-image-3.8.0-6-generic 3.8.0-6.13
ProcVersionSignature: Ubuntu 3.8.0-6.13-generic 3.8.0-rc7
Uname: Linux 3.8.0-6-generic x86_64
ApportVersion: 2.8-0ubuntu4
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: budovi 3217 F.... pulseaudio
Date: Mon Feb 18 16:59:18 2013
HibernationDevice: RESUME=UUID=79baaf5e-05cb-4c1a-9b68-bec79ad0e3af
InstallationDate: Installed on 2013-02-12 (5 days ago)
InstallationMedia: Ubuntu 13.04 "Raring Ringtail" - Alpha amd64 (20130212)
MachineType: Micro-Star INT'L CO., LTD. MS-16Y1
MarkForUpload: True
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.8.0-6-generic root=UUID=6c01485e-7824-43c3-b40b-9c44d150f131 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.8.0-6-generic N/A
 linux-backports-modules-3.8.0-6-generic N/A
 linux-firmware 1.102
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/26/2011
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: E16Y1IMS.300
dmi.board.asset.tag: ATN12345678901234567
dmi.board.name: MS-16Y1
dmi.board.vendor: Micro-Star INT'L CO., LTD.
dmi.board.version: 1.0
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 9
dmi.chassis.vendor: Micro-Star INT'L CO., LTD.
dmi.chassis.version: 1.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrE16Y1IMS.300:bd12/26/2011:svnMicro-StarINT'LCO.,LTD.:pnMS-16Y1:pvr1.0:rvnMicro-StarINT'LCO.,LTD.:rnMS-16Y1:rvr1.0:cvnMicro-StarINT'LCO.,LTD.:ct9:cvr1.0:
dmi.product.name: MS-16Y1
dmi.product.version: 1.0
dmi.sys.vendor: Micro-Star INT'L CO., LTD.

Jakub Budiský (jakub-budisky-deactivatedaccount) wrote :

Not sure which driver was used in Precise with my Ethernet card, maybe I should find out and try that one...

Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
tags: added: raring
Joseph Salisbury (jsalisbury) wrote :

This sounds like an issue with network manager, so I added that package to this bug.

Changed in linux (Ubuntu):
status: Confirmed → Invalid
Jakub Budiský (jakub-budisky-deactivatedaccount) wrote :

Bug still present in new 3.8 mainline kernel with extras:
linux-image-3.8.0-030800-generic_3.8.0-030800.201302181935_amd64
linux-image-extra-3.8.0-030800-generic_3.8.0-030800.201302181935_amd64

Report from ifconfig:
~$ ifconfig eth0
eth0 Link encap:Ethernet HWaddr e0:69:95:41:f8:ab
          inet addr:<<removed>> Bcast:<<removed>> Mask:<<removed>>
          inet6 addr: <<removed>> Scope:Link
          inet6 addr: <<removed>> Scope:Global
          inet6 addr: <<removed>> Scope:Global
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:233985 errors:0 dropped:32064 overruns:32064 frame:32064
          TX packets:107283 errors:0 dropped:0 overruns:0 carrier:21
          collisions:0 txqueuelen:1000
          RX bytes:277764773 (277.7 MB) TX bytes:10123410 (10.1 MB)

I removed my public IPs, but yes, they are assigned correctly. The counters are that high because I re-attached the cable every time the bug occurred (which is not rare, roughly every minute, so yes, really annoying).
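
For anyone comparing numbers, the RX line above can be pulled apart mechanically. This is only a minimal sketch, assuming the old net-tools ifconfig output format shown in this comment; parse_counters is a hypothetical helper name, not part of any tool:

```python
import re

def parse_counters(line):
    """Turn 'name:value' pairs from an ifconfig counter line into a dict of ints."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", line)}

# RX line copied from the ifconfig output in this comment.
rx = parse_counters("RX packets:233985 errors:0 dropped:32064 overruns:32064 frame:32064")

# Share of received packets that were counted as dropped.
drop_share = rx["dropped"] / rx["packets"]
print(rx)
print(f"dropped / packets = {drop_share:.1%}")
```

The dropped, overruns and frame counters are identical here, which may suggest that all three are incremented by the same event.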

Jakub Budiský (jakub-budisky-deactivatedaccount) wrote :

OK, I have done some more investigation and it seems I have found it.
I went through syslog in a window of 2 minutes before and after each time I noticed the bug (which I monitored with ping). I did this 4 times to reduce the chance of missing something.

I really do not know why I did not notice this earlier, but I think there were so many "<info>" debug messages that I evidently took them for "normal":

There are never-ending cycles of IPv6 address exchanges:
----------------------------------------------------------------------------------------------------------------------------
[nm-ip6-manager.c:1340] netlink_notification(): netlink event type 20
[nm-ip6-manager.c:885] process_address_change(): (eth0): address cache size: 6 -> 6:
[nm-ip6-manager.c:835] dump_address_change(): (eth0) new address: <IPv6 address 1>
[nm-ip6-manager.c:1340] netlink_notification(): netlink event type 20
[nm-ip6-manager.c:885] process_address_change(): (eth0): address cache size: 6 -> 6:
[nm-ip6-manager.c:835] dump_address_change(): (eth0) new address: <IPv6 address 2>
[nm-ip6-manager.c:1340] netlink_notification(): netlink event type 52

Between those, there are also these messages (in no pattern I could notice):
----------------------------------------------------------------------------------------------------------------------------
[nm-ip6-manager.c:1340] netlink_notification(): netlink event type 24
[nm-ip6-manager.c:1340] netlink_notification(): netlink event type 25
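
For reference, these numbers look like rtnetlink message types. The sketch below maps them to the constant names from the kernel header linux/rtnetlink.h. It assumes that NetworkManager logs the raw nlmsg_type value here, which has not been verified against the NetworkManager source; describe_event is a hypothetical helper:

```python
# rtnetlink message type numbers, as defined in linux/rtnetlink.h.
RTM_NAMES = {
    16: "RTM_NEWLINK",
    17: "RTM_DELLINK",
    20: "RTM_NEWADDR",
    21: "RTM_DELADDR",
    24: "RTM_NEWROUTE",
    25: "RTM_DELROUTE",
    52: "RTM_NEWPREFIX",
}

def describe_event(event_type):
    """Return the rtnetlink constant name for a logged event type number."""
    return RTM_NAMES.get(event_type, f"unknown ({event_type})")

# Event types seen in the log excerpts above.
for number in (20, 52, 24, 25):
    print(number, describe_event(number))
```

Under that assumption, the excerpts show repeated address additions (20), a prefix notification (52), and routes being added and removed (24, 25).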

And finally, at the time the bug occurs (+/- 5 seconds), there is this particular message:
----------------------------------------------------------------------------------------------------------------------------
[nm-netlink-monitor.c:167] link_msg_handler(): netlink link message: iface idx 3 flags 0x1003
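
The flags value in this message can be decoded with the interface flag bits from linux/if.h. This is a minimal sketch, assuming the logged value is the ifi_flags field of the link message; decode_flags is a hypothetical helper and the table lists only a subset of the bits:

```python
# A subset of the interface flag bits defined in linux/if.h.
IFF_BITS = {
    0x1: "IFF_UP",
    0x2: "IFF_BROADCAST",
    0x40: "IFF_RUNNING",
    0x1000: "IFF_MULTICAST",
    0x10000: "IFF_LOWER_UP",
}

def decode_flags(flags):
    """Return the names of the set bits; bits not in the table are returned as hex."""
    names = [name for bit, name in sorted(IFF_BITS.items()) if flags & bit]
    known = sum(bit for bit in IFF_BITS if flags & bit)
    leftover = flags & ~known
    if leftover:
        names.append(hex(leftover))
    return names

print(decode_flags(0x1003))   # value from the log message above
print(decode_flags(0x11043))  # comparison value only, not taken from these logs
```

0x1003 lacks IFF_RUNNING and IFF_LOWER_UP, which would be consistent with the link being reported as not operational at that moment. The second value is shown only for comparison, as one that has both of those bits set.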

This message brought me here: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=657495
and it seems to be related. So I read it and disabled IPv6 in sysctl.conf. All "<info>" debug messages are gone. I will definitely report later whether it helped, but it seems so.
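
To check whether such a setting actually took effect, the kernel exposes it under /proc/sys. This is a minimal sketch, assuming the sysctl key in question is net.ipv6.conf.<interface>.disable_ipv6 (the exact sysctl.conf lines are not given in this comment); ipv6_disabled is a hypothetical helper:

```python
from pathlib import Path

def ipv6_disabled(interface="all", proc_root="/proc/sys"):
    """Return True/False for the disable_ipv6 setting, or None if it cannot be read."""
    path = Path(proc_root) / "net/ipv6/conf" / interface / "disable_ipv6"
    try:
        return path.read_text().strip() == "1"
    except OSError:
        # The entry does not exist or is not readable on this system.
        return None

# "all" and "default" are the kernel's standard pseudo-interfaces; eth0 is the
# interface name used in this report.
for name in ("all", "default", "eth0"):
    print(name, ipv6_disabled(name))
```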

This also explains why I did not notice anything in Precise: I remember I disabled IPv6 there before this bug was discovered. I cannot recall the reason, but I'm pretty sure.

I haven't found this bug on Launchpad, but I think the description should be modified to reflect this investigation. Thank you for "psychically" forcing me to check this again; I was about to buy a router as a workaround :P

Jakub Budiský (jakub-budisky-deactivatedaccount) wrote :

The Internet started to lag again, and after a restart I re-enabled debugging and syslog was spammed with [nm-ip6-manager.c:xxx] messages (and they were also blocked out by some watchdog!). I tried disabling IPv6 with a kernel parameter added in GRUB, but it is still spamming my syslog like this:
---------------------------------------
[nm-ip6-manager.c:1340] netlink_notification(): netlink event type 16
[nm-ip6-manager.c:1301] process_newlink(): ignoring netlink message family 0

And at some point, my "great" and "meaningful" message comes again, and my Internet is down!
---------------------------------------
[nm-netlink-monitor.c:167] link_msg_handler(): netlink link message: iface idx 3 flags 0x1003

So what changed? There is some pattern in the debug messages, related to IPv6 even though it is disabled. Some symptoms match a known bug that occurred a year ago. All other conditions and observations remain unchanged.

Now I'm lost, with some random debug message leading nowhere. I thought this would lead me to a "usable state", but no. I'm really exhausted and need some sleep. Any help is appreciated.

Jakub Budiský (jakub-budisky-deactivatedaccount) wrote :

I can confirm that this bug affects a live session booted from today's daily build, 20130221.1

I also get an error while booting: atl1c 0000:05:00.0 Unable to allocate MSI interrupt Error -6, but I have no clue whether it is related.

description: updated
Jakub Budiský (jakub-budisky-deactivatedaccount) wrote :

I have tried the Ethernet connection at school, in a "clean infrastructure" Cisco laboratory, so I have definitely excluded any interfering conditions on my own network. This is also weird, because I'm apparently the only person affected. The bug is still present with network-manager version 0.9.7.995+git201301311547.17123fc-0ubuntu1, and it is really critical and making me mad. This is definitely the type of bug that needs to be resolved before the stable release.

no longer affects: linux (Ubuntu)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in network-manager (Ubuntu):
status: New → Confirmed
Dr Lektro (drlektro) wrote :

Same issue here; conditions:

- 02:00.0 Ethernet controller: Qualcomm Atheros AR8151 v2.0 Gigabit Ethernet (rev c0)
 Subsystem: Lenovo Device 21f1
 Kernel driver in use: atl1c
- NetworkManager 0.9.8.0 enabled
- Linux 3.8.0-19-generic #30-Ubuntu SMP Wed May 1 16:35:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
- cable plugged in

As soon as I generate some traffic, my connection hangs and the number of dropped packets on eth0 increases rapidly.
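
How fast the counters grow can be measured by comparing two snapshots of /proc/net/dev. This is a minimal sketch, assuming the usual two header lines and the standard order of receive columns in that file; parse_proc_net_dev and rx_delta are hypothetical helpers, and the sample numbers are purely illustrative, not measurements from this machine:

```python
# Receive columns of /proc/net/dev, in file order ("fifo" is what ifconfig calls overruns).
RX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast")

def parse_proc_net_dev(text):
    """Return {interface: {rx counter name: value}} from the text of /proc/net/dev."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, values = line.split(":", 1)
        numbers = [int(value) for value in values.split()]
        stats[name.strip()] = dict(zip(RX_FIELDS, numbers[:len(RX_FIELDS)]))
    return stats

def rx_delta(before, after, iface):
    """Difference of every receive counter between two parsed snapshots."""
    return {key: after[iface][key] - before[iface][key] for key in RX_FIELDS}

# Illustrative snapshots with made-up values; on a real system, read
# /proc/net/dev twice with a pause in between.
HEADER = "Inter-|   Receive  |  Transmit\n face |bytes packets ...|bytes packets ...\n"
first = parse_proc_net_dev(HEADER + "  eth0: 1000 10 0 1 1 1 0 0 500 5 0 0 0 0 0 0\n")
second = parse_proc_net_dev(HEADER + "  eth0: 3000 30 0 7 7 7 0 0 900 9 0 0 0 0 0 0\n")
print(rx_delta(first, second, "eth0"))
```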

Syslog doesn't show any errors when the connection hangs / packets are dropped.

I'll enable debug logging for network manager and update my comment...

When I disable NetworkManager and assign IP/DNS manually, everything works fine => driver OK.

Steven

description: updated