network manager 0.6 should use 9 or 8 seconds as a scan timeout to workaround driver bugs
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
network-manager (Ubuntu) |
Fix Released
|
High
|
Alexander Sack | ||
Hardy |
Fix Released
|
High
|
Alexander Sack |
Bug Description
Binary package hint: network-manager
Reporting on behalf of Dan Langlois <dan clockwerk net>
(see http://
I have recently installed Ubuntu 8.04, Hardy Heron, on my laptop. This is
my first exposure to NetworkManager. For the purposes of this discussion,
I always connect to the same WPA wifi AP, named "darknet". At any given
moment 2 or 3 other WEP APs, and 1 or 2 unprotected APs are visible, but
I don't connect to them. My wifi hardware is an Intel 4965agn using the
iwl4965 driver.
Occasionally I'd boot up to find NM disconnected. When this happens, APs
are visible, but not darknet in particular. If I wait long enough (over
20min sometimes), eventually NM finds it and connects. This is way too
long to wait for an AP that I know is up and running and has a strong
signal where I'm sitting.
I started to feel that NM needed a manual "rescan" feature. Googling this
I found discussions from 2006 about this, including comments from
Dan Williams about how just touching the nm-applet should schedule a rescan.
(http://
So why was it not finding darknet?
I decided to get the source and experiment (given that I use HH,
I had to get the v0.6.6 source). What I found was that while in the
disconnected state, NM was indeed still scanning, but it just couldn't seem
to find darknet. So I dug a little deeper into the scanning guts of NM.
Already you might be thinking I should use a more recent version of NM.
However, I'm not sure that would work with Ubuntu HH. Since HH is the
Long Term Support version of Ubuntu, I thought I'd look into fixing it
up. This is why I'm investigating the older source and am keen to
improve it.
I'll skip along here as my email is getting long, my hypothesis is this:
NM waits too long from the time it initiates the scan. I believe the
driver (or more precisely, the Wireless Extensions (WE) code) is expiring
the scan result before NM gets them.
I propose that NM should read and process scan results sooner. The
current time of 10 seconds is right on the threshold of results
getting expired (as I understand it).
I'm no NM or WE expert, so I'll leave Dan Williams and other experts
to mull over my findings. Here are the details.
I added some nm_info()'s to NM source to see what was happening:
Jan 22 14:43:45 talon NetworkManager: <info> dan, wlan0 iw scan issued
Jan 22 14:43:47 talon NetworkManager: <info> dan, helper buf:8 SIOCGIWSCAN
Jan 22 14:43:55 talon NetworkManager: <info> dan, schedule_
Jan 22 14:43:55 talon NetworkManager: <info> dan, process_
Jan 22 14:43:55 talon NetworkManager: <info> dan, found AP 'BELL669' ret=0
Jan 22 14:44:15 talon NetworkManager: <info> dan, wlan0 iw scan issued
Jan 22 14:44:17 talon NetworkManager: <info> dan, helper buf:8 SIOCGIWSCAN
Jan 22 14:44:25 talon NetworkManager: <info> dan, schedule_
Jan 22 14:46:25 talon NetworkManager: <info> dan, wlan0 iw scan issued
Jan 22 14:46:27 talon NetworkManager: <info> dan, helper buf:8 SIOCGIWSCAN
Jan 22 14:46:35 talon NetworkManager: <info> dan, schedule_
Jan 22 14:48:35 talon NetworkManager: <info> dan, wlan0 iw scan issued
Jan 22 14:48:37 talon NetworkManager: <info> dan, helper buf:8 SIOCGIWSCAN
Jan 22 14:48:45 talon NetworkManager: <info> dan, schedule_
Jan 22 14:48:45 talon NetworkManager: <info> dan, process_
Jan 22 14:48:45 talon NetworkManager: <info> dan, found AP 'BELL479' ret=1
Jan 22 14:48:45 talon NetworkManager: <info> dan, found AP 'BELL669' ret=0
So scans are triggered via iw_set_ext() and we setup to collect the results
in 10 seconds with schedule_
2 seconds later, wireless_
indicating results are ready. But the code there doesn't respond
right away, clamining that this might be the first of a burst of
such messages. So instead we continue to wait the full 10 seconds.
At the 10 second mark, results are fetched using iw_get_ext() and the
next scan is scheduled. Results (if any) are then processed.
In the log above you can see that some scans had no results, others
had just 1 or two. None had darknet.
Curious about how iw_set_ext() and iw_get_ext() worked, I dug a
little and found this interesting bit in net/mac80211/
function ieee80211_
if (time_after(
return current_ev;
Note that IEEE80211_
It seems that ieee80211_
into the scan result list if it was last seen over 10 seconds ago.
I changed the code in wireless_
schedule_
and I've been running for a few days now without ever being stuck
in the NM disconnected state. I've also found this connects much
faster (reasonable since I dropped a 6 second wait time) and
surprisingly I found this also improved waking from suspend, which
has been reliable since I made this change.
I'd be interested to hear what Dan Williams and others
think of this analysis. Is NM waiting too long to collect scan
results?
cheers,
Dan
tags: |
added: verification-done removed: verification-needed |
importance high as it prevents to see scan results for some.