madwifi makes board crash after bursting packets

Dennis Borgmann
Hello madwifi-lists!
Hello hostapd-list!

Since I do not know which of the madwifi lists is still active and
read, I am posting my question to both madwifi-users and madwifi-dev.

See my setup below. If I burst data from one point to the other (let's
say an 'scp' of a file of about 2 GB) at full speed (about 23 Mbit/s at
the current distance of 1 m), the AP crashes after quite some time
(~1 hour). I cannot check what is going on on the machine that runs the
AP, because the attached serial console no longer responds and I cannot
establish any connection to the AP. It is still transmitting beacons and
is still seen by any wireless station (by the client with 'iwlist ath0
scan', and by a laptop on my desk that is not involved in the wireless
connection with 'iwlist wlan0 scan'). So I fear I cannot give much debug
output, since I can't access the AP anymore.

Rebooting the AP solves the problem immediately, and the client
reconnects to the AP right away, making data transfer possible again.

I could not reproduce this problem with low traffic: I ran a 'ping
-f IP' last night and the link was still up this morning after 14 hours.

Running iperf yesterday crashed the client. Same problem: I could no
longer connect to it via the serial console. Since a station does not
transmit beacons, I cannot tell whether the wireless device was still up
in any way.

My setup:

madwifi 0.9.4 - running on AP and Client
hostapd v0.6.9 on AP side with RSN-encryption
wpa_supplicant v0.6.9 on client side with RSN-encryption
2 PC Engines ALIX3D2-Boards (x86, 500MHz, headless systems)
Alfa networks MiniPCI-WAN cards with Atheros AR5414 chipset

I am using madwifi-0.9.4 on a production system and cannot switch to a
newer driver with a recent kernel in a rush. Therefore I'd like to
stick with it, if that's possible.

Does this problem sound familiar to anyone? Are there any hints as to
what I could still try before I am forced to move to a recent driver?

Thanks for any hint.

Kind regards,
Dennis

_______________________________________________
Madwifi-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/madwifi-users

Re: madwifi makes board crash after bursting packets

Eric Malkowski
Hi Dennis-

I run almost the same thing you do.
Alix 3D2 boards with a Ubiquiti XR5 (Atheros 5414) in the bottom slot
and a Wistron CM-9 in the top slot on some setups and a Ubiquiti SR2 on
others. They all have the XR5 for the 5.8 GHz side of things.

I run the XR5's in ahdemo mode w/ OLSR to do a mesh setup.
I'm using MADWIFI 0.9.4.1 (from the 0.9.4 branch from not too long ago)
with kernel 2.6.26.3
I'm able to get similar throughput -- possibly even a little more (maybe
28 megabits ... but can't remember my "peak" number).

I've also run the XR5 as an AP with an Ubiquiti SR5 as client on same
MADWIFI 0.9.4.1 on kernel 2.6.28.4 on a setup at my day job.
The XR5 AP never crashes or anything -- we have a constant stream of UDP
traffic every 10 ms in both directions and only miss 1 packet a handful
of times per hour and it runs for days.
We can also stream data over a TCP connection in parallel with the UDP
stuff but nowhere near 23Mbit/s and it works fine.

We have a problem with the SR5 (Ubiquiti's earlier 400 mW radio that
predates the XR5), which uses the older Atheros 5213.
About once per month, its Tx queue gets hung up.  The board can't
transmit but can still receive just fine.
The Tx function in the driver reschedules a softirq and gets into some
type of loop, and you see ksoftirqd taking up tons of CPU, making
everything very sluggish.
It looks like this problem mentioned on the mailing list some time ago:

http://article.gmane.org/gmane.linux.drivers.madwifi.user/15084

You may want to try to get on the serial console of your Alix while the
problem is happening and see if you can get enough CPU time to log in
and run "top" or something similar, to see whether ksoftirqd is killing
the CPU.  You'll also see the Tx packets increasing on the athX
interface, but not on the corresponding wifiX interface.
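
The counter comparison described above can be scripted so that it needs
only a moment of CPU time on the console. The following is a minimal
sketch, not part of the driver or of any madwifi tool: it assumes the
usual Linux /proc/net/dev layout (transmitted packets are the tenth
number after the interface name), and the names tx_packets, tx_stalled,
ath0 and wifi0 are placeholders chosen for illustration.

```python
def tx_packets(proc_net_dev_text):
    """Return {interface: transmitted packet count} from /proc/net/dev text."""
    counters = {}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, fields = line.split(":", 1)
        values = fields.split()
        if len(values) < 10:
            continue  # not a counter line in the expected layout
        # Fields 0-7 are receive counters, 8 is Tx bytes, 9 is Tx packets.
        counters[name.strip()] = int(values[9])
    return counters


def tx_stalled(before, after, vap="ath0", phy="wifi0"):
    """True if the VAP's Tx counter advanced while the radio's did not."""
    vap_delta = after[vap] - before[vap]
    phy_delta = after[phy] - before[phy]
    return vap_delta > 0 and phy_delta == 0
```

To use it, read /proc/net/dev twice a few seconds apart, pass both
snapshots through tx_packets(), and hand the results to tx_stalled().
A True result would match the hung Tx queue symptom; a False result
does not rule out other problems.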

This may totally not be your problem, but may be worth checking.

My only other suggestion would be to try an XR5 in place of your Alfa
board and see if the problem goes away.
This is based on the fact that I really have zero issues with my XR5
whether ahdemo mode or AP mode.  Maybe I haven't pushed them enough like
you.

As one more side point -- I have an older Soekris net4826 I use as an AP
in my house that runs the same software with CM-9 and it uses hostapd
with AES256 encryption etc. (probably not quite as new) and that setup
runs w/ no issues for the general use AP in my house.  So I know the
software combination should be ok.

Good Luck

-Eric Malkowski

Re: madwifi makes board crash after bursting packets

Dennis Borgmann
Eric,
madwifi-lists,

I have been able to avoid the AP crashing by using another version of
madwifi (I tried voyage-0.6.5, which contains madwifi-trunk
0.9.4+r4099, as mentioned in the link posted by Eric).

It really seems to have been a problem that overloaded the CPU. I came
to this conclusion because I have a small display attached via a serial
line to the ALIX AP board, and the display was very sluggish once the
error occurred, but sometimes it still reacted - S-L-O-W-L-Y.

So now the error is gone, no more crashing, at least during the last 5
hours. Since the problem usually occurred within 20 minutes, I am very
confident that using a newer version of madwifi solved it.

BUT now I see something else. I am still loading the link heavily with
the scp I mentioned before. Roughly 6 minutes after booting the two
ALIX boards, the connection becomes very laggy. From the machine that
runs the scp, I ping the machine I am copying to, and the ping times go
up to about 5000 ms, while they should normally be around 0.5 ms. This
state persists until I cancel the scp and wait for some time (let's say
2 minutes). After that, I restart the scp and it goes very well again,
for about 30 seconds. Then the throughput falls to nearly 0, the ping
times go up to 5000 ms again, and so on.

I tested other WLAN cards (not the one Eric mentioned, because I don't
have it here, but several others), and another card does not seem to
solve the problem.

This is definitely not related to encryption, so I have removed
hostapd-list from the recipients. The problem occurs without encryption
as well as with encryption.

Now my guesses. No idea whether any of this is correct!
My thought is that some buffer fills up. After some time, the packets in
this buffer seem to be dropped due to some timeout, so they no longer
block other packets from being transmitted. The longer I wait, the more
packets are dropped and the longer it takes for the buffer to fill
again. Nevertheless, it must be quite a large buffer, considering the
6 minutes of good throughput right after booting the ALIX boards.

Anyway, the packets arriving at the RJ45 port of the ALIX come in over a
100 Mbit/s port, while the WLAN can only take up to about 23 Mbit/s. This
shouldn't be a problem, because any simple AP from a cheap shop handles
it fine. TCP, at least, should regulate this on its own through its
congestion control. Yet the problem comes up both with UDP (I did some
iperf) and with TCP (the scp).
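
As a rough sanity check on the buffer idea, one can estimate how much
data would have to be queued to explain the delay. This is only a sketch
under strong assumptions that are not confirmed anywhere in the thread:
that the roughly 5000 ms ping time is pure queueing delay in a single
FIFO, and that the queue drains at the full 23 Mbit/s. The function name
backlog_bytes is made up for illustration.

```python
def backlog_bytes(delay_s, drain_rate_bit_s):
    """Bytes that must sit in a FIFO to cause delay_s at the given drain rate."""
    return delay_s * drain_rate_bit_s / 8


# Assumed values: 5 s of delay, link draining at 23 Mbit/s.
queued = backlog_bytes(5.0, 23e6)
print(f"{queued / 1e6:.1f} MB queued")
print(f"about {queued / 1500:.0f} frames of 1500 bytes")
```

Under these assumptions the backlog would be on the order of 14 MB,
which is an upper bound: if the drain rate has collapsed (the throughput
falls to nearly 0 during the stall), the same delay needs far fewer
queued bytes.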

Another useful detail might be the output of the ping that I run
alongside, taken after the scp has broken down and I have stopped it.

(...)
64 bytes from 192.168.0.10: icmp_seq=3257 ttl=64 time=4608 ms
64 bytes from 192.168.0.10: icmp_seq=3258 ttl=64 time=4104 ms
64 bytes from 192.168.0.10: icmp_seq=3259 ttl=64 time=3595 ms
64 bytes from 192.168.0.10: icmp_seq=3260 ttl=64 time=3088 ms
64 bytes from 192.168.0.10: icmp_seq=3261 ttl=64 time=2581 ms
64 bytes from 192.168.0.10: icmp_seq=3262 ttl=64 time=2073 ms
64 bytes from 192.168.0.10: icmp_seq=3263 ttl=64 time=1566 ms
64 bytes from 192.168.0.10: icmp_seq=3264 ttl=64 time=1059 ms
64 bytes from 192.168.0.10: icmp_seq=3265 ttl=64 time=551 ms
64 bytes from 192.168.0.10: icmp_seq=3266 ttl=64 time=44.4 ms
64 bytes from 192.168.0.10: icmp_seq=3267 ttl=64 time=2.59 ms
64 bytes from 192.168.0.10: icmp_seq=3268 ttl=64 time=1.75 ms
64 bytes from 192.168.0.10: icmp_seq=3269 ttl=64 time=2.08 ms
64 bytes from 192.168.0.10: icmp_seq=3270 ttl=64 time=2.06 ms
64 bytes from 192.168.0.10: icmp_seq=3271 ttl=64 time=2.05 ms
64 bytes from 192.168.0.10: icmp_seq=3272 ttl=64 time=2.33 ms
64 bytes from 192.168.0.10: icmp_seq=3273 ttl=64 time=1.98 ms
64 bytes from 192.168.0.10: icmp_seq=3274 ttl=64 time=1.66 ms
64 bytes from 192.168.0.10: icmp_seq=3275 ttl=64 time=279 ms
64 bytes from 192.168.0.10: icmp_seq=3276 ttl=64 time=230 ms
64 bytes from 192.168.0.10: icmp_seq=3277 ttl=64 time=242 ms
64 bytes from 192.168.0.10: icmp_seq=3278 ttl=64 time=459 ms
64 bytes from 192.168.0.10: icmp_seq=3279 ttl=64 time=2109 ms
64 bytes from 192.168.0.10: icmp_seq=3280 ttl=64 time=3036 ms
64 bytes from 192.168.0.10: icmp_seq=3281 ttl=64 time=3245 ms
(...)

As you can see, the ping times recover and then later climb back up to
bad values. I have no idea what this might mean...
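
One way to look at the ping output above more closely is to compute how
much the round-trip time changes from one probe to the next. The sketch
below does that; the helper names parse_rtts and rtt_steps are made up
for illustration, and the regular expression only covers the line format
shown above.

```python
import re

# Matches lines like: "64 bytes from ...: icmp_seq=3257 ttl=64 time=4608 ms"
PING_LINE = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def parse_rtts(ping_output):
    """Return a list of (icmp_seq, rtt_ms) tuples found in ping output text."""
    return [(int(seq), float(rtt)) for seq, rtt in PING_LINE.findall(ping_output)]


def rtt_steps(samples):
    """Return the RTT change in ms between each pair of consecutive probes."""
    return [round(b[1] - a[1], 1) for a, b in zip(samples, samples[1:])]
```

For the first ten lines of the output, the steps are all close to
-507 ms. Assuming ping was run with its default interval of one second
(the thread does not say), the replies would have arrived roughly 0.49 s
apart, which looks more like a backlog draining at a steady pace than
like random loss. That reading is a guess, not a diagnosis.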

The CPU load is low all of the time: 98% idle. Running 'ifconfig ath0
down && sleep 2 && ifconfig ath0 up' does not affect the link quality,
neither on the client side nor on the AP side.

Time to go to bed, I don't have any more ideas.

Kind regards,
Dennis
