AP55 are going offline (connected to SG 135W 9.509-3)

Hi,

 

since 4 weeks we have 2 AP55 (configured as mesh APs) that are going offline time to time.

What I ask myself is how to figure out what happend:

Can I figure out if the AP reboots because of power lost or is something different the cause?

How can I get the SW Version of the AP?

 

SG means: last seen 7/23 20:33h but wlan log means:

7/24 0:6h STA <MAC> authenticated and STA <MAC> associated

and AWELOGGER menas: "connected" 7/24 a couple of seconds later

So the AP is "a Little bit" visible?

 

How can I proceed?

 

Thanks für help

Gernot

  • Hi, 

    What do you see in the wireless.log that causes the disconnection? Make sure the AP is not stuck in a firmware upgrade.

    Thanks,

  • In reply to sachingurung:

    Hi, sorry for delayed answer:

    2018:08:01-07:29:23 A4004DE355BBF65 aweclient[1220]: facility_awed.c (163) Timeout expired, can't connect.
    2018:08:01-07:29:23 A4004DE355BBF65 aweclient[1220]: aweclient.c (277) Failed to connect to 192.22.22.1. <- physical Interface of SG 135
    2018:08:01-07:29:44 igafa-gh hostapd: wifi0: AP-STA-DISCONNECTED 14:10:9f:d6:cb:03
    2018:08:01-07:29:44 igafa-gh hostapd: wifi0: STA 14:10:9f:d6:cb:03 WPA: event 2 notification

    This is the log from that time.

    Nothing more.

  • In reply to GernotMeyer:

    I can also see this:

    2018:08:01-07:29:49 igafa-gh awed[4813]: [MASTER] new connection from 192.22.22.101:36828
    2018:08:01-07:29:49 igafa-gh awed[12570]: [A4004DE355BBF65] AP55 from 192.22.22.101:36828 identified as A4004DE355BBF65
    2018:08:01-07:29:49 igafa-gh awed[12570]: [A4004DE355BBF65] (Re-)loaded identity and/or configuration
    2018:08:01-07:29:50 A4004DE355BBF65 aweclient[1220]: aweclient.c (2462) Exit from main loop with no error. Get new configuration.
    2018:08:01-07:29:52 igafa-gh awed[31265]: [A4004DE355BBF65] ll_read: dead socket: Interrupted system call
    2018:08:01-07:29:52 igafa-gh awed[31265]: [A4004DE355BBF65] disconnected. Close socket and kill process.
    2018:08:01-07:30:52 igafa-gh awed[12570]: [A4004DE355BBF65] ll_read: dead socket: Resource temporarily unavailable
    2018:08:01-07:30:52 igafa-gh awed[12570]: [A4004DE355BBF65] disconnected. Close socket and kill process.


  • In reply to GernotMeyer:

    Hi GernotMeyer,

    I am checking the logs against some documents, I will DM you to proceed with the investigation. 

    Thanks

  • In reply to GernotMeyer:

    Hi,

    1. Take SSH to the UTM and restart awed services by executing the following command: /var/mdw/scripts/awed restart
    2. When the AP goes offline, what does the LeD status reflect, refer to the following document to understand the LeD light status for all the AP models. https://www.sophos.com/en-us/medialibrary/PDFs/documentation/sophoswirelessoien.pdf?la=en.pdf
    3. Change physical connection (cables) which connects AP to UTM and the power adapter.
    4. Make sure you use the provided power adapter or PoE adapter that comes along with the AP. If you are using a different adapter it could cause a socket error.
    5. Try a different frénquence band and change the channel selection to Manual and try to find a channel width with less congestion.
    6. Finally, delete the AP and re add it with a fresh AP configuration.

    After executing the changes, capture the awed.log and show us the results.

    Thanks,

  • In reply to sachingurung:

    Hi,

    I rebooted all physical devices. Now the system is up and running since 2 days (pray pray pray).

    Ugly: All AP have been blinking green rapidly (connected) but haven't been able to get connected (vefore restart).

    Because all APs (exept one) are mashed without any cable connected cable could not cause the problem.

    Is there any way to figure out: What Firmware version an AP has?

    Is it posible to get an info that a reconnect was done because of some power problems from the AP?

    I haven't been able to identify any cause or detailed info coming from AP.

  • In reply to GernotMeyer:

    ;-(

     

    AP goes offline again:

    2018:08:20-17:45:24 A4004D78D09C1F9 aweclient[1219]: aweclient.c (2050) No reply get from last message, close socket.
    2018:08:20-17:45:24 sophos-sg135 awed[2616]: [A4004D78D09C1F9] ll_read: short read or connection error:
    2018:08:20-17:45:24 sophos-sg135 awed[2616]: [A4004D78D09C1F9] disconnected. Close socket and kill process.
    2018:08:20-17:45:35 sophos-sg135 awed[4804]: [MASTER] new connection from 192.22.22.102:36608
    2018:08:20-17:45:45 sophos-sg135 awed[3768]: [Unknown] ll_read: dead socket: Resource temporarily unavailable
    2018:08:20-17:45:45 sophos-sg135 awed[3768]: [Unknown] disconnected. Close socket and kill process.
    2018:08:20-17:45:47 A4004D78D09C1F9 aweclient[1219]: aweclient.c (2456) Exit from main loop.
    2018:08:20-17:45:47 A4004D78D09C1F9 aweclient[1219]: aweclient.c (2470) Error in getting new configuration. Reconnecting (12)
    2018:08:20-17:45:53 sophos-sg135 awed[4804]: [MASTER] new connection from 192.22.22.102:36609
    2018:08:20-17:45:53 sophos-sg135 awed[3788]: [A4004D78D09C1F9] AP55 from 192.22.22.102:36609 identified as A4004D78D09C1F9
    2018:08:20-17:45:53 sophos-sg135 awed[3788]: [A4004D78D09C1F9] (Re-)loaded identity and/or configuration
    2018:08:20-17:46:52 A4004D78D09C1F9 awelogger[2121]: id="4105" severity="info" sys="System" sub="WiFi" name="STA WPA failure" ssid="WIFI-SSID" ssid_id="WLAN0.0" bssid="00:1a:8c:ca:22:16" sta="bc:85:56:70:d2:f6" reason_code="2"
    2018:08:20-17:46:52 A4004D78D09C1F9 awelogger[2121]: id="4102" severity="info" sys="System" sub="WiFi" name="STA disconnected" ssid="WIFI-SSID" ssid_id="WLAN0.0" bssid="00:1a:8c:ca:22:16" sta="bc:85:56:70:d2:f6"
    2018:08:20-17:46:52 A4004D78D09C1F9 hostapd: wlan1: STA bc:85:56:70:d2:f6 IEEE 802.11: authenticated
    2018:08:20-17:46:52 A4004D78D09C1F9 hostapd: wlan1: STA bc:85:56:70:d2:f6 IEEE 802.11: associated (aid 4)
    2018:08:20-17:46:52 A4004D78D09C1F9 awelogger[2121]: id="4103" severity="info" sys="System" sub="WiFi" name="STA authentication" ssid="WIFI-SSID" ssid_id="WLAN0.0" bssid="00:1a:8c:ca:22:16" sta="bc:85:56:70:d2:f6" status_code="0"
    2018:08:20-17:46:52 A4004D78D09C1F9 awelogger[2121]: id="4104" severity="info" sys="System" sub="WiFi" name="STA association" ssid="WIFI-SSID" ssid_id="WLAN0.0" bssid="00:1a:8c:ca:22:16" sta="bc:85:56:70:d2:f6" status_code="0"
    2018:08:20-17:46:52 A4004D78D09C1F9 awelogger[2121]: id="4101" severity="info" sys="System" sub="WiFi" name="STA connected" ssid="WIFI-SSID" ssid_id="WLAN0.0" bssid="00:1a:8c:ca:22:16" sta="bc:85:56:70:d2:f6"
    2018:08:20-17:46:52 A4004D78D09C1F9 hostapd: wlan1: STA bc:85:56:70:d2:f6 RADIUS: starting accounting session 5A578F4D-00000031
    2018:08:20-17:46:52 A4004D78D09C1F9 hostapd: wlan1: STA bc:85:56:70:d2:f6 WPA: pairwise key handshake completed (RSN)
    2018:08:20-17:46:54 sophos-sg135 awed[3788]: [A4004D78D09C1F9] ll_read: dead socket: Resource temporarily unavailable
    2018:08:20-17:46:54 sophos-sg135 awed[3788]: [A4004D78D09C1F9] disconnected. Close socket and kill process.
    2018:08:20-17:47:03 A4004D19EF2164A awelogger[2738]: id="4102" severity="info" sys="System" sub="WiFi" name="STA disconnected" ssid="WIFI-SSID" ssid_id="WLAN0.1" bssid="00:1a:8c:c8:ed:1e" sta="dc:68:eb:bc:34:6f"
    2018:08:20-17:47:03 A4004D19EF2164A hostapd: wlan8: STA dc:68:eb:bc:34:6f IEEE 802.11: disassociated
    2018:08:20-17:47:04 A4004D19EF2164A hostapd: wlan8: STA dc:68:eb:bc:34:6f IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
    2018:08:20-17:47:14 A4004D19EF2164A kernel: [412579.520000] br-lan: topology change detected, sending tcn bpdu
    2018:08:20-17:47:14 A4004D19EF2164A kernel: [412579.520000] br-lan: port 1(eth0) entered forwarding state
    2018:08:20-17:48:00 A4004D78D09C1F9 hostapd: wlan8: STA a0:86:c6:09:56:cb WPA: group key handshake completed (RSN)
    2018:08:20-17:48:00 A4004D78D09C1F9 hostapd: wlan8: STA 38:37:8b:63:65:55 WPA: group key handshake completed (RSN)
    2018:08:20-17:48:00 A4004D78D09C1F9 hostapd: wlan8: STA 44:d4:e0:8f:20:35 WPA: group key handshake completed (RSN)
    2018:08:20-17:50:43 sophos-sg135 hostapd: wifi0: AP-STA-DISCONNECTED a8:96:75:8e:26:55
    2018:08:20-17:50:43 sophos-sg135 hostapd: wifi0: STA a8:96:75:8e:26:55 WPA: event 2 notification
    2018:08:20-17:50:43 sophos-sg135 hostapd: wifi0: STA a8:96:75:8e:26:55 WPA: event 2 notification
    2018:08:20-17:50:43 sophos-sg135 hostapd: wifi0: STA a8:96:75:8e:26:55 IEEE 802.1X: unauthorizing port
    2018:08:20-17:50:43 sophos-sg135 hostapd: wifi0: STA a8:96:75:8e:26:55 IEEE 802.1X: unauthorizing port
    2018:08:20-17:50:43 sophos-sg135 hostapd: wifi0: STA a8:96:75:8e:26:55 IEEE 802.11: disassociated
    2018:08:20-17:50:43 sophos-sg135 hostapd: wifi0: STA a8:96:75:8e:26:55 IEEE 802.11: disassociated
    2018:08:20-17:50:43 sophos-sg135 hostapd: wifi0: STA a8:96:75:8e:26:55 MLME: MLME-DISASSOCIATE.indication(a8:96:75:8e:26:55, 1)
    2018:08:20-17:50:43 sophos-sg135 hostapd: wifi0: STA a8:96:75:8e:26:55 MLME: MLME-DISASSOCIATE.indication(a8:96:75:8e:26:55, 1)
    2018:08:20-17:50:43 sophos-sg135 hostapd: wifi0: STA a8:96:75:8e:26:55 MLME: MLME-DELETEKEYS.request(a8:96:75:8e:26:55)
    2018:08:20-17:50:43 sophos-sg135 hostapd: wifi0: STA a8:96:75:8e:26:55 MLME: MLME-DELETEKEYS.request(a8:96:75:8e:26:55)
    2018:08:20-17:50:43 A4004D19EF2164A hostapd: wlan0: STA a8:96:75:8e:26:55 IEEE 802.11: authenticated
    2018:08:20-17:50:43 A4004D19EF2164A hostapd: wlan0: STA a8:96:75:8e:26:55 IEEE 802.11: associated (aid 2)
    2018:08:20-17:50:43 A4004D19EF2164A awelogger[2809]: id="4103" severity="info" sys="System" sub="WiFi" name="STA authentication" ssid="WIFI-SSID" ssid_id="WLAN0.0" bssid="00:1a:8c:c8:ed:16" sta="a8:96:75:8e:26:55" status_code="0"
    2018:08:20-17:50:43 A4004D19EF2164A awelogger[2809]: id="4104" severity="info" sys="System" sub="WiFi" name="STA association" ssid="WIFI-SSID" ssid_id="WLAN0.0" bssid="00:1a:8c:c8:ed:16" sta="a8:96:75:8e:26:55" status_code="0"
    2018:08:20-17:50:43 A4004D19EF2164A hostapd: wlan0: STA a8:96:75:8e:26:55 RADIUS: starting accounting session 5A578F7D-00000082
    2018:08:20-17:50:43 A4004D19EF2164A hostapd: wlan0: STA a8:96:75:8e:26:55 WPA: pairwise key handshake completed (RSN)
    2018:08:20-17:50:43 A4004D19EF2164A awelogger[2809]: id="4101" severity="info" sys="System" sub="WiFi" name="STA connected" ssid="WIFI-SSID" ssid_id="WLAN0.0" bssid="00:1a:8c:c8:ed:16" sta="a8:96:75:8e:26:55"
    2018:08:20-17:50:44 sophos-sg135 hostapd: wifi0: STA a8:96:75:8e:26:55 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
    2018:08:20-17:50:44 sophos-sg135 hostapd: wifi0: STA a8:96:75:8e:26:55 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
    2018:08:20-17:50:44 sophos-sg135 hostapd: wifi0: STA a8:96:75:8e:26:55 MLME: MLME-DEAUTHENTICATE.indication(a8:96:75:8e:26:55, 2)
    2018:08:20-17:50:44 sophos-sg135 hostapd: wifi0: STA a8:96:75:8e:26:55 MLME: MLME-DEAUTHENTICATE.indication(a8:96:75:8e:26:55, 2)
    2018:08:20-17:50:44 sophos-sg135 hostapd: wifi0: STA a8:96:75:8e:26:55 MLME: MLME-DELETEKEYS.request(a8:96:75:8e:26:55)
    2018:08:20-17:50:44 sophos-sg135 hostapd: wifi0: STA a8:96:75:8e:26:55 MLME: MLME-DELETEKEYS.request(a8:96:75:8e:26:55)

    reboot of the Sophos router doesn't help....

     

    Last seen 17:45

  • In reply to GernotMeyer:

    It looks like the AP does something like an reboot or looses comply the connection to the SG/XG. 

    As far as i can see, there is a downtime of 10 secs. 

     

    2018:08:20-17:45:24 A4004D78D09C1F9 aweclient[1219]: aweclient.c (2050) No reply get from last message, close socket.
    2018:08:20-17:45:24 sophos-sg135 awed[2616]: [A4004D78D09C1F9] ll_read: short read or connection error: 
    2018:08:20-17:45:24 sophos-sg135 awed[2616]: [A4004D78D09C1F9] disconnected. Close socket and kill process.
    2018:08:20-17:45:35 sophos-sg135 awed[4804]: [MASTER] new connection from 192.22.22.102:36608

     

    Can you describe your network? How does the AP find your SG? Which firmware are you using? VLANs or some routing devices there? 

     

    You could also give central wireless a shot. 

    30 days trial are for free and you could see, if the new firmware in central wireless can solve this issue. 

     

  • In reply to LuCar Toni:

    Network is mesh:

    SG135W wired to an AP55 which is mesh root. 5 APs are connected via WLAN because of no available LAN infrastructure.

    SG135W and root mesh AP55 are connected directly (no Switch!).

    SG Firmware is 9.510.5 (actual).

    There is never any hint that the root AP looses the connection. "Only" mesh APs connected via WLAN.

     

    As I asked before: How can I get info about AP firmware version and information if AP itself hung or just reboots?

    Am I able to read this somewhere (except connecting console cable to an AP)?

  • In reply to GernotMeyer:

    Hi,

    Did you make any changes to the frequency band to test if that helps? Alongside, AP firmware is available in the /etc/wireless/firmware directory. CAT the .devinfo file and you will get the details.

    Thanks,

  • In reply to sachingurung:

    Hi,

    need to come back to this "case".

    APs are still going offline:

    Is there any possibilty to figuer out who causes the problems? The APs or the XG?

    One time it is one AP, one time another and sometimes all 3 of them.

    The wired AP is allways online.

    I changed the Frequencies but no real change.

    Because it is a house on a big groud there are even 1-2 other WLANs available.

    Can I figure out if an AP is going to reboot (power lost)?

  • In reply to GernotMeyer:

    Also strong:

    Get couple of thoses entries in SG Wireless Log: ath: phy2: Failed to stop TX DMA, queues=0x002!

    And also:
    2019:05:29-13:06:51 RootAP add_radio_mapping.sh: AP55 detected: Using on-the-fly ACS procedure
    2019:05:29-13:06:51 RootAP add_radio_mapping.sh: AP55 detected: Using on-the-fly ACS procedure
    2019:05:29-13:06:52 RootAP astaro-functions.sh: Running ACS-survey on 2.4GHz...
    2019:05:29-13:06:52 RootAP add_radio_mapping.sh: Channel 1 busy time 57/264ms, 21%
    2019:05:29-13:06:52 RootAP add_radio_mapping.sh: Channel 2 busy time 16/218ms, 7%

    look at the datetime....

    All other log entries are OK (representing the correct time).

  • In reply to GernotMeyer:

    Can you PM me the complete log file? I want to check what channel did the APs select and alongside, reading your previous reply, that a physically connected AP is always online, I suspect that the issue is related to channel congestion.

    You can choose a specific channel for an AP instead of using "Auto" option. This Channel selection is done at start-up and hourly using background scans.

    As multiple APs are being used, they should be configured to use different channels if their signals will overlap; and these channels should be as far apart as possible. For example, you are using a 2.5 GHz band so you have channel 1, 6 and 11 which will never overlap. So you can configure the APs in the following manner,

    AP1: channel 1

    AP2: channel 6

    AP3: channel 11

    Alongside, it is also important to ensure that access points are not scanning for the best channel, at the same time, as it is likely that they will all end up using the same channel. This could happen if the UTM boots all of the access points at the same time, or if you have configured the access points to all scan at the same time using a time definition.

    Configure different time based definitions and  assign them to the SSiDs in the Advanced option for a Wireless Network definition.

    Please monitor and let me know the results after the configurations changes.

    Thanks,

  • In reply to sachingurung:

    Hi,

    I changed channels and forced direct ones (one Channel per AP).

    Quitness since 24h. No interruption. Good news?!

    I'll keep you informed.

  • In reply to GernotMeyer:

    Thanks for the update, please keep us informed. This could help others with similar issue. 

    Thanks,