XG managed APX 740 APs randomly going offline and dropping all clients

Feature and severity: I have what appears to be a bug with SFOS v18 v3 and APX 740 wireless access points, which I consider moderately impactful.

Summary: I am unsure of the trigger; however, every now and then (seemingly at random, but multiple times per day) all 3 of my APX 740s appear to go offline, then come back a minute or two later.

Observed behavior: All 3 APs drop all clients and stop broadcasting the SSIDs, then come back after two or three minutes. All the clients then need to re-home to the best AP. I run 3 APX 740s with an XG210 as the controller and have 3 SSIDs (one 2.4 GHz only, one 5 GHz only, and a guest SSID on both 2.4 and 5 GHz). I thought it might be related to auto channel selection, so I manually set the channel on both the 5 GHz and 2.4 GHz radios on all APs (different channels, of course), but the problem persisted. I don’t recall having this issue previously, but it may have been happening without my noticing: I’ve recently added a significant number of home automation devices, so it’s now very noticeable when it happens.

I tried Sophos Central Wireless and it’s worse. I won’t go back to Central until several more releases have come out.

Reproduce it: This happens on its own many times a day, but I’m not aware of anything that forces it.

Supporting logs: The log viewer, under “SYSTEM” shows (just a brief excerpt for brevity):

SYSTEM | 2020-01-05 14:13:20 | WirelessProtection | [MASTER] sending notification about offline AP P210018WVRKDY75 | 18006
SYSTEM | 2020-01-05 13:00:20 | WirelessProtection | Successfully sent config to AP [P210018V7XJJH9F]. | 18007
SYSTEM | 2020-01-05 13:00:05 | WirelessProtection | Successfully sent config to AP [P210018WVRKDY75]. | 18007
SYSTEM | 2020-01-05 12:59:48 | WirelessProtection | [MASTER] sending notification about offline AP P210018V7XJJH9F | 18006
SYSTEM | 2020-01-05 12:59:27 | WirelessProtection | [MASTER] sending notification about offline AP P210018WVRKDY75 | 18006
SYSTEM | 2020-01-05 12:57:29 | WirelessProtection | Successfully sent config to AP [P210018V7XJJH9F]. | 18007
SYSTEM | 2020-01-05 12:56:49 | WirelessProtection | [MASTER] sending notification about offline AP P210018V7XJJH9F | 18006
SYSTEM | 2020-01-05 12:56:05 | WirelessProtection | Successfully sent config to AP [P210018WVRKDY75]. | 18007
SYSTEM | 2020-01-05 12:55:25 | WirelessProtection | [MASTER] sending notification about offline AP P210018WVRKDY75 | 18006
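
To put numbers on how long each AP is gone, the excerpt can be paired up mechanically. The following is only a rough sketch: it assumes that a "Successfully sent config" entry marks the moment the AP has rejoined the controller, which the log itself does not state, and the names `parse_events` and `pair_outages` are made up for illustration.

```python
import re
from datetime import datetime

# Timestamp and message of each entry in the excerpt above (newest first).
SAMPLE = """\
2020-01-05 14:13:20 [MASTER] sending notification about offline AP P210018WVRKDY75
2020-01-05 13:00:20 Successfully sent config to AP [P210018V7XJJH9F].
2020-01-05 13:00:05 Successfully sent config to AP [P210018WVRKDY75].
2020-01-05 12:59:48 [MASTER] sending notification about offline AP P210018V7XJJH9F
2020-01-05 12:59:27 [MASTER] sending notification about offline AP P210018WVRKDY75
2020-01-05 12:57:29 Successfully sent config to AP [P210018V7XJJH9F].
2020-01-05 12:56:49 [MASTER] sending notification about offline AP P210018V7XJJH9F
2020-01-05 12:56:05 Successfully sent config to AP [P210018WVRKDY75].
2020-01-05 12:55:25 [MASTER] sending notification about offline AP P210018WVRKDY75
"""

TS = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
OFFLINE = re.compile(r"offline AP (\w+)")
CONFIG = re.compile(r"sent config to AP \[(\w+)\]")


def parse_events(text):
    """Return (time, ap_serial, kind) tuples sorted oldest first.

    The timestamp may sit on the same line as the message or on an
    earlier line, so the most recent timestamp seen is remembered.
    """
    events, ts = [], None
    for line in text.splitlines():
        m = TS.search(line)
        if m:
            ts = datetime.strptime(m.group(0), "%Y-%m-%d %H:%M:%S")
        for kind, pattern in (("offline", OFFLINE), ("config", CONFIG)):
            m = pattern.search(line)
            if m and ts is not None:
                events.append((ts, m.group(1), kind))
    return sorted(events)


def pair_outages(events):
    """Match each offline notice with the next config push to the same AP.

    ASSUMPTION: the config push is treated as "AP is back".
    Returns (finished outages, offline notices still unmatched).
    """
    pending, finished = {}, []
    for ts, ap, kind in events:
        if kind == "offline":
            pending.setdefault(ap, ts)  # keep the first notice of a run
        elif ap in pending:
            start = pending.pop(ap)
            finished.append((ap, start, int((ts - start).total_seconds())))
    return finished, pending


finished, pending = pair_outages(parse_events(SAMPLE))
for ap, start, seconds in finished:
    print(f"{ap} offline at {start}, config re-sent after {seconds} s")
for ap, start in pending.items():
    print(f"{ap} offline at {start}, no config push in this excerpt")
```

On this excerpt the four matched gaps are 40, 40, 38 and 32 seconds, and the 14:13:20 offline notice for P210018WVRKDY75 has no matching config push within the excerpt.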

 

  • Again:

    SYSTEM | 2020-01-10 16:39:38 | Wireless Protection | Successfully sent config to AP [P210018WVRKDY75]. | 18007
    SYSTEM | 2020-01-10 16:39:05 | Wireless Protection | [MASTER] sending notification about offline AP P210018WVRKDY75 | 18006
    SYSTEM | 2020-01-08 12:31:41 | Wireless Protection | Successfully sent config to AP [P210018V7XJJH9F]. | 18007
    SYSTEM | 2020-01-08 12:31:10 | Wireless Protection | [MASTER] sending notification about offline AP P210018V7XJJH9F | 18006
    SYSTEM | 2020-01-08 11:59:51 | Wireless Protection | Successfully sent config to AP [P210018V7XJJH9F]. | 18007
  • Memory use up to 53% now as well.  Maybe unrelated but worth mentioning with the unexpected reboot 3 days ago.

  • Could you show us a little network diagram with your APs? 

    Are you using VLANs or are the APs directly attached? 


  • Happened again:

    SYSTEM | 2020-01-12 04:05:18 | Wireless Protection | Successfully sent config to AP [P210018WVRKDY75]. | 18007
    SYSTEM | 2020-01-12 04:04:44 | Wireless Protection | [MASTER] sending notification about offline AP P210018WVRKDY75 | 18006
    SYSTEM | 2020-01-11 08:29:30 | Wireless Protection | Successfully sent config to AP [P210018V7XJJH9F]. | 18007
    SYSTEM | 2020-01-11 08:26:51 | Wireless Protection | Successfully sent config to AP [P210018V7XJJH9F]. | 18007
  • Yes, I can. It’s a small test network with roughly 50 devices, most of them wireless. I’m not using VLANs, although I have a test VLAN configured (tagged) on all 48 switch ports and a VLAN interface on the firewall. It’s not in use, but it is configured on the only LAN interface I have. The DHCP network for that VLAN is turned off as well. Much of this was put in place to test Central Wireless. I’ll put a straw diagram together today and post it.

  • Didn’t get a chance to toss a diagram together; today got unexpectedly busy. I will tomorrow. It happened again today, though (that’s twice now today):

    SYSTEM | 2020-01-12 13:48:22 | Wireless Protection | Successfully sent config to AP [P210018WVRKDY75]. | 18007
    SYSTEM | 2020-01-12 13:47:48 | Wireless Protection | [MASTER] sending notification about offline AP P210018WVRKDY75 | 18006
  • Again

    SYSTEM | 2020-01-13 00:52:16 | Wireless Protection | Successfully sent config to AP [P210018WVRKDY75]. | 18007
    SYSTEM | 2020-01-13 00:51:35 | Wireless Protection | [MASTER] sending notification about offline AP P210018WVRKDY75 | 18006
  • The switch port the AP is connected to bounces whenever these entries appear in the firewall log. There are no errors on the switch. It appears the AP is actually reloading, but I’ll move switch ports later just in case and then replace the cable. This happens to the other AP as well, though not nearly as frequently. This is the log from my switch:

    13 Jan 2020 00:52:07 %STP-W-PORTSTATUS: g17: STP status Forwarding
    13 Jan 2020 00:52:02 %LINK-I-Up: g17
    13 Jan 2020 00:52:00 %LINK-W-Down: g17
    13 Jan 2020 00:51:52 %STP-W-PORTSTATUS: g17: STP status Forwarding
    13 Jan 2020 00:51:47 %LINK-I-Up: g17
    13 Jan 2020 00:51:45 %LINK-W-Down: g17
    13 Jan 2020 00:51:01 %STP-W-PORTSTATUS: g17: STP status Forwarding
    13 Jan 2020 00:50:56 %LINK-I-Up: g17
    13 Jan 2020 00:50:46 %LINK-W-Down: g17
  • FYI, I moved the AP from g17 to g18 on the switch.

  • Ian,

    I went back to your original suggestion regarding DHCP and noticed that the lease time for the APs is 24 hours, which correlates with the times they go offline. Would it be better to statically reserve these addresses or extend the lease period? Or is there something unexpected going on here? I would not expect the AP to lose its connection when it renews its IP lease. Come to think of it, the WAN interface on the XG sometimes does the same thing.
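
One way to test the lease theory against the logs, rather than by eye, is to measure how far each gap between offline notices is from a whole multiple of the lease time. The sketch below is only illustrative: the timestamps are the offline notices for P210018WVRKDY75 posted in this thread (brief excerpts, so events in between may be missing), and `offset_from_period` and `gap_report` are made-up helper names. Since DHCP clients by default try to renew at half the lease time (T1 in RFC 2131), a 12-hour period is checked alongside the 24-hour one.

```python
from datetime import datetime, timedelta

# Offline notices for AP P210018WVRKDY75, copied from the excerpts in this thread.
OFFLINE_TIMES = [
    "2020-01-05 12:55:25",
    "2020-01-05 12:59:27",
    "2020-01-05 14:13:20",
    "2020-01-10 16:39:05",
    "2020-01-12 04:04:44",
    "2020-01-12 13:47:48",
    "2020-01-13 00:51:35",
]


def offset_from_period(gap, period):
    """How far `gap` is from the nearest non-zero multiple of `period`."""
    k = max(1, round(gap / period))  # timedelta / timedelta gives a float
    return abs(gap - k * period)


def gap_report(stamps, period):
    """For each consecutive pair of events: (later event, gap, offset)."""
    times = sorted(datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps)
    return [
        (later, later - earlier, offset_from_period(later - earlier, period))
        for earlier, later in zip(times, times[1:])
    ]


for hours in (24, 12):
    print(f"--- period: {hours} h ---")
    for when, gap, offset in gap_report(OFFLINE_TIMES, timedelta(hours=hours)):
        print(f"{when}  gap {gap}  off by {offset}")
```

Offsets that stay within a few minutes across many events would support the lease theory; offsets scattered over hours would point elsewhere. Feeding in a full export of the log rather than these excerpts would give a much more reliable picture.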