XG managed APX 740 APs randomly going offline and dropping all clients

Feature and severity: I have what appears to be a bug with SFOS v18 v3 and APX 740 wireless access points that I consider moderately impacting.

Summary: I am unsure of the trigger; however, multiple times per day and seemingly at random, all 3 of my APX 740s “appear” to go offline, then come back a minute or two later.

Observed behavior: All 3 APs drop all clients and the SSIDs aren’t broadcast; after two or three minutes they come back, and all the clients need to re-home to the best AP again. I run 3 740s using an XG210 as the controller and have 3 SSIDs (one for 2.4ghz only, one for 5ghz only, and a guest SSID on both 2.4 and 5ghz). I thought it might be related to auto channel selection, so I manually set the channel on both the 5ghz and 2.4ghz radios on all APs (different channels, of course). The problem persisted, though. I don’t recall having this issue previously, but it may have been happening without my being aware of it. I say that because I’ve recently added a significant number of home automation devices, so now it’s very noticeable when this happens.

I tried Sophos Central wireless and it’s worse. I won’t go back to Central until several more releases come out.

Reproduce it: This happens on its own many times a day, but I’m not aware of anything that forces it.

Supporting logs: The log viewer, under “SYSTEM” shows (just a brief excerpt for brevity):

SYSTEM	2020-01-05 14:13:20	WirelessProtection	[MASTER] sending notification about offline AP P210018WVRKDY75	18006
SYSTEM	2020-01-05 13:00:20	WirelessProtection	Successfully sent config to AP [P210018V7XJJH9F].	18007
SYSTEM	2020-01-05 13:00:05	WirelessProtection	Successfully sent config to AP [P210018WVRKDY75].	18007
SYSTEM	2020-01-05 12:59:48	WirelessProtection	[MASTER] sending notification about offline AP P210018V7XJJH9F	18006
SYSTEM	2020-01-05 12:59:27	WirelessProtection	[MASTER] sending notification about offline AP P210018WVRKDY75	18006
SYSTEM	2020-01-05 12:57:29	WirelessProtection	Successfully sent config to AP [P210018V7XJJH9F].	18007
SYSTEM	2020-01-05 12:56:49	WirelessProtection	[MASTER] sending notification about offline AP P210018V7XJJH9F	18006
SYSTEM	2020-01-05 12:56:05	WirelessProtection	Successfully sent config to AP [P210018WVRKDY75].	18007
SYSTEM	2020-01-05 12:55:25	WirelessProtection	[MASTER] sending notification about offline AP P210018WVRKDY75	18006
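To put numbers on how long each AP stays gone, the excerpt above can be paired up programmatically. The following is a minimal sketch, not a Sophos tool: it assumes the log viewer export is tab-separated as shown, and it treats the next "Successfully sent config" entry (message ID 18007) for the same serial as a proxy for "AP is back" after an offline notification (message ID 18006). That pairing rule is an assumption, and the function name `pair_outages` is made up for illustration.

```python
import re
from datetime import datetime

# Lines copied from the excerpt above (tab-separated, newest first).
LOG = """\
SYSTEM\t2020-01-05 14:13:20\tWirelessProtection\t[MASTER] sending notification about offline AP P210018WVRKDY75\t18006
SYSTEM\t2020-01-05 13:00:20\tWirelessProtection\tSuccessfully sent config to AP [P210018V7XJJH9F].\t18007
SYSTEM\t2020-01-05 13:00:05\tWirelessProtection\tSuccessfully sent config to AP [P210018WVRKDY75].\t18007
SYSTEM\t2020-01-05 12:59:48\tWirelessProtection\t[MASTER] sending notification about offline AP P210018V7XJJH9F\t18006
SYSTEM\t2020-01-05 12:59:27\tWirelessProtection\t[MASTER] sending notification about offline AP P210018WVRKDY75\t18006
SYSTEM\t2020-01-05 12:57:29\tWirelessProtection\tSuccessfully sent config to AP [P210018V7XJJH9F].\t18007
SYSTEM\t2020-01-05 12:56:49\tWirelessProtection\t[MASTER] sending notification about offline AP P210018V7XJJH9F\t18006
SYSTEM\t2020-01-05 12:56:05\tWirelessProtection\tSuccessfully sent config to AP [P210018WVRKDY75].\t18007
SYSTEM\t2020-01-05 12:55:25\tWirelessProtection\t[MASTER] sending notification about offline AP P210018WVRKDY75\t18006
"""

# Timestamp, AP serial (with or without brackets), and the trailing message ID.
LINE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*AP \[?([A-Z0-9]+)\]?\.?\s+(18006|18007)\s*$"
)


def pair_outages(log_text):
    """Return (serial, offline_time, seconds_until_config_sent) tuples."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((ts, m.group(2), m.group(3)))
    events.sort()  # the viewer lists newest first; process oldest first

    pending = {}  # serial -> time of the first unmatched offline notification
    outages = []
    for ts, serial, msg_id in events:
        if msg_id == "18006":
            pending.setdefault(serial, ts)
        elif serial in pending:
            start = pending.pop(serial)
            outages.append((serial, start, (ts - start).total_seconds()))
    return outages


if __name__ == "__main__":
    for serial, start, seconds in pair_outages(LOG):
        print(f"{serial} offline at {start}, config re-sent after {seconds:.0f}s")
```

On this excerpt the four matched pairs come out at 40, 40, 38 and 32 seconds; the 14:13:20 notification has no matching recovery entry in the excerpt, so it is left unpaired.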

0 suzzyx over 4 years ago

Hello Jamie,

can you please specify which fixed channels you selected?

Kind Regards,

Suzzyx

0 Dawg13 over 4 years ago in reply to suzzyx

Hi Suzzyx,

for 2.4ghz (3 APs):

1, 6, 11

for 5ghz (2 APs):

36, 44

side note: I could re-enable auto selection (I only ever used auto on 5ghz) and see if it happens. Still no new AP offline entries in the log.

0 Dawg13 over 4 years ago in reply to Dawg13

Again:

SYSTEM	2020-01-10 16:39:38	Wireless Protection	Successfully sent config to AP [P210018WVRKDY75].	18007
SYSTEM	2020-01-10 16:39:05	Wireless Protection	[MASTER] sending notification about offline AP P210018WVRKDY75	18006
SYSTEM	2020-01-08 12:31:41	Wireless Protection	Successfully sent config to AP [P210018V7XJJH9F].	18007
SYSTEM	2020-01-08 12:31:10	Wireless Protection	[MASTER] sending notification about offline AP P210018V7XJJH9F	18006
SYSTEM	2020-01-08 11:59:51	Wireless Protection	Successfully sent config to AP [P210018V7XJJH9F].	18007

0 Dawg13 over 4 years ago in reply to Dawg13

Memory use up to 53% now as well. Maybe unrelated but worth mentioning with the unexpected reboot 3 days ago.

0 LuCar Toni over 4 years ago in reply to Dawg13

Could you show us a little network diagram with your APs?

Are you using VLANs or are the APs directly attached?

0 Dawg13 over 4 years ago in reply to LuCar Toni

Happened again:

SYSTEM	2020-01-12 04:05:18	Wireless Protection	Successfully sent config to AP [P210018WVRKDY75].	18007
SYSTEM	2020-01-12 04:04:44	Wireless Protection	[MASTER] sending notification about offline AP P210018WVRKDY75	18006
SYSTEM	2020-01-11 08:29:30	Wireless Protection	Successfully sent config to AP [P210018V7XJJH9F].	18007
SYSTEM	2020-01-11 08:26:51	Wireless Protection	Successfully sent config to AP [P210018V7XJJH9F].	18007

0 Dawg13 over 4 years ago in reply to LuCar Toni

Yes I can. It’s a small test network with roughly 50 devices, most wireless. I’m not using VLANs although I have a test VLAN configured on all 48 switch ports (tagged) and a VLAN interface on the firewall but it’s not used (it is configured on the only LAN interface I have however). The DHCP network for that VLAN is turned off as well. Much of this was in place to test central wireless. I’ll put a straw diagram together today and post.

0 Dawg13 over 4 years ago in reply to Dawg13

Didn’t get a chance to toss a diagram together; today got unexpectedly busy. I will tomorrow. But it happened again today (that’s twice now today):

SYSTEM	2020-01-12 13:48:22	Wireless Protection			Successfully sent config to AP [P210018WVRKDY75].	18007
SYSTEM	2020-01-12 13:47:48	Wireless Protection			[MASTER] sending notification about offline AP P210018WVRKDY75	18006

0 Dawg13 over 4 years ago in reply to Dawg13

Again

SYSTEM	2020-01-13 00:52:16	Wireless Protection			Successfully sent config to AP [P210018WVRKDY75].	18007
SYSTEM	2020-01-13 00:51:35	Wireless Protection			[MASTER] sending notification about offline AP P210018WVRKDY75	18006

0 Dawg13 over 4 years ago in reply to Dawg13

The switch port the AP is connected to bounces when these entries occur in the firewall log. No errors on the switch. It appears the AP is actually reloading, but I’ll move switch ports later just in case and then replace the cable. This happens to the other AP as well, although not nearly as frequently. This is the log from my switch:

13 Jan 2020 00:52:07 %STP-W-PORTSTATUS: g17: STP status Forwarding
13 Jan 2020 00:52:02 %LINK-I-Up: g17
13 Jan 2020 00:52:00 %LINK-W-Down: g17
13 Jan 2020 00:51:52 %STP-W-PORTSTATUS: g17: STP status Forwarding
13 Jan 2020 00:51:47 %LINK-I-Up: g17
13 Jan 2020 00:51:45 %LINK-W-Down: g17
13 Jan 2020 00:51:01 %STP-W-PORTSTATUS: g17: STP status Forwarding
13 Jan 2020 00:50:56 %LINK-I-Up: g17
13 Jan 2020 00:50:46 %LINK-W-Down: g17
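The flapping in the switch log above can be summarised by pairing each LINK Down with the next LINK Up on the same port. This is a rough sketch only: the regular expression is fitted to the lines pasted in this post (it tolerates a missing space before the `%`), other switch firmware may format entries differently, and `link_down_durations` is a hypothetical helper name. Parsing the month abbreviation with `%b` assumes an English locale.

```python
import re
from datetime import datetime

# Switch log lines as pasted above (newest first).
SWITCH_LOG = """\
13 Jan 2020 00:52:07%STP-W-PORTSTATUS: g17: STP status Forwarding
13 Jan 2020 00:52:02%LINK-I-Up: g17
13 Jan 2020 00:52:00%LINK-W-Down: g17
13 Jan 2020 00:51:52%STP-W-PORTSTATUS: g17: STP status Forwarding
13 Jan 2020 00:51:47%LINK-I-Up: g17
13 Jan 2020 00:51:45%LINK-W-Down: g17
13 Jan 2020 00:51:01%STP-W-PORTSTATUS: g17: STP status Forwarding
13 Jan 2020 00:50:56%LINK-I-Up: g17
13 Jan 2020 00:50:46%LINK-W-Down: g17
"""

# Only link state changes are of interest; STP lines are skipped.
LINK_RE = re.compile(
    r"(\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2})\s*%LINK-[IW]-(Up|Down): (\S+)"
)


def link_down_durations(log_text):
    """Map each port to a list of seconds spent down per flap, oldest first."""
    events = []
    for line in log_text.splitlines():
        m = LINK_RE.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%d %b %Y %H:%M:%S")
            events.append((ts, m.group(3), m.group(2)))
    events.sort()  # oldest first

    down_since = {}
    durations = {}
    for ts, port, state in events:
        if state == "Down":
            down_since[port] = ts
        elif port in down_since:
            seconds = (ts - down_since.pop(port)).total_seconds()
            durations.setdefault(port, []).append(seconds)
    return durations


if __name__ == "__main__":
    for port, secs in link_down_durations(SWITCH_LOG).items():
        print(port, secs)
```

For the lines above this gives three link drops on g17 lasting 10, 2 and 2 seconds, all between 00:50:46 and 00:52:02.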

0 Dawg13 over 4 years ago in reply to Dawg13

FYI, I moved the AP from g17 to g18 on the switch.

0 Dawg13 over 4 years ago in reply to Dawg13

Ian,

I went back to your original suggestion regarding DHCP and noticed the lease settings for the APs are 24 hours and correlate to the times they go offline. Would it be better to statically reserve these or lengthen the lease period? Or is there something unexpected going on here? I would not expect the AP to lose connection when it renews its IP lease. Come to think of it, the WAN interface on the XG sometimes does the same thing.
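For checking the lease correlation, it helps to know when a client would normally try to renew. Under the defaults in RFC 2131, a DHCP client starts renewing at T1 = 0.5 × lease time and rebinding at T2 = 0.875 × lease time. The sketch below only works out those times for a 24-hour lease; the lease start time is a made-up example, and whether the APX firmware uses the default timers is an assumption.

```python
from datetime import datetime, timedelta


def renewal_times(lease_start, lease):
    """Return (T1 renew, T2 rebind, expiry) using the RFC 2131 default ratios."""
    t1 = lease_start + lease * 0.5    # client starts unicast renewal
    t2 = lease_start + lease * 0.875  # client falls back to broadcast rebinding
    expiry = lease_start + lease
    return t1, t2, expiry


if __name__ == "__main__":
    # Hypothetical lease start; substitute the value from the DHCP lease table.
    start = datetime(2020, 1, 12, 0, 0, 0)
    t1, t2, expiry = renewal_times(start, timedelta(hours=24))
    print("renew at ", t1)
    print("rebind at", t2)
    print("expires  ", expiry)
```

If the offline notifications line up with the renew or rebind times rather than with the expiry, that would point at the renewal exchange itself and not at the address changing.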

0 Dawg13 over 4 years ago in reply to Dawg13

No symptoms or log messages since I statically reserved the IPs in DHCP. Last event was 1/13. Still too soon for me to be "comfy" though.

Network drawing (basic):

0 rfcat_vk over 4 years ago in reply to Dawg13

Hi James,

at one stage there was an issue with the DHCP server, so I extended all my lease times, and I make my IPs use static assignments.

The WAN interface issue sounds like your ISP might be performing network maintenance. Do the changes occur at night?

Ian

I do not have any of the APX series APs, only the previous models.

XG115W - v20 GA - Home

XG on VM 8 - v20 GA

If a post solves your question please use the 'Verify Answer' button.

0 Dawg13 over 4 years ago in reply to rfcat_vk

I found the culprit on the WAN side, and it turns out there's a node problem in the neighborhood. So you're right, it's carrier related, but it was one of those deals where they didn't know there was a problem in their infrastructure.

I have not had an issue since I statically reserved the IPs for the APX APs but I'd like to see a few more days go by to feel "good" if that makes sense. I find it odd that statically reserving solves this because it's still a renew, just ensuring the same IP. Seems it's masking an underlying issue. I may do the same as you across the DHCP scope.

Just my 2 cents right now.

0 Dawg13 over 4 years ago in reply to Dawg13

Still solid uptime since the static reservations, but memory is now at 78%. Concerning, albeit unrelated to the AP issue. Uptime is 10+ days. It starts around 36% at boot. I’ll watch it, but I’m worried there’s a memory leak at play.
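As a back-of-the-envelope check on those memory figures, here is a small sketch that assumes the growth is linear, which may well not hold for a real leak. It takes 36% at boot, 78% now, and 10 days of uptime from the post; `linear_growth` is just an illustrative helper.

```python
def linear_growth(start_pct, current_pct, days_elapsed):
    """Return (percentage points per day, days until 100%) assuming linear growth."""
    rate = (current_pct - start_pct) / days_elapsed
    days_to_full = (100.0 - current_pct) / rate if rate > 0 else float("inf")
    return rate, days_to_full


rate, days_left = linear_growth(36.0, 78.0, 10.0)
print(f"{rate:.1f} points/day, about {days_left:.1f} days to 100%")
```

With those inputs the estimate is roughly 4.2 percentage points per day, or a little over 5 more days before memory would be exhausted if the trend continued unchanged.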

0 Dawg13 over 4 years ago in reply to Dawg13

It just happened again, to my disappointment. There must be some logging on the access points that can be looked at. I do not have power supplies for these access points (they don’t ship with them), but I would expect a PoE-related log entry on my switch. Otherwise, I’d move them to the ports directly on the XG to rule out the switch. I would imagine Sophos would want to get to the bottom of this, but it doesn’t feel like it. I may consider rolling back to 17 to see if this follows, as I’m running out of options. Keep in mind, no other devices are doing this. At first it was all access points. Now it’s only the access point that’s running the 5ghz SSID. A while back I removed the 5ghz SSID from the other APs because devices weren’t re-homing properly, and I’d prefer the outage, sadly.

System logs from the XG:

	Time	Log comp	Status	User name	Message	Message ID
SYSTEM	2020-01-18 00:33:35	Wireless Protection			Successfully sent config to AP [P210018WVRKDY75].	18007
SYSTEM	2020-01-18 00:33:01	Wireless Protection			[MASTER] sending notification about offline AP P210018WVRKDY75	18006

Switch logs show only that port bouncing (and this is a new port and cable to the wall jack):

18 Jan 2020 00:33:38 %STP-W-PORTSTATUS: g18: STP status Forwarding
18 Jan 2020 00:33:33 %LINK-I-Up: g18
18 Jan 2020 00:33:31 %LINK-W-Down: g18
18 Jan 2020 00:33:23 %STP-W-PORTSTATUS: g18: STP status Forwarding
18 Jan 2020 00:33:19 %LINK-I-Up: g18
18 Jan 2020 00:33:16 %LINK-W-Down: g18
18 Jan 2020 00:32:32 %STP-W-PORTSTATUS: g18: STP status Forwarding
18 Jan 2020 00:32:27 %LINK-I-Up: g18
18 Jan 2020 00:32:17 %LINK-W-Down: g18

no errors on the switch port (3rd column is error received count):

g18	15828947	30240	22497987

0 rfcat_vk over 4 years ago in reply to Dawg13

Hi James,

you are describing an overheating issue. Try turning the 2.4ghz SSID on the failing AP off.

Ian

0 Dawg13 over 4 years ago in reply to rfcat_vk

I’ll give it a shot. I’ll do it today. Bit strange that using both radios causes them to overheat (I understand your point though). The APs are in a well cooled area (sitting on a furniture surface - that’s it). Given that all the APs were doing this when they all had both SSIDs enabled and now only the AP that has both radios enabled is doing this, you could be on to something. That’s a design flaw to me.

But, as I said, it’s worth a try to identify the source. I’ll report back. Thx Ian.

0 Dawg13 over 4 years ago in reply to Dawg13

Done. 2 APs with only 2.4ghz and one AP with only 5ghz

0 Dawg13 over 4 years ago in reply to Dawg13

Quick update:

10 days of uptime and, frankly, better AP association by endpoints. I’m going to call it identified and remediated after a month, but not resolved, as this is a product defect in my opinion. This happened with the TX power turned down, too. Honestly, the access point that is now dedicated to just 5ghz is running warm to the touch. I now can’t run guest on all access points as 2.4+5ghz, and I need additional APs to fill in 5ghz coverage gaps. Nevertheless, it would appear you were correct, Ian.