Wi-Fi RADIUS authentication doesn't fail over to the secondary server

Hey All,

I'm having a peculiar issue where Sophos XG fails to detect that the primary RADIUS server is offline and never fails over to the secondary server for wireless authentication.

Configuration

  • XG135 (SFOS 17.5.7 MR-7.HF062020.1)
  • RAD-1 - Server 2016 (Member Server, NPS) - Running on ESXi (On-Site)
  • RAD-2 - Server 2016 (Member Server, NPS) - Running in Azure
  • Client Machine (Windows 10 Pro)

With either server individually set as the primary RADIUS authentication server, I can connect to the Wi-Fi network with a client machine no problem at all.

Testing Failover

To simulate a failure of the on-premises RAD-1 server, I suspended the VM in ESXi and then attempted to connect to the Wi-Fi using the same client machine. The client machine repeatedly reports "Can't connect to this network".

Multiple attempts yield the same result even after 10 to 15 minutes.

If I manually set RAD-2 as the primary while RAD-1 is still suspended, the client machine is able to connect as expected.

I've been playing around with this for a while now and am not sure what's going on here. Packet captures show the XG sending ARP-NDP requests for the RAD-1 IP address but never receiving a response, which is expected while the VM is suspended.
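
For anyone wanting to rule out the NPS servers themselves, here is a rough Python sketch of the failover behaviour I'd expect (try the primary, time out, move on to the secondary). It is not how the XG implements it. It sends a minimal RFC 2865 Access-Request over UDP and treats any reply, even an Access-Reject, as proof that the server is alive. The helper names, the NAS-Identifier value, the port, and the timeout are all assumptions made for illustration, and the machine running it would need to be registered as a RADIUS client on both NPS servers with the same shared secret. A server may also silently drop a packet this minimal (for example if it requires attributes not included here), so a timeout on its own is not conclusive.

```python
import hashlib
import os
import socket
import struct

ACCESS_REQUEST = 1  # RADIUS packet code for Access-Request (RFC 2865)


def encode_password(password: bytes, secret: bytes, authenticator: bytes) -> bytes:
    """Hide a non-empty User-Password as described in RFC 2865 section 5.2."""
    # Pad with NULs to a multiple of 16 octets.
    padded = password + b"\x00" * (-len(password) % 16)
    hidden = b""
    previous = authenticator
    for i in range(0, len(padded), 16):
        digest = hashlib.md5(secret + previous).digest()
        block = bytes(a ^ b for a, b in zip(padded[i:i + 16], digest))
        hidden += block
        previous = block  # each block is chained on the previous ciphertext
    return hidden


def attribute(attr_type: int, value: bytes) -> bytes:
    """Encode one attribute as type, length (including the 2-byte header), value."""
    return struct.pack("!BB", attr_type, len(value) + 2) + value


def build_access_request(user, password, secret, identifier=0, authenticator=None):
    """Build a minimal Access-Request; the attribute choice is an assumption."""
    if authenticator is None:
        authenticator = os.urandom(16)
    attrs = (
        attribute(1, user.encode())                                     # User-Name
        + attribute(2, encode_password(password, secret, authenticator))  # User-Password
        + attribute(32, b"failover-probe")                              # NAS-Identifier (made up)
    )
    header = struct.pack("!BBH", ACCESS_REQUEST, identifier, 20 + len(attrs))
    return header + authenticator + attrs


def probe(host, packet, port=1812, timeout=3.0):
    """Return True if the server sends back any reply matching our identifier."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (host, port))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # covers timeouts and ICMP-triggered errors
            return False
    return len(reply) >= 20 and reply[1] == packet[1]


def first_responding(servers, packet, **kwargs):
    """Try each server in order and return the first one that answers, else None."""
    for host in servers:
        if probe(host, packet, **kwargs):
            return host
    return None
```

Building a packet with `build_access_request("someuser", b"somepassword", b"sharedsecret")` and passing it to `first_responding` with the RAD-1 and RAD-2 addresses (in that order) should return the RAD-2 address while RAD-1 is suspended, provided RAD-2 answers this client at all.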

This issue has been observed on both this XG135 and an XG310 I deployed at another site with a similar configuration for Wi-Fi.

Any ideas on what might be going on here?