
Central Wireless RADIUS: Roaming Clients between APX320 and APX530 lose connection when on APX530

Hi,

We have been seeing a strange issue for a while now. WiFi clients are connected to an 802.1X RADIUS WiFi network served by APX320 and APX530 APs that are managed in Sophos Central. Occasionally, when clients move around the office and roam between the two models, they lose network connectivity and the new AP does not authenticate the roamed client against the RADIUS server. On the RADIUS server we see no authentication attempt at all, neither failed nor successful.

We have the feeling that it is only related to the APX530: the clients have no connection when they are connected to an APX530 on 5 GHz. Central still shows an IP address for the client device, but the device actually has no IP on its network adapter. When we reboot the APX, it usually works again. Fast roaming is enabled.
Users help themselves by turning their WiFi adapter off and on when they hit the issue. This does not fix it immediately; they usually need several attempts before it works. When it finally works, they are still connected to the APX530, as it is the nearest AP at that point. On the RADIUS server we then also see that authentication happened and was successful.

The APs run Central firmware v2.3.4-5.

There are only a few clients connected. No real load for the machines.

I'd like to know if there are known issues with this, or if you have a good idea how to debug it.



Added TAGs
[edited by: Erick Jan at 5:02 AM (GMT -8) on 12 Jan 2024]
  • It was worth a try. I opened support case 06845353 and hope they can figure it out.

    We noticed it also happens when roaming between APX320 units.

  • I spent some time digging into the debug files and dumps I provided to Sophos.

    A continuous ping was running from the client 192.168.1.68 to 193.99.144.80. The "Malformed Packet" note is probably a Wireshark issue.

    This entry in the APX packet capture is the point where the problem occurs, with the client roaming from one APX to another:
    30051   2023-07-20 21:53:25,682674      IntelCor_27:eb:5b          Sophos_3c:7c:15 802.11 46771   37008   313            Reassociation Request, SN=8, FN=0, Flags=........, SSID="mySSID"[Malformed Packet]

    This is then the last communication in the client tcpdump:
    80        2023-07-20 21:53:22,027795      192.168.1.68     193.99.144.80   ICMP                            74         Echo (ping) request  id=0x0001, seq=6505/26905, ttl=128 (no response found!)
    And here the client falls back to an APIPA address for the first time:
    343       2023-07-20 21:54:02,161397      169.254.186.89 224.0.0.22         IGMPv3                        54         Membership Report / Leave group 224.0.0.251

    The timestamps do not match exactly but are close enough to see what happens.
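
The offsets between the three events quoted above can be worked out directly from the capture timestamps. This is only a minimal Python sketch, assuming both captures were taken against roughly the same clock and use the `YYYY-MM-DD HH:MM:SS,ffffff` format shown; the helper names are made up for illustration.

```python
from datetime import datetime
import ipaddress

# Timestamp format as it appears in the exported capture lines above.
FMT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(text):
    """Parse one capture timestamp into a datetime."""
    return datetime.strptime(text, FMT)


def seconds_between(earlier, later):
    """Elapsed seconds from one event to the next."""
    return (later - earlier).total_seconds()


def is_apipa(addr):
    """True for 169.254.0.0/16 link-local (APIPA) addresses."""
    return ipaddress.ip_address(addr).is_link_local


last_ping = parse_ts("2023-07-20 21:53:22,027795")    # last ICMP request (client dump)
reassoc = parse_ts("2023-07-20 21:53:25,682674")      # Reassociation Request (APX capture)
first_apipa = parse_ts("2023-07-20 21:54:02,161397")  # first frame sent from 169.254.x.x

print("last ping -> reassociation:", seconds_between(last_ping, reassoc), "s")
print("reassociation -> APIPA:", seconds_between(reassoc, first_apipa), "s")
print("169.254.186.89 is APIPA:", is_apipa("169.254.186.89"))
```

So the reassociation is seen a few seconds after the last ping request, and the client gives up on its DHCP address a little over half a minute later.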

  • Hi LHerzog,

    Thank you for sharing the case. We’re following the case and will update the thread.

    As an update from the case, the case handler is waiting for availability from your side for a remote session.

    Erick Jan
    Community Support Engineer | Sophos Technical Support
    Sophos Support Videos Product Documentation  |  @SophosSupport  | Sign up for SMS Alerts
    If a post solves your question use the 'Verify Answer' link.

  • What I can see in the Windows WLAN report at the time the issue is reproduced is the following (translated from the German original):

    Capability change on {8a3098e1-e242-4933-b6b2-fc7ea64e573c} (0x47008000000000 Family: V4 Capability: None ChangeReason: NoAddress)

    If you search for that message, there is quite a lot of discussion about it in the Intel forums, but none on the Sophos side.

    At the time the disconnect starts, the event log shows a driver event:

    7003 - Roam Complete

    The affected test computer uses the latest Intel driver. Of course, this is not the only notebook and WiFi model with the issue.

    Device: Intel(R) Wireless-AC 9560 160MHz
    PNP ID: PCI\VEN_8086&DEV_9DF0&SUBSYS_00348086&REV_30\3&11583659&0&A3
    Guid: {8A3098E1-E242-4933-B6B2-FC7EA64E573C}
    Current driver version: 22.230.0.8
    Driver date: 5-9-2023
    DevNode flags: 0x180200a

    As we mostly use Intel WiFi endpoints, I expect Sophos will point the finger at Intel, Intel at the device manufacturer, and so on.

    iPhones also had this connection issue. They use the same 802.1X SSID.

    I'm not interested in assigning blame. I'd like to know whether we can change APX WiFi settings as a workaround.

    As written earlier, this has never been an issue on our long-standing AP55C infrastructure.

  • UPDATE FROM LHERZOG about this case/thread

    News from our support case 06845353, posted here as a reference in case other users face the same issue.

    Sophos WiFi Dev Team has successfully recreated the issue.

    • When a client roams from AP1->AP2->AP1, we observe that the client traffic is not tagged properly with the VLAN, which results in the client not getting an IP address.
    • When fast roaming is enabled, RADIUS authentication is skipped when the client roams.
    • The client receives the correct IP address after turning WiFi off and on. This is because the client then goes through the complete RADIUS authentication, during which the VLAN information is captured from the RADIUS exchange.
    • Disabling fast roaming is a possible workaround for now, but it comes at the cost of roaming delay.
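
The sequence described in these points can be illustrated with a small toy model. This is a deliberately simplified Python sketch and not the actual APX driver behaviour: it simply assumes that an AP only learns a client's VLAN during a full RADIUS exchange and learns nothing on a fast roam. The MAC address, VLAN ID 20 and all class and function names are hypothetical.

```python
# Hypothetical RADIUS data: client MAC -> dynamically assigned VLAN ID.
RADIUS_VLAN = {"aa:bb:cc:dd:ee:ff": 20}


class AccessPoint:
    def __init__(self, name):
        self.name = name
        self.vlan_by_client = {}  # VLANs this AP currently knows per client

    def full_auth(self, mac):
        # Complete 802.1X exchange: the VLAN comes back from RADIUS.
        self.vlan_by_client[mac] = RADIUS_VLAN[mac]

    def fast_roam(self, mac):
        # RADIUS is skipped; in this model of the defect no VLAN is synced.
        self.vlan_by_client.pop(mac, None)

    def vlan_tag(self, mac):
        # None means client traffic leaves untagged, so DHCP fails.
        return self.vlan_by_client.get(mac)


def connect(ap, mac, fast_roaming, is_roam):
    """Attach a client to an AP and return the VLAN tag its traffic gets."""
    if fast_roaming and is_roam:
        ap.fast_roam(mac)
    else:
        ap.full_auth(mac)
    return ap.vlan_tag(mac)


def run_sequence(fast_roaming):
    """Initial join on AP1, roam AP1->AP2->AP1, then toggle WiFi off and on."""
    mac = "aa:bb:cc:dd:ee:ff"
    ap1, ap2 = AccessPoint("AP1"), AccessPoint("AP2")
    return [
        connect(ap1, mac, fast_roaming, is_roam=False),  # first association
        connect(ap2, mac, fast_roaming, is_roam=True),   # roam to AP2
        connect(ap1, mac, fast_roaming, is_roam=True),   # roam back to AP1
        connect(ap1, mac, fast_roaming, is_roam=False),  # WiFi toggled: full auth
    ]


print("fast roaming on: ", run_sequence(True))
print("fast roaming off:", run_sequence(False))
```

In this model the tag is missing after each fast roam and returns only once a full authentication happens, which matches the reported symptoms and the effect of the workaround.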

     
    Emmanuel (EmmoSophos)
    Technical Team Lead, Global Community Support
  • UPDATE FROM LHERZOG about this case/thread

    Further investigation by the development team under CWIFI-13163 “Fast roaming issues - clients lose IP and need to reconnect” found an issue with the Wi-Fi driver on the APX that prevents the VLAN ID from syncing when fast roaming is enabled with dynamic VLANs. When the wireless client moves to a new access point, the packets coming through the new access point are not tagged with the correct VLAN ID. As a result, clients lose their IP address and must reconnect to the access point. To work around this, we recommend disabling fast roaming across the wireless network; this allows wireless clients to roam between access points with the correct dynamic VLAN ID. The alternative is to use static VLAN assignments instead of dynamic VLANs on the wireless network and keep fast roaming enabled.
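
For anyone auditing several SSIDs, the condition and the two workarounds described above can be written down as a simple check. This is only a sketch: `SsidConfig` and its fields are hypothetical names for illustration and do not correspond to any Sophos Central API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SsidConfig:
    # Hypothetical representation of the two relevant SSID settings.
    fast_roaming: bool
    dynamic_vlan: bool


def is_affected(cfg):
    """The issue is described only for fast roaming combined with dynamic VLANs."""
    return cfg.fast_roaming and cfg.dynamic_vlan


def workarounds(cfg):
    """Return the two alternative configurations named in the update."""
    if not is_affected(cfg):
        return []
    return [
        replace(cfg, fast_roaming=False),  # keep dynamic VLANs, accept roaming delay
        replace(cfg, dynamic_vlan=False),  # keep fast roaming, use static VLANs
    ]


current = SsidConfig(fast_roaming=True, dynamic_vlan=True)
for option in workarounds(current):
    print(option)
```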


     
    Emmanuel (EmmoSophos)
    Technical Team Lead, Global Community Support