9.400 bricks connected APs!

Hello,

Just a heads-up regarding the soft-released 9.400.


I just installed it on the gateway, and all connected APs that were using VLAN tagging and bridging traffic to VLANs got bricked by the AP firmware update.

If the AP does not use VLAN tagging and the wireless networks simply bridge to the AP LAN, it works. If VLAN tagging is used, then after the firmware update the AP never finishes booting and gets stuck in a never-ending loop trying to get an IP address from DHCP (AP50). A customer confirmed that it also bricked their AP30s. AP50 and AP30 can be brought back to a working state with the flash tool. Unfortunately, the AP55C gets bricked completely, as there is no flash tool for it: once it receives the 9.400 firmware and is deployed with VLAN tagging enabled, it basically dies. It either never finishes booting or never even gets as far as requesting an IP address from DHCP, and it keeps rebooting over and over again.

After rolling back to 9.355 (restore from tape) and reflashing the APs with the recovery tool (where possible), wireless works fine.

Looks like some nasty bug that slipped through QA... again :(



  • Hello Zdenek,

    we just did some investigation into the regression you mentioned, and there is indeed a regression, but it is a bit different than you think. First of all, the AP is probably not bricked (even if it looks like it). What is broken is the fallback mechanism, which your setup probably uses by default because the connecting VLAN is untagged (more on that below). When you configure your AP with VLAN tagging, it tries to contact the UTM over the specified VLAN. If it cannot do so, it will after some time fall back to the default LAN behaviour, meaning it contacts the UTM without a VLAN tag. This fallback is currently broken in 9.400.

    If you use the VLAN tagging option and at the same time configure your switch port with the specified VLAN as 'untag', the AP will first try to contact the UTM over the specified VLAN. Its packets leave the AP with the VLAN tag and the switch forwards them to the UTM, but as soon as the UTM's answer reaches the switch, the VLAN tag is removed on the way to the AP. The AP is therefore unable to match the answer (without VLAN tag) to its request (with VLAN tag). After some time the AP goes to the fallback and works in fallback mode. The 'Bridge to VLAN' networks are not affected by this fallback and still work as expected; the fallback only changes the way the AP contacts the UTM.

    We are working on a fix for this regression. For now, if you want to get the APs that no longer show up running again, you can deliver the VLAN to them as 'tagged' from the switch; they should then come back up and work again.

    Regards,
    Emanuel
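A minimal Python sketch of the return-path problem Emanuel describes, assuming a simplified 802.1Q port model: a frame in the port's native (PVID) VLAN leaves the port untagged, a frame in one of its tagged VLANs keeps its tag, and anything else is dropped. `SwitchPort` and `ap_sees_reply` are hypothetical names used only for illustration. The sketch does not model the AP firmware; it only checks whether the UTM's reply comes back carrying the same tag the AP's request used.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SwitchPort:
    """Simplified (hypothetical) 802.1Q model of the switch port facing the AP."""
    native_vlan: Optional[int]  # PVID / native VLAN; None if the port has none
    tagged_vlans: Set[int] = field(default_factory=set)

    def egress_tag(self, vlan: int) -> Optional[int]:
        """Tag a frame in `vlan` carries when it leaves this port towards the AP.

        Returns None for an untagged frame and raises ValueError if the VLAN
        is not carried on this port at all.
        """
        if vlan == self.native_vlan:
            return None  # the native/PVID VLAN leaves the port untagged
        if vlan in self.tagged_vlans:
            return vlan  # tagged VLANs keep their 802.1Q tag
        raise ValueError(f"VLAN {vlan} is not allowed on this port")


def ap_sees_reply(port: SwitchPort, mgmt_vlan: int, ap_tag: Optional[int]) -> bool:
    """Can the AP match the UTM's reply to its own request?

    mgmt_vlan: VLAN the UTM answers in (the AP's management VLAN).
    ap_tag:    tag the AP put on its request (None = untagged fallback mode).
    """
    try:
        reply_tag = port.egress_tag(mgmt_vlan)
    except ValueError:
        return False  # the reply never reaches the AP
    # The AP only accepts a reply that carries the same tag as its request.
    return reply_tag == ap_tag


if __name__ == "__main__":
    native = SwitchPort(native_vlan=123, tagged_vlans={123})  # mgmt VLAN native and tagged
    tagged_only = SwitchPort(native_vlan=None, tagged_vlans={123})  # mgmt VLAN tagged only
    print(ap_sees_reply(native, 123, 123))       # False: the reply arrives untagged
    print(ap_sees_reply(native, 123, None))      # True: only the untagged fallback matches
    print(ap_sees_reply(tagged_only, 123, 123))  # True
```

With the management VLAN native on the AP port, a tagged request gets an untagged reply, so only the untagged fallback can succeed, and that fallback is exactly what 9.400 broke. With the VLAN carried as tagged only, the tagged request is matched directly.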

  • Thanks guys!  I had what appeared to be the exact problem that Zdenek described after upgrading to 9.400 this morning.  My AP55C just never came back online from the UTM's perspective.  I could look in the DHCP logs and see:

    2016:04:04-13:30:41 utmname dhcpd: DHCPDISCOVER from 00:1a:8c:xx:xx:xx (AP55C-A1234567890123) via 172.16.xxx.xxx
    2016:04:04-13:30:41 utmname dhcpd: DHCPOFFER on 172.16.xxx.xxx to 00:1a:8c:xx:xx:xx (AP55C-A1234567890123) via eth4.xxx

    Hundreds and hundreds of these lines. I am using a Cisco switch. My original configuration in the switch was:

    UTM switch port - Trunked with native VLAN set to my management VLAN number (123).

    AP switch port - Trunked with native VLAN set to my management VLAN number (123).

    Taking your advice, I removed the native VLAN configuration from both the UTM and AP switch ports and everything started working immediately.  Thanks and I hope this helps for anyone else who is having a similar problem.
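The looping DISCOVER/OFFER pattern in Chase's log excerpt can be spotted with a short Python script. This is a sketch under assumptions: it expects ISC dhcpd-style messages like the two lines quoted above, it assumes successful leases are logged as "DHCPACK on <ip> to <mac>", and `find_dhcp_loops` with its `min_offers` threshold of 10 is a hypothetical helper, not part of the UTM.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List

# Message type, e.g. "dhcpd: DHCPDISCOVER ..." or "dhcpd: DHCPOFFER ..."
MSG_RE = re.compile(r"dhcpd: (DHCP[A-Z]+)")
# First MAC-like token in the line; "x" is allowed because the excerpt is redacted
MAC_RE = re.compile(r"\b((?:[0-9a-fx]{2}:){5}[0-9a-fx]{2})\b", re.IGNORECASE)


def find_dhcp_loops(lines: Iterable[str], min_offers: int = 10) -> List[str]:
    """Return MACs that got at least `min_offers` DHCPOFFERs but no DHCPACK."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        msg = MSG_RE.search(line)
        mac = MAC_RE.search(line)
        if msg is None or mac is None:
            continue  # not a dhcpd message with a client MAC
        counts[mac.group(1).lower()][msg.group(1)] += 1
    return sorted(
        mac
        for mac, seen in counts.items()
        if seen["DHCPOFFER"] >= min_offers and seen["DHCPACK"] == 0
    )


if __name__ == "__main__":
    sample = [
        "2016:04:04-13:30:41 utmname dhcpd: DHCPDISCOVER from 00:1a:8c:xx:xx:xx "
        "(AP55C-A1234567890123) via 172.16.xxx.xxx",
        "2016:04:04-13:30:41 utmname dhcpd: DHCPOFFER on 172.16.xxx.xxx to "
        "00:1a:8c:xx:xx:xx (AP55C-A1234567890123) via eth4.xxx",
    ] * 12
    print(find_dhcp_loops(sample))
```

Feed it the lines of the UTM's DHCP log; any MAC it returns received many offers but never an ACK, which matches the symptom described in this thread.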

  • Hi,

    So does that mean you're not doing any 'bridge to VLAN' wireless networks?

    Thanks,

    Barry

  • Hi Barry,

    Sorry, should have been a little more descriptive - I get used to talking to customers all day, sometimes! LOL!

    So, I do have three Wireless Networks on that AP, all three configured "Bridge to VLAN".  So, all of my wireless VLAN traffic between the UTM and AP was obviously being tagged, but my management traffic wasn't being tagged (native - in Cisco speak).  Essentially the only thing I really ended up having to change was to adjust my management VLAN traffic to be tagged between the UTM and AP as well.

    Chase

  • ChaseDavenport said:

    Hi Barry,

    Sorry, should have been a little more descriptive - I get used to talking to customers all day, sometimes! LOL!

    So, I do have three Wireless Networks on that AP, all three configured "Bridge to VLAN".  So, all of my wireless VLAN traffic between the UTM and AP was obviously being tagged, but my management traffic wasn't being tagged (native - in Cisco speak).  Essentially the only thing I really ended up having to change was to adjust my management VLAN traffic to be tagged between the UTM and AP as well.

    Chase

    So at the moment, on the switch port the AP is connected to, do you have the management VLAN both as native and tagged? Because that is the configuration we are using, and it stopped working with 9.400.

    Having the management VLAN on the AP's switch port both tagged and untagged (native/PVID) is critical for seamless AP deployment. Otherwise I would need a special port to connect each AP to first, let it upgrade its firmware and download its config, and only then unplug it and connect it to its final port. Right now this functionality is broken (as has been officially confirmed).

    Zdenek

  • Hi Zdenek,

    Are you saying we SHOULD or SHOULD NOT be using PVID (and tagging) with 9.400?

    FWIW, my UTM is in VMWare ESXi so I don't think PVID is going to help, but I have tried both VLAN1 and VLAN13 on the PVID setting on my switch.

    I thought about plugging the AP30 directly into another NIC on the ESXi server, but then will bridge-to-VLAN still work?

    Thanks,

    Barry

  • BarryG said:

    Hi Zdenek,

    Are you saying we SHOULD or SHOULD NOT be using PVID (and tagging) with 9.400?

    FWIW, my UTM is in VMWare ESXi so I don't think PVID is going to help, but I have tried both VLAN1 and VLAN13 on the PVID setting on my switch.

    I thought about plugging the AP30 directly into another NIC on the ESXi server, but then will bridge-to-VLAN still work?

    Thanks,

    Barry

    From experience in our setup, when the management VLAN is both tagged and untagged on the port the AP is connected to (how the VLANs are delivered to the UTM itself does not matter), the connected AP ends up in an infinite DHCP request/offer loop that never gets an ACK (Sophos has confirmed this as a bug in the AP firmware).

    As Chase mentioned in his post, it seems that if you remove the untagged management VLAN (native/PVID) from the AP's port and keep it there as tagged only (at least that is how I understood what he did), the AP starts working. I can't confirm this at the moment, as I'm not in the office this week.

    As for plugging the AP directly into an ESXi NIC, I don't think it makes any difference. Plus, depending on your ESXi configuration, the bridge to VLAN functionality may not work as expected afterwards, so I would generally recommend keeping the AP plugged into a physical switch.

  • On my Netgear GS108T switch, I cannot disable PVIDs, but I can set them to a different VLAN (for the AP30 and ESXi host). That doesn't seem to help.

    Is there an ETA for a fix from Sophos?

    Thanks,

    Barry

  • Hi,

    we have a fix for this issue and it will be available shortly. Sorry for the inconvenience.

    Kind regards,

    Dirk Bolte

  • Hi Barry,
    to get your APs back running, configure the switch port for the AP30 with its PVID on another VLAN (one that is not used anywhere else, including on the ESXi host), while still carrying the VLAN configured in the UTM under "Wireless Protection" -> "Access Points" -> "vlan tag" as tagged for the AP30. Depending on how you configured the interface in the UTM (with or without a VLAN tag on top), you need to use this VLAN as tagged or as PVID on the ESXi host.

    Regards,
    Emanuel
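Emanuel's workaround can be expressed as a small port-configuration check. This is a hypothetical Python sketch: `check_ap_port` and its parameters are illustrative names, any VLAN numbers passed to it are placeholders, and it only encodes the two rules from the post above (the VLAN set as "vlan tag" in the UTM must be tagged on the AP port, and the PVID must be a VLAN not used anywhere else). It does not read configuration from any real switch and does not cover the ESXi side.

```python
from typing import Iterable, List, Optional


def check_ap_port(
    pvid: Optional[int],
    tagged: Iterable[int],
    mgmt_vlan: int,
    vlans_in_use: Iterable[int],
) -> List[str]:
    """Check an AP switch port against the workaround rules (hypothetical helper).

    pvid:         untagged/PVID VLAN of the AP port (None if the port has none)
    tagged:       VLANs carried tagged on the AP port
    mgmt_vlan:    VLAN set as "vlan tag" for the AP in the UTM
    vlans_in_use: VLANs used anywhere else on the network (incl. the ESXi host)
    """
    tagged_set = set(tagged)
    in_use = set(vlans_in_use)
    problems: List[str] = []
    # Rule 1: the management VLAN must reach the AP tagged.
    if mgmt_vlan not in tagged_set:
        problems.append(f"management VLAN {mgmt_vlan} is not tagged on the AP port")
    # Rule 2: the PVID must not be the management VLAN or any other VLAN in use.
    if pvid == mgmt_vlan:
        problems.append(f"management VLAN {mgmt_vlan} is also the PVID (untagged)")
    elif pvid is not None and pvid in in_use:
        problems.append(f"PVID {pvid} is used elsewhere; move it to an unused VLAN")
    return problems
```

For example, `check_ap_port(pvid=1, tagged={13}, mgmt_vlan=13, vlans_in_use={1, 13})` reports that PVID 1 is used elsewhere, while moving the PVID to an otherwise unused VLAN (or having no PVID at all, as in Chase's Cisco setup) returns an empty list.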

  • Hello Dirk,

    how can we get the Wi-Fi fix you mentioned, or how far are we from v9.4 GA?

    alda

  • Please understand that I cannot communicate dates. The fix is part of 9.401, which will be available shortly.

  • Hello Dirk,

    I'm not sure I understood correctly. Do you plan to release version 9.400 with a documented serious bug in the WiFi module, and then release update 9.401 to fix this bug?

    Really? And that seems right to you? What if someone installs the buggy version 9.400, does not install the 9.401 patch, and their WiFi infrastructure collapses?

    I'm not sure this plan is ideal for your customers and partners...


    alda

  • It's not a plan, they already did it, just with the "soft release" of 9.400.

    So you see: as an early adopter who wants to use the new features, you will be used as a beta tester, as usual.

    Don't install early releases from Sophos: that has been a golden rule for many months now.

    Wait for the GA release, or at least two weeks after the soft release, to see what the beta testers find out.

    greets

    zaphod
    ___________________________________________

    Home: Zotac CI321 (8GB RAM / 120GB SSD)  with latest Sophos UTM
    Work: 2 SG430 Cluster / many other models like SG105/SG115/SG135/SG135w/...

  • Hi,

    The up2date package for 9.401 goes directly from 9.35 to 9.401, so the APs are never exposed to the bug. 9.400 will be removed from up2date, so no one will be able to install it anymore and run into the bug.

    I hope this addresses your concern.

    Kind regards,

    Dirk

  • Hello Dirk,

    I think that is the correct solution.


    So we could enjoy this update this Thursday, couldn't we?


    alda

    ;-)

  • 9.401 Update is out folks....

    CTO, Convergent Information Security Solutions, LLC

    https://www.convergesecurity.com

    Sophos Platinum Partner

    --------------------------------------

    Advice given as posted on this forum does not construe a support relationship or other relationship with Convergent Information Security Solutions, LLC or its subsidiaries.  Use the advice given at your own risk.