Important note about SSL VPN compatibility for 20.0 MR1 with EoL SFOS versions and UTM9 OS. Learn more in the release notes.

Sophos Firewall: v20.0 MR1: Feedback and experiences

Release Post:  Sophos Firewall OS v20 MR1 is Now Available 

The old V20.0 GA Post:  Sophos Firewall: v20.0 GA: Feedback and experiences  

To make the tracking of issues / feedback easier: Please post a potential Sophos Support Case ID within your initial post, so we can track your feedback/issue. 

Release Notes:  https://docs.sophos.com/releasenotes/output/en-us/nsg/sf_200_rn.html 

Important Note on EOL Sophos RED Support:

The legacy EOL RED 15, RED 15w, and RED 50 are not supported in v20 MR1. Customers using these devices should upgrade to SD-RED or a smaller XGS appliance before upgrading to MR1 to maintain connectivity. See the following article for details: Sophos RED: End-of-life of RED 15/15(w) and RED 50



[edited by: LuCar Toni at 10:50 AM (GMT -7) on 16 May 2024]
  • HA seems broken in this release, whether it's an upgraded unit or brand new. Tested on several deployments.
    The HA setup and failover testing seem to work okay, but there is no way (HTTPS, SSH, etc.) to get to the aux unit when it's the secondary.
    Even with the firewalls plugged directly into each other via an Ethernet port, I cannot ping the aux from the HA MGMT interface.
    Trying to put units in Mercedes stadium and now going off on a support tangent, weeeeeeee

  • It is known that you cannot reach the AUX appliance "through" the Primary, if that is what you are trying to do.
    But being in the same subnet, you should be able to reach the AUX perfectly fine.

    __________________________________________________________________________________________________________________

  • Are you saying it's a known problem, or that's the way it works now?


    Here is the problem in a full HA environment, including redundant firewalls and switches. Let's say I have a VLAN with subnet 192.168.10.0/24.

    The MGMT interfaces for HA and a laptop are on that subnet. From the laptop I run ping 8.8.8.8 -t

    Works fine until there is a failure/failover. After that, no matter what is done, I can still get to the MGMT interfaces but not out to the internet. The packets never come back, presumably because something is goofed up with the OS. I can see traffic going out, but the responses get dropped, as if it's sending the packets to the HA interface instead of routing them through the gateway on the same subnet. This has been tested about 50 times with the same result.

    This leaves us with putting the HA on its own subnet, in this case 172.14.24.0/29, which includes no other wired machines. What this means is we are telling clients that if they want to get to the aux device, they have to take a laptop, go to the site, make sure they are in the same VLAN/port, and assign something in the 172 range just to get on the aux device? Or perhaps have a dedicated machine just so they can get to the aux unit?

    All this because the firewall can't route those packets to the 172 address space. It just ignores the NAT policy, which means something is broken.
    It just doesn't make any sense at all that the primary itself cannot ping the aux device.

    This may sound okay when you have one client but what happens when you have a thousand with many locations?

    Is this being investigated or just accepted?

    Even if I open a ticket, the chances of explaining this without major escalation are going to be low.

    Thanks for the response, LuCar.
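The addressing trade-off described in this post can be checked with a few lines of Python. This is only a sketch using the standard ipaddress module: the two subnets are the ones quoted above, while the laptop address 192.168.10.50 is an assumed value for illustration. It shows that a laptop on the wired subnet is not on-link with a dedicated 172.14.24.0/29 HA subnet, that a /29 leaves room for six hosts, and that 172.14.24.0/29 lies outside the RFC 1918 block 172.16.0.0/12.

```python
import ipaddress

# Subnets quoted in the post above.
wired = ipaddress.ip_network("192.168.10.0/24")
ha_mgmt = ipaddress.ip_network("172.14.24.0/29")

# Assumption: an example laptop address on the wired VLAN.
laptop = ipaddress.ip_address("192.168.10.50")

# RFC 1918 private block that covers 172.16.0.0 through 172.31.255.255.
rfc1918_172 = ipaddress.ip_network("172.16.0.0/12")

laptop_on_wired = laptop in wired              # True: same subnet, reachable via ARP
laptop_on_ha = laptop in ha_mgmt               # False: needs a route or a re-addressed NIC
ha_host_count = len(list(ha_mgmt.hosts()))     # 6 usable addresses in a /29
ha_is_rfc1918 = ha_mgmt.subnet_of(rfc1918_172) # False: 172.14.x.x is outside 172.16.0.0/12

print(laptop_on_wired, laptop_on_ha, ha_host_count, ha_is_rfc1918)
```

Whether the last point matters for the behaviour reported here is not established in the thread; it is only something the addressing plan makes visible.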


     



  • I was answering a different point of the story (the webadmin/SSH component of the AUX).

    Let's rephrase your points here:

    It works fine if Appliance A is primary and Appliance B is AUX.
    In case of a failover, you cannot reach the internet anymore.

    SFOS and UTM use the same principle of virtual MACs: https://docs.sophos.com/nsg/sophos-firewall/20.0/help/en-us/webhelp/onlinehelp/HighAvailablityStartupGuide/AboutHA/HAArchitecture/index.html

    This means that if you do a failover, the AUX tries to take over the MACs of the Primary. There are instances where the ISP device (switch or ISP modem) does not allow this maneuver. Meaning: the takeover was successful, but Appliance B, now the Primary, does not get any ARP replies to its new virtual MAC.

    By the way, rereading your points, I am still not sure I fully understand what you mean. Why would you route any traffic over the AUX appliance in the first place? I also do not fully understand why you changed the subnet range. What I described above is something Sophos has been seeing for years: it is simply the ISP switch/router not accepting the failover.

    __________________________________________________________________________________________________________________

  • I would love to have everything on the same subnet for this purpose, but I can't tell the client "it will work until there is a failure and then you can't get to the internet", assuming the HA and the wired network are on the same subnet.

    I am not routing over those interfaces. I just need a way to allow the site admin to get to the aux device.


    Right now my choices are: use a separate IP address space for the MGMT interfaces and still get out to the internet when there is a failure, or use the same IP subnet as the wired network, which allows access to the aux device but then can't get out to the internet if there is a failure. Both choices are garbage.

    Assuming a lot of remote work, that means we now have to have a machine dedicated per site just to get to the aux device.
    I mean, this can't be logical to anyone.

    We cannot act like "oh, just change the IP on one of the machines" when everything is remote and there are hundreds of sites.

    Everything is thought of as if we were sitting in a lab, but for the most part that is not the case at all in the real world.

    Right?

  • Let's take a step back:
    What is your problem? Is it only about "I need to access the AUX webadmin/SSH"?

    Why do you want to access the AUX appliance in the first place?

    Because in a normal setup, if you are connected to a network, you can have the AUX be part of that network. It is called the Peer Administration IP, and for most customers that is fine (if they even access the AUX in the first place).

    __________________________________________________________________________________________________________________

  • Yes

    I understand how it's supposed to work and how it has worked in the past. I understand peer administration. What I am typing here is not being read correctly.

    In the past we would make the HA MGMT interfaces part of the normal wired VLAN and network. From there you could get on to either unit, primary or aux, and manage it. Worked great.

    I don't know how else I can say this other than what I have already said several times. It's as detailed as I can possibly make it.

    When we use the normal wired network for the MGMT interfaces and there is any failure of a switch or firewall, the firewall sends all traffic from the wired network to the MGMT interface, which means no one on the wired network can get to the internet. The entire thing melts down and doesn't work correctly anymore on V20.x, whether new or firewalls that have been updated to V20.x.

    Therefore the only other method is to put those interfaces on a dedicated network. When that happens, you have no way to get from the wired network to the aux device.

    There are a lot of reasons to get on the aux device, many of which customers are already used to. That is not the point. The point is it's designed to be connected and accessible, and it doesn't work properly in version 20.

    We have these deployed in very complicated networks that require compliance. It's really hard to say "sorry, Sophos doesn't work like it's supposed to, add another network so you can access the aux firewall" or "set up a machine that's on the same subnet so you can access it". It's hacky.

    I'll open a ticket, but I have no hope unless it's escalated to someone who really understands what's going on.

    My replies take 6 hours to come through; they are moderated.

  • OK. I am really sorry, but this is not how it is supposed to work at all. And nobody else has reported this kind of behavior change with SFOS v20.

    I am not quite sure why the firewall would send anything to a management interface instead of the normal path. It sounds like, in your example, it completely ignores every routing decision.

    Let's take your post here:

    "When we use the normal wired network for the MGMT interfaces and there is any failure of a switch or firewall, the firewall sends all traffic from the wired network to the MGMT interface, which means no one on the wired network can get to the internet. The entire thing melts down and doesn't work correctly anymore on V20.x, whether new or firewalls that have been updated to V20.x."

    I tried it just now: if I force a takeover, for example by rebooting the primary appliance, my AUX appliance takes over and processes the traffic normally.
    I am connected to Port1, which is also the management port, and from there I can also access the AUX appliance normally.

    The bottom line of your post is: if you have a failure of a switch or the firewall itself, the entire network stops working and no longer routes traffic to the WAN? Do you have any kind of Wireshark dump of this situation at hand? And how do you resolve this situation when it occurs?

    Because I still think that what you are experiencing is a VMAC situation on the WAN: the AUX takes over and does not get any ARP replies from the WAN. I have seen this frequently over the past years, but it is not related to v20.0.

    And again: at the top of your post you talk about the entire network going down, and later in your post you say you cannot reach the AUX appliance webadmin. Those are different stories. The AUX appliance always presents the webadmin and SSH via its physical MAC, while the Primary works with the virtual MAC.

    __________________________________________________________________________________________________________________

  • I totally agree, that's why I posted it here. Most will probably not know about this. Who is checking this all the time? If the failover works you would not think about it, but that doesn't mean it's working correctly for aux access.

    I don't need Wireshark; I can see it directly in the firewall logs.

    I doubt you can mimic the setup. Yes, of course a simple firewall failover works with one network. This happens with several or many VLANs, with the interfaces in a LAG setup with redundant HA switches.

    That said, once the failure happens and it stops traffic to the internet, even if I plug a cable right into the firewall on the same subnet as the MGMT interfaces, I cannot get out. The logs show the traffic going right to the MGMT interface and dying.
    All other VLANs and subnets work great even after the failure; it only kills the subnet that the HA MGMT interfaces are on.

    The issue here is that these are customer environments, not playgrounds. If I thought I could get the right support person, I would spend the time to go through this, as it takes a lot of time to keep breaking HA, resetting, and so on from the beginning.

    But we have other examples too. A firewall works as expected in service for years, then after the upgrade to V20 this happens to it as well.
    In this case it's actually a management network, so it doesn't have internet, but certainly we should be able to get to the MGMT interface.
    Look at the screenshot: this has been in service for years and now does not work. The only change here was V20.
    Can't get to the aux on the same flat network.


    Beyond that, all I can do is say I tested it 50 times before this last deployment and it simply doesn't work right.

  • To rephrase this: the management port you are using to administer the firewalls is not getting to the WAN anymore, but all other VLANs work fine? Because I tried it just now, and there was no problem with the admin ports.

    Your screenshots raise one question, as this is not the same issue you described above: who is the .1? Because your client has .1 as its gateway, and the firewall has .2 and .3.

    Timeouts mean the HA appliance is not replying to ARP. Now the question is whether the ARP even reaches the firewall or not. You can check this easily: go to the Primary and SSH from the Primary to the AUX using the HA link.

    There, run tcpdump -ni PortX arp and then try to reach it again.

    If you do not see any ARP for the .2, something in between is blocking or consuming it.

    Again: I think we are discussing two different scenarios here.

    __________________________________________________________________________________________________________________
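To make the ARP check suggested in this post less error-prone, the captured tcpdump lines can be tallied with a short script. The sketch below is an illustration, not a Sophos tool: the function names arp_summary and diagnose are made up here, the sample lines are synthetic (written in the one-line format tcpdump prints for ARP, not taken from this thread), and 192.168.10.2 is an assumed full address for the .2 appliance.

```python
import re

# tcpdump -n prints ARP packets as one-line summaries such as:
#   ARP, Request who-has 192.168.10.2 tell 192.168.10.100, length 46
#   ARP, Reply 192.168.10.2 is-at 00:11:22:33:44:55, length 28
# Some builds add "(ff:ff:ff:ff:ff:ff)" after the target, so that part is optional.
REQUEST = re.compile(r"ARP, Request who-has (\S+)(?: \([^)]*\))? tell (\S+?),")
REPLY = re.compile(r"ARP, Reply (\S+) is-at ([0-9a-fA-F:]+)")


def arp_summary(lines, target_ip):
    """Count ARP requests for target_ip and collect the MACs that answered."""
    requests = 0
    reply_macs = []
    for line in lines:
        match = REQUEST.search(line)
        if match:
            if match.group(1) == target_ip:
                requests += 1
            continue
        match = REPLY.search(line)
        if match and match.group(1) == target_ip:
            reply_macs.append(match.group(2))
    return requests, reply_macs


def diagnose(requests, reply_macs):
    """Map the counts onto the two cases discussed in the post."""
    if requests == 0:
        return "no ARP requests seen: something in between is blocking or consuming them"
    if not reply_macs:
        return "ARP requests arrive but are not answered"
    return "ARP requests are answered"


# Synthetic sample lines for illustration only (assumed addresses and MAC).
sample = [
    "10:50:01.000001 ARP, Request who-has 192.168.10.2 tell 192.168.10.100, length 46",
    "10:50:02.000001 ARP, Request who-has 192.168.10.2 tell 192.168.10.100, length 46",
    "10:50:03.000001 ARP, Request who-has 192.168.10.1 tell 192.168.10.100, length 46",
    "10:50:03.000200 ARP, Reply 192.168.10.1 is-at 00:11:22:33:44:55, length 28",
]

requests, reply_macs = arp_summary(sample, "192.168.10.2")
print(requests, reply_macs, diagnose(requests, reply_macs))
```

In practice the lines would come from the tcpdump -ni PortX arp capture on the AUX, saved to a file or piped in, with PortX replaced by the real interface name.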

  • Yes, this is no revelation, it's broken in multiple ways. I am only pointing out that it also fails when getting to the aux interface on firewalls that were working on 19 and have had no other change except going to 20. Now you cannot get to the aux MGMT interface.
    .1 is the gateway for the wired network. This is a very simple example that shows the aux connection not working like it should, even with the environment up and working. That's why I said "we have other examples".
    What you are asking doesn't make sense. YOU CANNOT GET TO THE AUX INTERFACE IN THE SCREENSHOT: HTTPS, SSH, PING and so on. It does not respond.

    The rest, I don't know how else to explain. I have said the same thing several times in as much detail as I can; it's just not getting through.

    A client behind the firewall can ping out to the internet until there is a failure. Once that happens, it cannot ping out, and packets get stuck landing on the MGMT interface instead of going out the .1 gateway.

  • I am sorry, but you can. Log in via SSH to the Primary, which works fine in your example. Then, on the Primary, you can use ssh admin@HAClusterIP to log in to the second appliance.

    From there, we can discuss and investigate why we cannot reach the AUX, because there we could use tcpdump to try to find the reason why your network is not able to access it.


    Another thought: maybe it would be easier if you could draw a network flow (even in Paint) showing where it fails and how you see the management port being involved. Pictures make it easier to understand.

    Looking at your network flow: from what I understand, you have the .1 (a router). You are in this network, and the firewall is .2 and .3.
    So why are you going from the client with IP .100 to the firewall, and then to .1 in the same network? How is the firewall even part of your setup, if your gateway is the .1 in the same network?

    __________________________________________________________________________________________________________________
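The question at the end of this post comes down to how a client picks its next hop. A minimal sketch, assuming the screenshot network is a /24: the thread only gives the last octets .1, .2, .3 and .100, so the 192.168.10 prefix is borrowed from the earlier example, and the function name next_hop is hypothetical.

```python
import ipaddress


def next_hop(client_iface: str, default_gateway: str, destination: str) -> str:
    """Return the address a host sends the frame to for a given destination.

    On-link destinations are reached directly (after ARP for the destination
    itself); everything else is handed to the default gateway.
    """
    iface = ipaddress.ip_interface(client_iface)
    dest = ipaddress.ip_address(destination)
    if dest in iface.network:
        return str(dest)
    return default_gateway


# Assumed addressing: client .100, gateway .1, firewall pair on .2 and .3.
client = "192.168.10.100/24"
gateway = "192.168.10.1"

print(next_hop(client, gateway, "8.8.8.8"))       # 192.168.10.1
print(next_hop(client, gateway, "192.168.10.3"))  # 192.168.10.3
```

Under these assumptions, internet-bound traffic from .100 goes to .1 and never crosses the firewall at .2/.3, while traffic to the AUX address is sent directly on-link and depends only on ARP being answered.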
