

SOPHOS Purposefully Designs bugs into their Firewalls: Episode 2 – Email Alerts, Green Statuses, and Routes

I’m documenting my numerous issues with SOPHOS Firewalls so that others can be aware of what they are getting themselves into.

Episode #1

community.sophos.com/.../sophos-purposefully-designs-bugs-into-their-firewalls-episode-1---vpn-failover-and-wan-interfaces

 

Issue # 2 – Email Alerts, Green Statuses, and Routes

               

As an administrator, it’s impossible to check every system under our management multiple times a day, so it is commonplace for systems to send alerts that let you know when something is amiss. Under the SG Firewall, the alerts were very robust; not so for XG.

  • For example, on SG I was alerted whenever anyone signed into a firewall. Under the new XG Firewall, this is not an option.
  • On SG, if an AP went offline for some reason, the alert noted the name of the AP (if you named it), so you’d know right away which AP was offline. On XG, you can name the APs as well, but the alert only lists the serial number of the AP that went offline. So you now need to check which AP has that serial number before you can go track it down.
  • The same goes for HA appliances. You get a notification if one goes down and the other becomes primary, but instead of including the name of the device, it tells you (node1) and the serial number. So again you need to go track it down. It includes a whole host of other information you don’t need, and excludes the information you do need. It seems like Joe from shipping\receiving is the one that makes the design choices. And the sad part is they OWN a well-designed product they could steal good ideas from while they design the new OS.
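As a stopgap for the serial-number-only alerts described above, one could annotate incoming alert text with friendly names from a hand-maintained inventory. This is only a minimal sketch: the serial numbers, device names, and alert wording below are made up for illustration, and nothing here is a Sophos feature or API.

```python
import re

# Hypothetical inventory: serial number -> friendly name, maintained by hand.
DEVICE_NAMES = {
    "A40012345678": "AP-Warehouse",
    "C23098765432": "FW-HQ-Primary",
}


def annotate_alert(text, names=DEVICE_NAMES):
    """Return text with the friendly name appended after each known serial."""
    if not names:
        return text
    # Match whole serial numbers only; longest first so no serial is shadowed.
    serials = sorted(names, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, serials)) + r")\b")
    return pattern.sub(lambda m: f"{m.group(0)} ({names[m.group(0)]})", text)
```

Run over the body of an alert mail before it reaches your inbox (for example in a mail filter), this turns a bare serial number into something you can act on without logging into the firewall first.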

 

Secondly…

                I had a strange routing issue. We have IPSEC VPN Tunnels, and each tunnel has 6 routes. If you go into the VPN connection details, there is a button you can click on and it will show you the routes and a green light beside each, indicating their status. Green means good, Red means bad.

 

                When this issue happened, one of the 6 routes was not working. This VPN had been functioning flawlessly for 3 months, and then in the middle of the day one route stopped working. I proved the behaviour by confirming that our domain controllers could not be reached (which was also the complaint of staff). I checked the VPN route statuses and they were all green, including the route that was not working. I contacted SOPHOS immediately, as I’ve had all sorts of strange issues happen with these firewalls and now I had a live case for them to see.
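Since the green status lights did not reflect reality here, an independent check from a host behind the firewall can catch a dead route sooner. The following is a minimal sketch, assuming each route has at least one host with a TCP port that normally answers (for example LDAP on a domain controller); the labels, addresses, and ports are hypothetical placeholders.

```python
import socket


def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, unreachable, timed out, DNS failure
        return False


def check_routes(targets, probe=tcp_probe, timeout=3.0):
    """Probe every (label, host, port) target and map label -> reachable."""
    return {label: probe(host, port, timeout) for label, host, port in targets}


def failed_routes(results):
    """Return the sorted labels of all routes whose probe failed."""
    return sorted(label for label, ok in results.items() if not ok)
```

Calling `check_routes([("route-1 DC", "10.1.0.10", 389), ...])` from a scheduled job and mailing yourself whenever `failed_routes(...)` is non-empty would have flagged this outage without relying on the firewall's own status page.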

 

                The tech I spoke with confirmed that the firewall showed all was good (green statuses everywhere), and also confirmed that the route was definitely not working. I knew if I bounced the VPN tunnel the issue would go away, but I didn’t want to touch it as I wanted SOPHOS to see and diagnose the issue.

 

                The first thing the tech wanted to do was see the VPN config. However, when you have your VPNs configured in a failover, you have no way of seeing the VPN configs anymore. Joe from shipping\receiving (who is the Designer for these Firewalls) must have figured it wouldn’t be necessary. I’ve run into this issue multiple times already in 4 months when I’ve called SOPHOS for support. SOPHOS tech support want to double-check settings and literally can’t without taking our VPN offline. I checked with SOPHOS design people on this, and they assured me it was “by design” and “working as intended”. SOPHOS tech support did not agree.

 

                Next, the SOPHOS tech decided to open up a packet capture on the firewall. The second he enabled the packet capture, the broken route started working again. Very strange.

 

                After that he grabbed all the logs, however, he was unable to determine the issue because the logs were not in debug mode. So I asked him to put all the logs in debug mode and he said that the firewall would cease to function if he did that. So unless the problem is repeated and recurring, you can’t diagnose it because the logs don’t capture the necessary data in non-debug mode. I’ve had this happen on multiple calls with SOPHOS, where lack of debug mode means “problem not solved, case closed”. I’ve also never experienced this issue with logs being insufficient with the SG Firewalls. Somehow, that logging could capture the necessary info, where XG logging cannot. I’m sure this is “as-designed” too.

 

                So I’m working on my clairvoyance degree now, so that I can ensure we enable debug mode before problems happen. This way we’ll hopefully be able to troubleshoot issues.



This thread was automatically locked due to age.
Parents
  • Yes, it’s frustrating. Every time “something strange” happens in the network, in probably 8 of 10 cases it’s because of a hard-to-troubleshoot, hard-to-nail-down issue with SFOS. Opening support cases for root-cause analysis would take too much of your time, multiple downtimes to diagnose, and so on… Working for a customer, who’s gonna pay for this time? So in most cases a single downtime and reboot “solves” most issues. That’s not what you expect from an enterprise product this expensive. Even HA is not “HA” as on SG/UTM: a failover usually costs more than 1-2 dropped pings, and WebAdmin only becomes available a few minutes after an HA failover, not immediately.

    SFOS has some good new features, like Central Connection, SD-WAN and more, but still not the reliability SG had, which should be a key feature. Sophos should focus on logging/stability first, before looking at the new features marketing suggests.

Children
  • After reading your 2 posts here, I thought about my last 3 years on the SFOS track with multiple devices, while still administering two UTM clusters.

    If you've been used to working with UTM for years and switch over to XG, it's a hard and long way, yes.

    UTM is like a huge workshop, well equipped with loads of robust, old-fashioned tools. You can do solid work. And I agree: the way you want to.

    With XG/S I feel like I have a small box with some screwdrivers, all of the same colour and shape (you don't find what you're searching for). But next to the box you will notice some of the latest electronic diagnostic devices; unfortunately, you'll only use them rarely.

    The bad alert mails of XG have always been a pain. I just ignore most of them because they are useless: they contain no helpful information, so you have no choice but to check directly on the firewall. I feel I need to mention that SFOS has no mail throttling capability. One morning you will find your mailbox flooded with 10k similar mails. UTM mails are different. I read them because they do help: they contain the information you need to decide whether you need to react or whether it can wait. Not to mention the search capabilities of UTM. SFOS now has a search function, but it is so basic that, even though I was excited about the new search, I almost never use it. HA failover taking 5-10 minutes, always... OK, I learned and got used to it.
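    To illustrate what throttling could look like on the receiving side, here is a minimal sketch of a per-alert-type suppression window. It assumes you can derive some key from each mail (for example its subject line); the class name and the 10-minute window are illustrative choices, not anything SFOS or UTM provides.

```python
import time


class AlertThrottle:
    """Let one alert per key through per time window and count the rest."""

    def __init__(self, window_seconds=600, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock          # injectable so the logic can be tested
        self._last_sent = {}        # key -> time the last alert was passed on
        self._suppressed = {}       # key -> number of alerts held back

    def should_send(self, key):
        """Return True if an alert with this key should be delivered now."""
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window:
            self._suppressed[key] = self._suppressed.get(key, 0) + 1
            return False
        self._last_sent[key] = now
        return True

    def suppressed_count(self, key):
        """Return how many alerts with this key have been held back so far."""
        return self._suppressed.get(key, 0)
```

    With something like this in a mail filter, 10k identical mails would collapse into one delivered mail per window, and the suppressed count could be appended to the next mail that does get through.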

    I'm not sure if all our writing in threads like this will change something. The community is really a great source of help, and the guys around Luca and Emmanuel (only to name them because they are actively writing in these threads) are doing what they can to help you with cases and know-how. They link to guys in the background and so on. That is REALLY great, and I like my time reading and writing in the various communities.

    I just wish Sophos Dev would some day equip the XG with the cool everyday tools that they have in the UTM and that make admins happy. But it seems the old Cyberoam code framework makes this impossible. I think, in terms of security, XG/SFOS is levels above UTM, and that compensates for some lacking features. I don't like your words stating that Sophos design bugs into their code; I think they are having a hard time making big changes because of too many small issues and limitations. I'm always shocked how much higher the number of a new support case ID is two weeks after the previous support case.

    Still, looking forward: things have evolved and improved with Sophos and most of their products. I hope some day, after a new firmware release, I may find myself thinking "wow, that firewall is now great" and forget about the old UTM.