After updating from SFOS 18.5.3 MR-3-Build408 to SFOS 19.0.0 GA-Build317, I started getting complaints about services not working. All of them depend on either outbound firewall rules or inbound DNAT rules.
The first failure reported was VoIP. Oddly enough, running a VoIP client from my own machine worked just fine, but a specific VoIP gateway device, which has its own rule, stopped working entirely after the upgrade.
Another example is an internal web server that only accepts connections from a specific FQDN; it also stopped being reachable from the outside after the upgrade.
I have seen people reporting other types of issues, but none similar to this. And yes, reverting to SFOS 18.5.3 MR-3 fixed every issue both times I tried the upgrade.
A couple of concrete examples of simple rules that stopped working after the v19.0 GA upgrade. First, the outbound FW rule for the VoIP gateway:
- Source zones: LAN
- Source network and devices: (IP for local VoIP gateway)
- Destination zones: WAN
- Destination networks: Any
- Services: SIP (UDP/1:65535 - UDP/5060)
Another FW rule, created with DNAT wizard:
- Source zones: WAN
- Source network and devices: FQDN (mydomain.com)
- Destination zones: LAN
- Destination networks: #Port2 (Public IP)
- Services: HTTPS
Resulting NAT rule from above:
- Original source: FQDN (mydomain.com)
- Original destination: #Port2 (Public IP)
- Original services: HTTPS
- SNAT: Original
- DNAT: (IP for LAN web server)
- PAT: Original
- Inbound interface: Port2 (Public IP)
- Outbound interface: Any
Thanks for the suggestion.
No, I do not have PPPoE on Port2; it is a standard IP WAN connection.
Thank you for contacting the Sophos Community and for the feedback.
Do you happen to have PPPoE on your Port2 WAN interface?
If you decide to upgrade again and the problem repeats with any NAT/Firewall rule, try doing a packet capture in the GUI to see which Firewall Rule and which NAT rule is or is not being used.
Also try running conntrack for the IP with the issue, for example 192.168.5.100.
First confirm that the traffic is using the correct port with a tcpdump:
# tcpdump -eni any host 192.168.5.100 and port 5060
Then, if you see the traffic, run conntrack:
# conntrack -E -s 192.168.5.100 | grep "5060"
Check whether the traffic is hitting the Firewall Rule (fwid) and the NAT rule (natid).
If you see the traffic hitting the Firewall Rule but not the NAT rule, try deleting the conntrack entry for this connection:
# conntrack -D -d 192.168.5.100
Then initiate the traffic again, or wait for new connections from that IP and port to reach the Firewall.
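To make the fwid/natid check above easier to read on a busy connection, here is a minimal sketch of a filter that prints only those two values per conntrack line. It assumes each line carries fields written as fwid=<number> and natid=<number>; the function name rule_ids is made up for illustration, and the exact field layout on your firmware may differ.

```shell
#!/bin/sh
# Hypothetical helper: read conntrack lines on stdin and print only the
# firewall rule ID and NAT rule ID found on each line.
# ASSUMPTION: the IDs appear as whitespace-separated fields of the form
# fwid=<n> and natid=<n>. A field that is absent is reported as "missing".
rule_ids() {
    awk '{
        fw = "missing"; nat = "missing"
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^fwid=/)  fw  = substr($i, 6)   # text after "fwid="
            if ($i ~ /^natid=/) nat = substr($i, 7)   # text after "natid="
        }
        print "fwid=" fw " natid=" nat
    }'
}

# Possible usage, reusing the command from the steps above:
#   conntrack -E -s 192.168.5.100 | grep "5060" | rule_ids
```

A line that reports natid=missing would then point to the case described above, where the traffic matches the Firewall Rule but no NAT rule.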
I don't really know if there is any point in trying to troubleshoot this, simply because if I manually recreate the exact same rules I already have, they start working. I then tinker back and forth, toggling between the two rules with the exact same parameters, and eventually the migrated rule starts working again. I then delete the new rule, and the old/migrated rule stays in place and keeps working. Next I disable the rule and re-enable it seconds later, and that NAT rule stops working yet again. This goes on for all the rules. Nothing works, and when it does, it's short-lived.
A firmware upgrade shouldn't force me to recreate all my DNAT rules when the ones in place are 100% the same, just migrated from v18.5 to v19. And regardless, all rules randomly start and stop working just from fidgeting with them (while leaving them set up as they were originally). Sometimes just applying changes to one rule somehow breaks an unrelated one.
On top of this, even the remote access services, which I rarely use (just now, essentially), are also having major issues. I have enabled WAN HTTPS, WAN SSH, WAN Ping, WAN SSL VPN (this was the only one enabled in the previous FW), WAN User Portal... none of them are reachable!
I have now upgraded the firmware and reverted about 4 times. One of those times I was doing so remotely, via SSL VPN. After I upgraded, FW rules didn't work at all, but at least the SSL VPN did. After a simple reboot, however, I was locked out and the SSL VPN service wasn't reachable anymore. Luckily I had an HTTPS tunnel to a NAS inside the premises, so I was able to run an ssh client inside a Docker image, ssh into the FW, revert the firmware with the loadfw command, and reboot back to v18.5. I was then able to reconnect to the SSL VPN service.
I really don't want to deal with such huge issues. I'll stick with 18.5 since it has been fairly stable for me, extremely so in comparison. I might give the next version a go, but I must admit that I have lost faith in Sophos' QA when it comes to firmware releases.
Sorry for the rant, I'm not going to alpha test this crap though.
Thank you for the feedback.
Hi Emmanuel! I had the same issue, so unfortunately I downgraded from SFOS 19.0.0 GA-BUILD317 to SFOS 18.5.1 MR-1-Build326. Could you please verify whether there is something wrong with that version? I deployed two virtual firewalls with the same configuration and different firmware versions, and it was not possible to browse with the new version 18.104.22.168.
I really appreciate your help.
Have a nice day!