I'm trying to find a way to detect failed IPsec tunnel interfaces and shut them down quickly, so that routing can recover fast. I've configured two 18.5.3 MR3 firewalls in EVE-NG and built four tunnels between them (two WANs on each).
The IPsec tunnels are route-based (tunnel interface) with RSA authentication, and the IKEv2 profile is modified to use DPD with a 10-second hello interval and a 25-second hold timer. I have IKEv2 set to disconnect when the peer is lost, so I would expect a tunnel to drop within roughly 25-35 seconds. However, the tunnels take 160-180 seconds to drop after I cut the internet path on one of the four WAN interfaces.
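The 25-35 second expectation assumes the hold timer bounds detection. One possible explanation for the ~165-second figure, which is an assumption and not confirmed in this thread: if the firewall's IKEv2 stack behaves like strongSwan with default settings, the DPD timeout value only applies to IKEv1, and for IKEv2 a dead peer is declared only after the normal IKE retransmission sequence gives up (strongSwan defaults retransmit_timeout = 4.0, retransmit_base = 1.8, retransmit_tries = 5). This minimal sketch compares the two models; the function names are made up for illustration.

```python
# Hypothetical comparison of two dead-peer detection models (all values in seconds).


def naive_window(hello_s: float, hold_s: float) -> tuple[float, float]:
    """The thread's expectation: wait up to one hello interval, then the hold timer."""
    return (hold_s, hold_s + hello_s)


def retransmit_giveup(timeout_s: float = 4.0, base: float = 1.8, tries: int = 5) -> float:
    """Time from the first unanswered request until the peer is declared dead.

    Assumes strongSwan-style exponential backoff: the original request and each of
    the `tries` retransmissions wait timeout_s * base**n before the next step.
    """
    return sum(timeout_s * base ** n for n in range(tries + 1))


def backoff_window(hello_s: float) -> tuple[float, float]:
    """A DPD probe goes out after up to one hello interval, then the backoff must run out."""
    giveup = retransmit_giveup()
    return (giveup, giveup + hello_s)


print(naive_window(10, 25))        # (25, 35)
low, high = backoff_window(10)
print(f"{low:.1f}-{high:.1f} s")   # 165.1-175.1 s
```

Under that assumption, shortening the hold timer would have no effect; only the retransmission settings (if the platform exposes them at all) would change the drop time.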
Is this normal, or am I hitting a bug in 18.5.3 MR3?
IPsec needs some time to detect a dead peer. If you want failover in near real time, you can use SD-WAN rules instead.
I've set up SD-WAN routes, and failover is definitely faster (under 15 seconds once the defaults are tuned). However, I'm seeing sessions stick to the IPsec tunnel interfaces. For example, I have two IPsec tunnels on two separate paths, with SD-WAN routes set up for failover. When failover occurs, new sessions and ICMP follow the new path. But if an SSH session was running across tunnel 1 when the failover occurs, the Sophos appears to pin that connection to the tunnel interface of the failed route, and the session hangs badly. On top of that, the firewall never times out or rejects that existing session, so it stays open on the initiating machine for a long time.
I'm looking for more router-like behavior, where sessions don't stick to an interface but follow the current routing. This might not be possible in the Sophos environment, but I'd like to understand it better.
I'm not as familiar with tunnels as I should be, but is there any DNAT-like address translation going on for the tunnels (due to IPsec, the tunnels themselves, or anything else)? That is, would your SSH server at the far end suddenly see your near-end machine's IP address change because of the failover? I'm not sure an existing session can survive that. Of course, new sessions and connectionless traffic (UDP, ICMP) don't care and just work.
I imagine the tunnels could keep your local machine's IP address exactly the same, in which case failover should work in theory. But I don't know enough to say one way or the other; just throwing it out there.
No NAT whatsoever. It's pure LAN-to-VPN-to-LAN, with an any-any allow rule between both zones added just to eliminate any firewall-policy issues. No IP changes either; it just appears that any existing TCP session stays stuck egressing a tunnel that I'm not trying to route over in a failure scenario. *EDIT: A packet capture on the virtual firewall in question confirmed this: the packets tried to egress VPN path xfrm1 when xfrm4 should have been the path after failover.
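The capture is consistent with the firewall caching the egress interface per connection when the session is created, rather than looking up the route for every packet. The toy model below is hypothetical: it is not how the Sophos firewall is implemented, and the addresses, port numbers, and the `remote_lan` route label are made up. It only shows why existing sessions would stick while new ones follow the failover.

```python
from dataclasses import dataclass, field

# 5-tuple identifying a flow: (src_ip, src_port, dst_ip, dst_port, proto)
FlowKey = tuple[str, int, str, int, str]


@dataclass
class Forwarder:
    routes: dict[str, str]  # hypothetical: destination label -> egress interface
    cache_egress: bool      # True = pin the egress interface when the session is created
    sessions: dict[FlowKey, str] = field(default_factory=dict)

    def egress(self, key: FlowKey, dst: str) -> str:
        if self.cache_egress:
            # Stateful behavior: the first packet picks the interface, later packets reuse it.
            return self.sessions.setdefault(key, self.routes[dst])
        # Router-like behavior: every packet consults the current routing table.
        return self.routes[dst]


ssh: FlowKey = ("10.1.1.10", 50000, "10.2.2.20", 22, "tcp")
for pinned in (True, False):
    fw = Forwarder(routes={"remote_lan": "xfrm1"}, cache_egress=pinned)
    fw.egress(ssh, "remote_lan")          # SSH session starts over tunnel 1
    fw.routes["remote_lan"] = "xfrm4"     # SD-WAN failover moves the route
    print(pinned, fw.egress(ssh, "remote_lan"))
# True xfrm1
# False xfrm4
```

With pinning, the SSH flow keeps using xfrm1 after the route moves to xfrm4, matching the capture, while any new flow key would pick xfrm4 immediately.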