IPSec connections go down very slowly

Is there something other than the gateway monitoring that takes a tunnel down? 

I ask because my device (SFOS 18.5.3 MR-3-Build408) has two tunnels, a primary and a backup, to another XG (SFOS 18.0.5 MR-5-Build586).

Connection type: Tunnel Mode

XFRMs: configured as a /30 

Gateway Monitoring: pinging the other end of the tunnel's XFRM (the other /30)

Here are my health check settings:

The problem is that when I pull my WAN1 cable, WAN link monitoring takes the link down and gateway monitoring registers 10.210.153.197, on the other side of the tunnel, as unpingable, BUT the IPsec connection keeps that tunnel's "Connected" state green for nearly 2 minutes and 40 seconds before it finally takes it down. That's way too long to wait for tunnel traffic to resume.

Does anyone have any ideas for bringing the tunnel down faster?



Edited TAGs
[edited by: emmosophos at 5:47 PM (GMT -7) on 23 May 2022]
  • I did that. It should switch to the other tunnel as soon as the primary gateway goes down, which takes about 5-10 seconds, but it actually does not switch until the tunnel itself turns red, which takes about 2:40.

  • In v19, using the quality-based SD-WAN, it should switch based on pings/lag so it should not care about a tunnel being up or down or gateways at all. If you have the chance to try v19, it might not have the issue and it might allow SD-WAN to reroute when the pings stop rather than when something else causes something to go down.

    (I have to admit that I'm confused by the whole tunnel thing. The only tunnel I've used is a 6in4 tunnel and there's no gateway associated with it. I can make one, but it doesn't make any difference to the traffic or anything else as far as I can tell. Then again, that kind of tunnel isn't "up" or "down".)

  • Upgraded to v19. Connected my tunnels and pulled WAN1. My gateways (WAN1 and Tunnel) went down but my tunnel stayed green. This is a real problem.

  • You can configure when the gateway should turn RED. This has nothing to do with the XFRM interface. 

    So basically, if this check results in a RED state, SD-WAN will not use this interface anymore.

    You can change this check in the profile to point at some host behind your tunnel (or the XFRM interface IP of the other end).


  • I configured my SDWAN profile just as you did above. Here is what happened:

    1. Constant ping to LAN address of the COLO Sophos LAN IP started and pinging normally. 

    2. Pulled my WAN1. 

    3. Pings stop. 

    4. SDWAN policy group shows Tun1 gateway is Red and Secondary gateway is Green with a checkmark (in use).

    5. Tunnels remain green like so:

    6. Try to manually bring down Tun1 by clicking the "Connection" green light - get a long wait spinner in the middle of the screen and then an error.

    7. Try to click the "Connection" button of Tun1 again - get a long wait spinner in the middle of the screen and then the same error about not being able to connect to the interface.

    8. Try clicking "Active" to deactivate completely - get a long wait spinner in the middle of the screen and then a yellow box says this will take some time to complete and you can check the log file.

    9. Finally the tunnel goes to red/red and I can resume pinging the COLO Sophos LAN IP.

    This is a bug. It's keeping the tunnel up, and the traffic is not correctly tied to gateway state but rather to the "Active/Connection" state of the tunnels being green/green, green/red, or red/red. Whatever module in the OS determines that state seems to be controlling traffic flow, not the gateway monitor.

    The following screenshots were taken simultaneously:

    1. The gateway is down; I still can't ping across the tunnel even though it should be using the secondary tunnel for ICMP traffic.

    2. 

    Tunnels stay green for approx. 2 minutes 40 seconds.

    3.

    Tunnel connection finally turns red and instantly pings resume. 

     

  • Ignore the tunnel.

    Use the SD-WAN policy. It should turn RED much faster. How long does the SD-WAN policy need to switch?

    BTW: Ping is actually a poor way to check SD-WAN, as it is stateless. Try using a TCP/UDP protocol.

    BTW: You can configure DPD (Dead Peer Detection) within the IPsec profile.


  • The SDWAN policy switches within seconds. I'd say 3 seconds.

    Also, my DPD is 10/25/reinitiate.

    I've got a plan for the TCP/UDP test. I'll report back after the test.

  • This does puzzle me a bit. What is the actual distinction between a Gateway and a Tunnel?

    I've only used a 6in4 tunnel to get IPv6, and in that case, it's a target for my IPv6 default route. To me, it doesn't really exist as a gateway in the same sense as the gateway to my ISP. (Over which, of course, my 6in4 tunnel flows, so the ISP gateway is foundational while the 6in4 tunnel depends on it.) And I do not need a gateway to route traffic to my 6in4 tunnel -- it all works fine without a gateway for the tunnel. And in fact, the gateway isn't an interface which means it's not selectable in many places to actually do anything with it.

    I can make a "gateway" that pings the other end of the 6in4 tunnel (via IPv4, not through the tunnel itself, I believe) but it doesn't seem to have a real function. Then again, IPv4 and IPv6 don't really fall into the whole "gateway fails, failover" or "SD-WAN profile slow, reroute" paradigm so maybe things only get cloudy in my 6in4 world.

    It feels like the issue the OP is seeing may be related to the relationship between tunnels and gateways, hence my suggestion to use a SD-WAN quality policy to deal with routing in the face of poor performance and to skip the whole up/down gateway paradigm. But the more I think about it, the more confused I get. Should I have a gateway for a 6in4 tunnel? For other kinds of tunnels? Or only for independent physical routes to the internet?

  • There is no real relationship between tunnel and gateway, because a gateway is something virtual, while the tunnel is a "physical" component.

    A gateway can be a router within your network, it can be your ISP network, it can be anything. It is something you create on the firewall.

    The tunnel is the actual "link" between two peers. It is the object the firewall uses to get somewhere.

    SD-WAN rules work with the gateway. If gateway detection notices a dead peer, it moves to the next gateway. This is independent of the underlying medium: even if the tunnel or the "cable" to the gateway peer is still there, SD-WAN does not care. It will take the gateway down regardless.

    The only downside, and that's the problem with ICMP: if you have an active connection, the application will notice the tunnel is actually dead and build a new connection. The new connection will use the new link. That is how TCP/UDP will work. But ICMP will keep pumping ICMP requests through the dead link because the connection is still there.

    Therefore ICMP is not the best way to test this scenario, as it gives the impression that "it takes forever to switch", which is not actually correct. The existing connection is likely kept alive by the ongoing packets, so the underlying tunnel interface keeps picking them up and sending them.

    But a TCP/UDP application will likely catch this rather quickly and rebuild the connection.
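
A minimal sketch of the TCP-based test suggested above, in Python. It is not part of SFOS or any Sophos tooling. It assumes a host behind the remote end of the tunnel accepts TCP connections on some port; the function names, the 1-second probe interval, and any address and port you pass in are illustrative assumptions. Each probe opens a brand-new connection, so it should be routed by whatever the SD-WAN policy currently selects rather than riding on an existing flow the way a continuous ping does.

```python
import socket
import time
from typing import Iterable, List, Tuple


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Open a new TCP connection; return True if the handshake completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


def outage_windows(samples: Iterable[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Turn (timestamp, ok) samples into (first_failure, first_recovery) pairs.

    An outage that has not recovered by the last sample is not reported.
    """
    windows: List[Tuple[float, float]] = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                    # first failed probe of a new outage
        elif ok and start is not None:
            windows.append((start, ts))   # first successful probe ends it
            start = None
    return windows


def monitor(host: str, port: int, interval: float = 1.0,
            duration: float = 300.0) -> List[Tuple[float, float]]:
    """Probe repeatedly for `duration` seconds and return the outage windows."""
    samples: List[Tuple[float, bool]] = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        ts = time.monotonic()
        ok = tcp_probe(host, port)
        samples.append((ts, ok))
        print(f"{ts:10.1f}  {'ok' if ok else 'FAIL'}")
        time.sleep(interval)
    return outage_windows(samples)
```

To use it, call something like `monitor("192.0.2.10", 443)` (a placeholder address and port), pull WAN1 while it runs, and subtract start from end in each returned pair. If new TCP connections recover within a few seconds while a continuous ping stays dead for 2:40, that would support the explanation above; if they also stall until the tunnel turns red, it would point back at the tunnel state.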
