XFRM Interface flapping after HA failover

Hi all,

Today I performed a manual failover to the auxiliary device. On the auxiliary device, the XFRM interfaces began flapping. On both tunnel ends I saw many interface up and down events (every few seconds). The IPsec tunnel itself seemed to be stable (WebAdmin showed a green status), and both firewalls showed the tunnel as up. However, OSPF showed no neighbors.

After I switched back to the first device, the XFRM interfaces became stable and most tunnels came back online. Some tunnels had to be restarted manually before they worked again.

The HQ firewall is an XGS5500 running SFOS 19.0.1. Most site firewalls also run 19.0.1. Some site firewalls run SFOS 19.5, and these boxes also had flapping XFRM interfaces.

Does anybody have an idea what causes this behavior?

Ben



This thread was automatically locked due to age.
  • Hi Ben,

    An XFRM interface flaps only if the corresponding IPsec tunnel is flapping.
    Does the log viewer (filtered on VPN) show any VPN tunnel flaps during the time of the issue?
    How many IPsec tunnels are active on the node?

    Also, SFOS 19.5 GA includes some IPsec scaling fixes that could be relevant:

    https://docs.sophos.com/releasenotes/index.html?productGroupID=nsg&productID=xg&versionID=19.5

    NC-106608 IPsec duplicate SAs created
    NC-94603 IPsec VPN tunnels flapping continuously


    Regards,

    Vamshi

     



     

  • Hi Vamshi,

    While the firewall was running on the second node, I saw multiple interface down and up events (message ID 17813) in the system log, but no IPsec terminated (ID 17802) or established (ID 17801) messages in the VPN log. So the tunnel itself was stable.
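    To repeat this cross-check on an exported log, a minimal shell sketch could simply count how many lines mention each message ID. It assumes the log has been exported to a plain-text file with one event per line and that the numeric message ID appears as a standalone token (for example `messageid="17813"`); the exact SFOS export format is not confirmed here, and both the helper name `count_msg_ids` and the file name in the usage line are hypothetical.

    ```shell
    #!/bin/sh
    # Hypothetical helper: print "<id> <count>" for each message ID given.
    # Assumption: one event per line, and the ID appears as a whole word
    # (e.g. messageid="17813"), so grep -w does not match longer numbers.
    count_msg_ids() {
        log_file=$1
        shift
        for id in "$@"; do
            # grep -c prints the number of matching lines (0 if none)
            n=$(grep -cw -- "$id" "$log_file")
            echo "$id $n"
        done
    }

    # Usage (file name is illustrative):
    # count_msg_ids exported_logs.txt 17813 17801 17802
    ```

    Many 17813 lines alongside zero 17801/17802 lines would match the pattern described above: interfaces flapping while the tunnel itself stays up.
    
    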

    OSPF started working again once I switched back to the first node. Some tunnels had to be stopped and restarted before OSPF saw the neighbors.

    58 IPsec tunnels terminate on the XGS5500.

    Ben

    If a post solves your question please use the 'Verify Answer' button.

  • My question was about the switches "in front", meaning on the WAN side.

    We have had some scenarios where Cisco switches in particular caused trouble after an HA failover.

    Kind regards from Germany,

    Philipp Rusch

    New Vision GmbH, Germany
    Sophos Silver-Partner

  • Yes, indeed, we have Cisco switches on the HA link and in front of the firewall. On the HA ports we disabled storm-control and BPDU guard, which helped a little. The update to SFOS 19.5 solved the problem completely.

    Thank you all for your hints.

    Ben

  • Hi Ben, good to know that the update to SFOS 19.5 solved the problem.

    Thanks for the access-id details. Some additional observations based on the logs: there are some IKE SA collisions, as IKE and ESP rekeying appear to be triggered simultaneously from the peer node. This is because the Phase-1 and Phase-2 lifetime values are configured identically on the peer (initiator) and responder nodes.

    XGS5500_CI02_SFOS 19.0.1 MR-1-Build365# grep collision /log/charon.log | wc -l

    456
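    To see which tunnels are involved rather than only the total, a rough sketch could group the collision lines by connection name. It assumes the charon.log lines carry strongSwan's `<connection|unique-id>` prefix (for example `<Site_A|12>`); whether that prefix is present depends on the logging configuration and may not match SFOS's actual format. The helper name `collisions_by_conn` is hypothetical.

    ```shell
    #!/bin/sh
    # Hypothetical helper: count "collision" lines per connection name.
    # Assumption: each line contains a strongSwan-style "<name|id>" tag;
    # lines without such a tag are silently skipped by sed -n.
    collisions_by_conn() {
        log_file=${1:-/log/charon.log}
        grep collision "$log_file" |
            # keep only the connection name from the first "<name|id>" tag
            sed -n 's/^[^<]*<\([^|>]*\)|[0-9]*>.*/\1/p' |
            # count per name, highest count first
            sort | uniq -c | sort -rn
    }
    ```

    The connections at the top of this list would be the first candidates for the adjusted IPsec policy.
    
    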



    The IKE collisions also cause duplicate SAs, so the number of SAs increases over time, along with other issues.




    One suggestion would be to clone or create a similar IPsec policy/profile (IKEv2_RSP), but with the Phase-1 and Phase-2 key lifetimes increased by, say, half an hour over those in the peer's (initiator node's) IPsec policy/profile, and then use the new IPsec policy in the IPsec connections.
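    As a worked example of this rule, the sketch below derives the responder-side lifetimes (in seconds) from the initiator's values plus a 30-minute (1800 s) offset. The initiator values in the usage line (28800 s for Phase 1, 3600 s for Phase 2) are only illustrative and are not taken from this thread; `responder_lifetimes` is a hypothetical helper.

    ```shell
    #!/bin/sh
    # Hypothetical helper: given the initiator's Phase-1 and Phase-2 key
    # lifetimes in seconds, print the longer values for the responder's
    # cloned IPsec policy. Optional third argument overrides the offset.
    responder_lifetimes() {
        p1=$1
        p2=$2
        offset=${3:-1800}   # default offset: half an hour
        echo "phase1=$((p1 + offset)) phase2=$((p2 + offset))"
    }

    # Example with illustrative values:
    # responder_lifetimes 28800 3600   -> phase1=30600 phase2=5400
    ```

    The intent, as described above, is that the side with the shorter lifetimes consistently reaches its rekey point first, so both peers no longer start rekeying at the same moment.
    
    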

    https://community.sophos.com/sophos-xg-firewall/f/recommended-reads/122440/best-practice-for-site-to-site-policy-based-ipsec-vpn#mcetoc_1f5rpj2kd8




    Regards,
    Vamshi
