IPsec Child SAs between Google Cloud and Sophos UTM-9 (IKEv1) - dropping

I have a number of IPsec tunnels between our various Sophos UTM appliances and Google Cloud VPCs. From time to time they all hit an issue that causes a "Renegotiation Failure" on the Google side, while everything looks fine (green) for all SAs on the Sophos UTM. I continually have to log in to the Sophos and disable/re-enable the tunnel to re-establish the SAs. Just prior to the renegotiation failure I see the following log entries in StackDriver related to that tunnel ...

D creating rekey job for CHILD_SA ESP/0x14fc06b6/REDACTED-GCP-GW
D handling HA CHILD_SA vpn_REDACTED-SOPHOS-GW{1868} REDACTED-GCP-SUBNET === REDACTED-SOPHOS-SUBNET (segment in: 1, out: 1)
I CHILD_SA vpn_REDACTED-SOPHOS-GW{1868} established with SPIs 604104f6_i 9a27541a_o and TS REDACTED-GCP-SUBNET === REDACTED-SOPHOS-SUBNET
Times three (one for each Sophos Subnet)

D received DELETE for ESP CHILD_SA with SPI 537e166a
I closing CHILD_SA vpn_REDACTED-SOPHOS-GW{1865} with SPIs 14fc06b6_i (0 bytes) 537e166a_o (0 bytes) and TS REDACTED-GCP-SUBNET === REDACTED-SOPHOS-SUBNET

Times three (one for each Sophos Subnet)


So - it appears that there was a REKEY request followed by a DELETE request, but once that last round of DELETE requests completed the tunnel was down (just on the GCP side - all is green on the Sophos side). I had initially looked closely at the lifetimes in the Google Cloud Policy and adjusted them to what Google suggests (see attached). Specifically, the IKE SA lifetime should be 36600 seconds and the IPsec lifetime should be 10800 seconds.
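The rekey-then-DELETE sequence described above can be checked mechanically by matching SPIs across log lines. The following is only a minimal sketch: the regular expressions are modelled on the entries quoted in this post, the function name `correlate` is made up for illustration, and the real StackDriver export format may differ.

```python
import re

# Patterns modelled on the log entries quoted in this post (an assumption,
# not an official log format).
REKEY_RE = re.compile(r"creating rekey job for CHILD_SA ESP/0x([0-9a-f]+)/")
ESTABLISHED_RE = re.compile(
    r"CHILD_SA (\S+)\{(\d+)\} established with SPIs ([0-9a-f]+)_i ([0-9a-f]+)_o"
)
CLOSING_RE = re.compile(
    r"closing CHILD_SA (\S+)\{(\d+)\} with SPIs "
    r"([0-9a-f]+)_i \(\d+ bytes\) ([0-9a-f]+)_o"
)


def _sa(match):
    """Turn a regex match into a small dict describing one CHILD_SA."""
    return {
        "name": match.group(1),
        "id": int(match.group(2)),
        "spi_in": match.group(3),
        "spi_out": match.group(4),
    }


def correlate(lines):
    """Group closed CHILD_SAs by whether a rekey job was seen for their SPI."""
    rekey_spis, established, closed = set(), [], []
    for line in lines:
        if (m := REKEY_RE.search(line)):
            rekey_spis.add(m.group(1))
        elif (m := ESTABLISHED_RE.search(line)):
            established.append(_sa(m))
        elif (m := CLOSING_RE.search(line)):
            closed.append(_sa(m))
    return {
        "established": established,
        # Old SA closed after its replacement was scheduled: expected on rekey.
        "rekeyed_and_closed": [c for c in closed if c["spi_in"] in rekey_spis],
        # SA closed with no rekey job seen for it: worth a closer look.
        "closed_without_rekey": [c for c in closed if c["spi_in"] not in rekey_spis],
    }
```

Run against the lines quoted above, the closed SA `{1865}` carries the same inbound SPI (`14fc06b6`) as the rekey job, which suggests it was replaced by `{1868}` rather than torn down unexpectedly.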

 

The remote Gateway:

 

And finally the Tunnel:

On the Google Cloud side I simply created a "route-based" IKEv1 tunnel and added one subnet on the GCP side connected to 3 subnets on the Sophos side. I did it this way because, as far as I can tell, Sophos UTM-9 does NOT support IKEv2 (I never found a definitive answer on that subject) and Google Cloud only allows a single remote subnet for policy-based IKEv1 tunnels.

Is anyone else in this forum seeing anything similar, or is there a better way to configure these tunnels so that they are more reliable? I never have any issues with Sophos-to-Sophos IPsec tunnels, nor with any of my AWS tunnels. It seems to be an issue only with GCP.



  • In addition ...

    I just disabled DPD as I was seeing messages like this on my Sophos ...

    2019:05:06-14:01:56 jax-office pluto[6376]: "S_REF_IpsSitGcpSsTunne_2" #6567: DPD: Received old or duplicate R_U_THERE
    2019:05:06-14:05:28 jax-office pluto[6376]: "S_REF_IpsSitGcpSsTunne_2" #6619: cannot respond to IPsec SA request because no connection is known for 0.0.0.0/0===REDACTED-SOPHOS-GW[REDACTED-SOPHOS-GW]...REDACTED-GCP-GW[REDACTED-GCP-GW]===REDACTED-GCP-SUBNET
    2019:05:06-14:05:28 jax-office pluto[6376]: "S_REF_IpsSitGcpSsTunne_2" #6619: sending encrypted notification INVALID_ID_INFORMATION to REDACTED-GCP-GW:500

    I think 0.0.0.0/0===REDACTED-SOPHOS-GW looks a bit strange, but it may be the default for "route-based" VPN tunnels.
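To spot these rejected requests without reading the log by eye, the selector pair can be pulled out of the pluto message. This is a rough sketch: the regular expression is based only on the line quoted above, the function names are invented for illustration, and the addresses used to try it out should be your own (the ones in this post are redacted).

```python
import ipaddress
import re

# Based on the "no connection is known for A===GW[ID]...GW[ID]===B" line
# quoted above; other pluto versions may format this differently.
NO_CONN_RE = re.compile(
    r"no connection is known for "
    r"(?P<local>\S+?)===(?P<local_gw>[^\[\s]+)\[[^\]]*\]"
    r"\.\.\."
    r"(?P<remote_gw>[^\[\s]+)\[[^\]]*\]===(?P<remote>\S+)"
)


def parse_rejected_selector(line):
    """Return the selectors and gateways from a rejected request, or None."""
    match = NO_CONN_RE.search(line)
    return match.groupdict() if match else None


def is_catch_all(selector):
    """True if the selector is a /0 network such as 0.0.0.0/0."""
    try:
        return ipaddress.ip_network(selector).prefixlen == 0
    except ValueError:
        # Redacted or otherwise unparsable selectors are not treated as /0.
        return False
```

A line where `is_catch_all(parsed["local"])` is true means the peer asked for 0.0.0.0/0 on the Sophos side instead of one of the configured subnets.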

  • Kip, is there any regularity to when these spontaneous failures occur?

    I'm not familiar with the GCP offering - does it not require DPD?

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Hey Bob,

    I have not been able to nail down any regularity, but it does not drop when the IKE or IPsec lifetimes expire. There isn't much configuration on the GCP side of the gateway/tunnel. We just specify IKEv1 or IKEv2, route-based or policy-based, the remote gateway and subnets, and the local subnets. Once the tunnel is created there is no editing or changing anything on their side, and most of the details are hidden. I could likely pull more details using the gcloud command interface but have not delved too deeply into that. I can see StackDriver logs and monitor tunnel and gateway stats.

    I have had the most issues with the two GCP tunnels I have connected back to our Corp Sophos (via AT&T Fiber). Being route-based, I have to be cautious that there are no conflicts on the subnets. I'm not sure whether DPD is required. It seems like GCP will negotiate just about everything, because I don't set up policies there. It just takes (or doesn't take) what I provide on the Sophos side, so I do initiate the connection from the Sophos.

    Per Google: DPD (Dead Peer Detection) Recommended: Aggressive. DPD detects when the Cloud VPN restarts and routes traffic using alternate tunnels (see https://cloud.google.com/vpn/docs/how-to/configuring-on-premises-gateway). That page does not discuss the lifetimes required for IKE and IPsec, though.

    I'm not using redundant tunnels or BGP for this, and for the moment I have DPD disabled on this Corp Sophos. I will continue to monitor the logs, as my AWS VPN does use BGP and could fail over to a different gateway. I have not seen any drops or renegotiation failures since I posted this.
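For pulling more detail out of GCP, `gcloud compute vpn-tunnels describe` with `--format json` returns the tunnel resource, which can then be trimmed to the fields of interest. This is a hedged sketch, not a tested monitoring tool: the helper names are made up, the tunnel name and region are placeholders, and it assumes the JSON carries the same field names as the Compute Engine `vpnTunnels` resource.

```python
import json
import subprocess


def describe_command(tunnel, region):
    """Build the gcloud argument list for describing one VPN tunnel."""
    return [
        "gcloud", "compute", "vpn-tunnels", "describe", tunnel,
        "--region", region, "--format", "json",
    ]


def summarize(tunnel_json):
    """Keep only the fields that matter when chasing renegotiation failures."""
    return {
        "name": tunnel_json.get("name"),
        "status": tunnel_json.get("status"),
        "detail": tunnel_json.get("detailedStatus"),
        "ike_version": tunnel_json.get("ikeVersion"),
        "local_ts": tunnel_json.get("localTrafficSelector", []),
        "remote_ts": tunnel_json.get("remoteTrafficSelector", []),
    }


def fetch(tunnel, region):
    """Run gcloud (must be installed and authenticated) and summarize the result."""
    result = subprocess.run(
        describe_command(tunnel, region),
        check=True, capture_output=True, text=True,
    )
    return summarize(json.loads(result.stdout))
```

Calling `fetch("my-tunnel", "us-east1")` on a schedule and logging any `status` other than `ESTABLISHED` would at least timestamp the failures for comparison with the Sophos log.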

  • Kip, are you having any better luck on a Google forum for GCP?

    Cheers - Bob

  • Hey Bob,

    I have not been able to get much traction in the GCP forum, as I am just a little fish in a large ocean out there. I keep being told to buy a real firewall and quit using Sophos, as they are quick to point fingers at the Sophos. I still experience tunnel drops from time to time, but with no regularity. I did some more research on the Sophos and "route-based" IPsec tunnels (aka 0.0.0.0/0) and keep reading that they are not supported, so that could be the source of this issue. I have primarily used route-based definitions on the GCP side because that is the quickest way to create tunnels with multiple SAs (subnets on the Sophos side). They create quickly and come up quickly, and the routing tables on both sides route the subnets correctly through the proper tunnel interfaces. I never see any issues in the logs on either side while the tunnels come up, but if at some point there is a serious disruption of traffic and the tunnels must renegotiate, I think that's where I run into issues. For example I might see ...

    ... no connection is known for 0.0.0.0/0===209.34.xxx.xx ... (that's my Sophos WAN redacted)

    on the Sophos side, which to me means it can't renegotiate the SAs. Funny that it shows green on the Sophos side but I see a RENEGOTIATION ERROR on the GCP side. However, just turning off the tunnel on the Sophos, waiting a few seconds, and turning it back on brings everything back up, and it stays up for weeks. I will say that there was one occurrence where I lost all of my GCP tunnels and could not restore them just by disabling and re-enabling the tunnel. I had to rebuild every one from scratch on the GCP side with new PSKs to get them up again, so I'm not convinced the Sophos is really the culprit here.
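My reading of that message can be sketched as a lookup. This assumes the Sophos only answers a request whose selector pair exactly equals one of its configured connections, which is how I interpret "no connection is known" but have not confirmed. The subnets below are made-up placeholders, not my real ones.

```python
import ipaddress

# Hypothetical subnets for illustration only.
GCP_SUBNET = "10.128.0.0/20"
SOPHOS_SUBNETS = ["192.168.10.0/24", "192.168.20.0/24", "192.168.30.0/24"]

# One policy-based connection per Sophos subnet: (Sophos side, GCP side).
CONFIGURED = [(subnet, GCP_SUBNET) for subnet in SOPHOS_SUBNETS]


def find_connection(configured, proposed):
    """Return the configured pair that exactly equals the proposal, else None."""
    want = tuple(ipaddress.ip_network(net) for net in proposed)
    for local, remote in configured:
        have = (ipaddress.ip_network(local), ipaddress.ip_network(remote))
        if have == want:
            return (local, remote)
    return None


# A request for one of the configured subnets finds its connection.
print(find_connection(CONFIGURED, ("192.168.20.0/24", GCP_SUBNET)))
# A route-based style request for 0.0.0.0/0 matches nothing.
print(find_connection(CONFIGURED, ("0.0.0.0/0", GCP_SUBNET)))
```

Under that assumption the tunnel would only come up cleanly when the Sophos initiates with its own subnets, which fits with disable/re-enable on the Sophos fixing it.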

    Knowing now that the Sophos builds policy-based IPsec tunnels, I have tried building the tunnel as policy-based instead of route-based on the GCP side, but that only allows a single subnet for IKEv1 at GCP (it seems we won't get IKEv2 on the UTM). I have hacked that a bit by manually adding routes on the GCP side through the GCP gateway to the Sophos gateway, but I'm not really sure that's supported by Google (it's certainly not documented). It seems to work, but in the end I'm only negotiating a single SA (perhaps the additional subnets become "child SAs"?). I'll have to test that concept again under load and see if I get more stability.

    Another option I suppose would be to create multiple tunnels (one for each subnet) back to the Sophos but that just seems ugly to me. I cannot have multiple VPN Gateways (one for each subnet) on the Sophos UTM because I can only use the default WAN.

    GCP is now deprecating this single-gateway VPN (they call it legacy) in favor of a redundant, multi-gateway, BGP route-based configuration, but I have not ventured into that just yet. The Sophos makes it easy to do that for AWS with the AWS easy button and the Sophos configuration download at AWS, but I'm not ready to dive into doing my own BGP at the moment.