Dead Peer Detection has a hidden design flaw.
Dead Peer Detection is a feature designed to retry/re-establish a tunnel when it drops. The feature has three settings: 1) how long to wait before a retry, 2) how long to wait for a response, and 3) what to do if no response is received.
Ours is set to keep retrying (indefinitely) if no response is received, until a connection is re-established. However, this wasn't working: when a site went offline for a couple of hours, the initiator stopped retrying.
After months of troubleshooting with SOPHOS, they determined that Dead Peer Detection is hard-set to try 5 times and then give up. This is hard-coded and not configurable.
Each Dead Peer Detection attempt has a configurable number of "rekeying tries", after which it fails out and moves on to the next DPD attempt.
The solution SOPHOS offered, which I presume will work (although we have not yet had a real-world outage to confirm it), is to set "rekeying tries" to 0 (unlimited) so that it keeps trying to re-key the connection and Dead Peer Detection never gets the chance to give up. It took months to get to this "solution", but I would call it a workaround: the core issue is that Dead Peer Detection has been hard-coded to give up after 5 tries, and that limit is not configurable.
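To make the interaction between the two limits concrete, here is a simplified model of the retry behavior as SOPHOS support described it. It is a sketch, not SOPHOS code: the names (`DPD_MAX_ATTEMPTS`, `simulate`), measuring the outage in failed rekey attempts rather than in time, and the assumption that 0 means "unlimited" within a single DPD attempt are all my own illustration based on the description above.

```python
from typing import Optional

# Hard-coded DPD limit as reported by SOPHOS support (not configurable).
DPD_MAX_ATTEMPTS = 5


def simulate(outage_attempts: int, rekey_tries: int) -> Optional[int]:
    """Hypothetical model of nested DPD / rekey retries.

    outage_attempts: how many rekey attempts fail before the peer is
        reachable again (a stand-in for outage length).
    rekey_tries: per-DPD-attempt rekey limit; 0 means unlimited.

    Returns the overall attempt number at which the tunnel comes back,
    or None if DPD exhausts its attempts first and stops retrying.
    """
    attempt = 0
    for _ in range(DPD_MAX_ATTEMPTS):
        tries_this_round = 0
        # With rekey_tries == 0 this inner loop only ends when the peer returns.
        while rekey_tries == 0 or tries_this_round < rekey_tries:
            attempt += 1
            tries_this_round += 1
            if attempt > outage_attempts:  # peer is back online
                return attempt
        # Rekey limit reached: fall through to the next DPD attempt.
    return None  # all DPD attempts used up: the initiator gives up


print(simulate(20, 3))  # None: gave up after 5 x 3 = 15 failed attempts
print(simulate(20, 0))  # 21: unlimited rekeying rides out the outage
```

In this model, any finite rekey limit caps recovery at 5 × (rekeying tries) attempts, so a long enough outage is never recovered. Setting rekeying tries to 0 means the first DPD attempt simply never ends, which is why the workaround sidesteps the hard-coded limit rather than fixing it.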
This is my constant issue with SOPHOS design. If Dead Peer Detection wasn't going to be configurable in how many attempts it makes, it should have been left at infinite tries. Choosing an arbitrary 5 tries is just another example of the asinine design choices that have been made. Someone thought "5 tries will probably be fine for all customers". I'd love to know who.