Uplink balancing vs Active Standby, dual ISP questions

I've got a large number of UTM devices at sites with dual ISPs and we're trying to resolve a 'best practices' question.

We typically have both ISPs active with multipathing / weights set up to put our 'priority' traffic (VOIP and RED Tunnels) on the better ISP, and everything else on the secondary.  This works great until the primary fails, at which point the tunnels fail over to the secondary.  That's not a problem, except that when the primary comes back up, the tunnels never fail back to the primary interface on their own. They can sit on the secondary (weaker) connection for hours, days, or weeks until we manually deactivate and reactivate them.

We're considering going to an Active / Standby setup with dual ISPs to address this issue. In that configuration, however, our PRTG service can't properly monitor the backup connection (since it's essentially off).

For those of you on dual ISP setups:

1) How do you make sure RED tunnels (or whatever tunnels) fail back to a primary interface when an outage is resolved?

2) If you're running Active / Standby instead of multipathing, how do you monitor your standby ISP?

Thanks for the guidance.

  • I think you want to stay with Active-Active and your Multipath rules.  See the second exception in #3 in Rulz (last updated 2019-04-17).  Any better luck now?

    Cheers - Bob

  • In reply to BAlfson:

    This doesn't do exactly what we want - the problem is the persistence of the tunnels on the 'lesser' interface after the primary comes back up.  So say we have two connections, Fiber and Cable.

    We use multipathing to send the tunnels out Fiber (by specifying that all traffic to the RED destination uses the Fiber interface).  This works fine.

    Fiber goes down, the tunnel fails over to Cable.  This works fine.

    Fiber comes back up, but due to the persistence of the connection, the tunnel stays on Cable for days, weeks, or months, until either Cable goes down, we restart the RED interface, or we restart the entire device.

    There has to be a way to force the tunnels to re-initialize once a day or something, right? 
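For what it's worth, the behaviour described above can be reproduced with a toy Python model. This is only a sketch, not Sophos code: the class name `StickyBalancer`, the flow label `red-tunnel`, the 900-second timeout, and above all the assumption that every packet refreshes the persistence timer are illustrative guesses about how a sticky uplink binding might behave.

```python
from dataclasses import dataclass, field


@dataclass
class StickyBalancer:
    """Toy model of uplink selection with per-flow persistence (hypothetical)."""
    preferred: str
    fallback: str
    timeout: int  # seconds an idle binding survives (assumed semantics)
    up: dict = field(default_factory=dict)        # uplink name -> is it up?
    bindings: dict = field(default_factory=dict)  # flow -> (uplink, last_seen)

    def route(self, flow: str, now: int) -> str:
        bound = self.bindings.get(flow)
        if bound is not None:
            uplink, last_seen = bound
            # Reuse the binding while its uplink is up and it has not idled out.
            if self.up.get(uplink, False) and now - last_seen <= self.timeout:
                self.bindings[flow] = (uplink, now)  # traffic refreshes the timer
                return uplink
        # No usable binding: apply the rule (preferred first, then fallback).
        uplink = self.preferred if self.up.get(self.preferred, False) else self.fallback
        self.bindings[flow] = (uplink, now)
        return uplink


lb = StickyBalancer("fiber", "cable", timeout=900,
                    up={"fiber": True, "cable": True})
history = [lb.route("red-tunnel", 0)]        # both uplinks up
lb.up["fiber"] = False
history.append(lb.route("red-tunnel", 30))   # Fiber outage
lb.up["fiber"] = True
for t in range(60, 3600, 30):                # keepalive every 30 s for an hour
    history.append(lb.route("red-tunnel", t))
history.append(lb.route("red-tunnel", 4501))  # first packet after > 900 s idle
print(history[0], history[1], set(history[2:-1]), history[-1])
```

In this model the tunnel only returns to Fiber once the binding is dropped, either because Cable goes down or because the flow stays idle longer than the timeout. A tunnel that sends keepalives never goes idle, which would match what we are seeing if the real device behaves similarly.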

  • In reply to TG1:

    Click on the wrench beside 'Active Interfaces' and show us a picture of those settings.  Also, show picture(s) of the Edits of the relevant Multipath rule(s).

    Cheers - Bob

  • In reply to BAlfson:

    This shows the uplink balancing (we use our primary for ONLY multipath-specific traffic; everything else goes out eth2).

    This shows the multipath rule that forces tunnels onto the primary (the group shown, Colos, includes the IP of our RED tunnel endpoint).

    Again, this piece is working; it's the failing back over that doesn't.

  • In reply to TG1:

    Add a fourth Multipath rule at the bottom binding 'Any -> Any -> Any' to 'eth2 - monkeybrains'.

    For testing purposes, in 'Edit scheduler', set 'Persistence timeout' to 1 minute.  After testing, set it back to 15 minutes.

    Any better luck with that?

    Cheers - Bob
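As a rough illustration of what a catch-all rule at the bottom of the list does, here is a small Python sketch of first-match rule evaluation. It assumes the rules are checked top to bottom and the first match wins; the `Rule` type, the `pick_interface` helper, the `203.0.113.10` endpoint, the `primary` interface label, and the `balanced` default are all hypothetical stand-ins, and the sketch does not model persistence at all.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    name: str
    destination: str  # CIDR; "0.0.0.0/0" stands in for 'Any'
    interface: str


def pick_interface(rules, dst_ip, default="balanced"):
    """Return the interface of the first rule whose destination contains dst_ip."""
    addr = ip_address(dst_ip)
    for rule in rules:
        if addr in ip_network(rule.destination):
            return rule.interface
    # No rule matched: fall back to weighted balancing (assumed behaviour).
    return default


rules = [
    Rule("tunnels to Colos", "203.0.113.10/32", "primary"),    # hypothetical endpoint
    Rule("Any -> Any -> Any", "0.0.0.0/0", "eth2 - monkeybrains"),
]

print(pick_interface(rules, "203.0.113.10"))      # tunnel traffic
print(pick_interface(rules, "198.51.100.7"))      # everything else
print(pick_interface(rules[:1], "198.51.100.7"))  # same lookup without the catch-all
```

With the catch-all in place, traffic that matches no earlier rule is bound explicitly to eth2 instead of being left to the balancer's weights.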

  • In reply to BAlfson:

    I'll make the change and test it, but can you explain to me how this is supposed to effect the change we want? If the issue is connection persistence, and the tunnel doesn't reinitialize unless it's downed and brought back up or otherwise interrupted, how does adding this rule at the bottom change the current setup?

    Thanks for the info.