Only allow certain devices to use backup ISP when primary ISP is down

I have a primary and a backup ISP; the backup is a limited-bandwidth cellular plan. The backup ISP exists to serve my “critical” devices, such as my home server, which hosts my alarm system via Home Assistant (so I can still receive notifications while I’m away).

I had this setup for a couple of years with no issues and documented it in this post. Everything worked as expected until recently, when I started noticing issues, which I describe in this post. Now it seems my backup ISP is always used when the SD-WAN route rule is enabled, even though the primary ISP is online. I’ve tried restarting my home server and restarting Sophos XG with the SD-WAN route rule disabled, then enabling it again, but I still have the same issue.

What I’m attempting to achieve: Primary ISP is used by everything on my network when it is online. If the primary ISP is offline, only my critical devices (home server and a few other devices) start using the backup ISP. Once the primary ISP comes back online, the critical devices start using the primary ISP again.
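
The intended behaviour can be written as a small decision function. This is only a sketch of the goal, not of how Sophos XG evaluates routes internally; the gateway labels, device names, and the `select_gateway` helper are all hypothetical.

```python
from typing import Optional, Set

PRIMARY = "primary_isp"   # hypothetical label for the fiber link
BACKUP = "backup_isp"     # hypothetical label for the cellular link


def select_gateway(device: str, critical: Set[str],
                   primary_up: bool, backup_up: bool) -> Optional[str]:
    """Return the gateway a device should use, or None for no WAN access."""
    if primary_up:
        # Everything uses the primary ISP whenever it is online.
        return PRIMARY
    if backup_up and device in critical:
        # Only critical devices may fail over to the backup ISP.
        return BACKUP
    return None


critical_devices = {"home_server"}  # hypothetical device name
print(select_gateway("home_server", critical_devices, True, True))   # primary_isp
print(select_gateway("home_server", critical_devices, False, True))  # backup_isp
print(select_gateway("laptop", critical_devices, False, True))       # None
```

The problem described in this post corresponds to the first case returning the backup gateway instead of the primary one.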

The issue I'm having: Devices with the failover SD-WAN routes are using the backup ISP when the primary ISP is online.

Version: SFOS 20.0.0 GA-Build222

Here’s a screenshot of my current SD-WAN route rule:

I have “Reroute status of live connections” enabled (default is enabled) and “Reroute status of live SNAT (source NAT) connections” enabled (default is disabled). My routing precedence is: Static route, SD-WAN route, VPN route.
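
To illustrate what the precedence setting means, here is a minimal sketch that assumes route types are consulted in the configured order and the first type with a matching route wins. The `pick_route` helper, the route labels, and the idea that a LAN-bound packet matches both a static/connected route and a catch-all SD-WAN route are assumptions made for illustration, not confirmed firewall internals.

```python
def pick_route(precedence, matching_routes):
    """Return (route_type, next_hop) for the first route type in
    `precedence` that has a matching route, or None if nothing matches."""
    for route_type in precedence:
        if route_type in matching_routes:
            return route_type, matching_routes[route_type]
    return None


# Hypothetical packet headed to a LAN host: a static/connected route and a
# catch-all SD-WAN route both match it.
matches = {"static": "lan_port", "sdwan": "primary_isp"}

print(pick_route(["static", "sdwan", "vpn"], matches))  # ('static', 'lan_port')
print(pick_route(["sdwan", "static", "vpn"], matches))  # ('sdwan', 'primary_isp')
```

Under these assumptions, the same packet takes a different path depending only on which route type is listed first.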

Edit: Some additional information that might be a factor. There's a known bug in Sophos XG that I reported over two years ago, and it appears it still hasn't been fixed. It wasn't an issue in the past, but I'm not sure whether something changed in the newer version of Sophos XG. Basically, my primary ISP gateway status always shows "red" even though it's online and working. This happens because Sophos XG plugs directly into my ISP's ONT (fiber) device and uses a VLAN ID of 201. Again, this hasn't been an issue in the past, and I believe it's just a GUI bug: if I take the primary ISP down, I get a notification that it's down, and when it comes back online, I get a notification that it's back online.

Screenshot of the issue I'm describing:

Screenshot of my primary ISP network interface setup:

Also, a screenshot of the default SNAT IPv4 rule (I have not modified this rule at all):

Any pointers or tips would be greatly appreciated. Thanks.



Added TAGs
[edited by: Erick Jan at 11:49 PM (GMT -7) on 31 Mar 2024]
  • My precedence for routing is: Static route, SD-WAN route, VPN route.

    Try setting your routing precedence to: "SD-WAN Route - Static Route - VPN Route."

    After that, enable "Route only through specified gateways" on the SD-WAN policy you showed above.

    What I’m attempting to achieve: Primary ISP is used by everything on my network when it is online. If the primary ISP is offline, only my critical devices (home server and a few other devices) start using the backup ISP. Once the primary ISP comes back online, the critical devices start using the primary ISP again.

    You will have to create another SD-WAN policy that covers "everything" on your network and uses only the Primary ISP. Place it below the one you showed in your post, and enable "Route only through specified gateways" for it as well.
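
    The two-policy layout suggested above can be sketched as an ordered rule list where the first matching rule decides. This is a simplified model under stated assumptions: the `SdwanRule` class, the `route` helper, and the device and gateway names are hypothetical, and treating a rule with "Route only through specified gateways" as dropping traffic when all of its gateways are down is a simplification of the real behaviour.

    ```python
    from dataclasses import dataclass
    from typing import Dict, List, Optional, Set


    @dataclass
    class SdwanRule:
        name: str
        sources: Optional[Set[str]]  # None means "any source"
        primary: str
        backup: Optional[str]
        only_specified: bool         # "Route only through specified gateways"


    def route(source: str, rules: List[SdwanRule],
              gateway_up: Dict[str, bool]) -> Optional[str]:
        """Return the gateway chosen by the first matching rule, or None."""
        for rule in rules:
            if rule.sources is not None and source not in rule.sources:
                continue  # rule does not apply to this source
            for gw in (rule.primary, rule.backup):
                if gw is not None and gateway_up.get(gw, False):
                    return gw
            if rule.only_specified:
                return None  # assumed: no fallback outside the listed gateways
        return None


    rules = [
        SdwanRule("critical", {"home_server"}, "primary_isp", "backup_isp", True),
        SdwanRule("everything", None, "primary_isp", None, True),
    ]

    down = {"primary_isp": False, "backup_isp": True}
    print(route("home_server", rules, down))  # backup_isp
    print(route("laptop", rules, down))       # None
    ```

    With the primary ISP up, both rules resolve to the primary gateway in this model, so rule order only matters during a failover.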



    Ryzen 5600U + I226-V (KVM) v20 GA @ Home

    XG 115w Rev.3 8GB RAM v19.5 MR3 @ Travel Firewall

  • Well, that did not go well, lol. I created an SD-WAN route at the bottom of the list that covers everything and uses only my primary ISP.

    Afterwards, I changed my routing precedence as suggested, and that's where it went downhill. I was no longer able to SSH into Sophos XG, and my entire network stopped working (at least, I couldn't access the web or Sophos XG). I restarted Sophos XG by pushing the physical power button on the device it runs on, but even after the restart I could not SSH into Sophos XG or access the web.

    I had to pull the device running Sophos XG out of my server rack and hook it up to a monitor to change the routing precedence back to how it was: Static route, SD-WAN route, VPN route. Everything is working again...

    I'm not quite sure why that happened, but before I go changing anything again, I could use any tips on preventing that issue...

    ---

    Sophos XG guides for home users: https://shred086.wordpress.com/

  • Any other tips on how to possibly achieve this, or on how to avoid locking myself out of Sophos XG when changing the routing precedence?

    Update: After about a week of having my SD-WAN routes disabled, I re-enabled them, and so far everything seems to be working as expected... sort of. I'm seeing minimal usage on my backup ISP, even though I would expect zero usage unless the primary ISP is down, but it has only used about 0.02 GB over the past week, which I can live with. The biggest thing is making sure this issue doesn't creep up again, which I suspect it will.
