Is your Failover working?

Hey guys - have you tested that your failover actually fails over and passes web traffic?

 

I have an XG450 on 17.5.5 MR5 with two PPPoE connections going directly into the XG. One is used as the Primary WAN and the other as the Backup WAN.

The WAN link manager is configured to fire up the Backup if the Primary drops. I tested this by unplugging the primary WAN port; after a short time my RED reconnects over the Backup WAN fine.

All user-based firewall rules use MASQ for NAT and have both WAN connections set, as Primary and Secondary.

Now when I fail over, no web traffic passes. No matter what I try (gateway-specific NAT, adding a 0.0.0.0 route via the Backup port, etc.), I cannot pass web traffic.

 

Have you actually checked that yours passes traffic? I am thinking it is a bug, as it does not look like a routing issue and I can see the secondary connection has indeed kicked in and is passing traffic to the RED.

 

Any suggestions?

 

  • Hi,

    you appear to be suffering from the mythical reconnect bug that does not exist, if you care to check the XG history. Many users have complained about this, and it only affects the XG; it was never an issue with the UTM.

    I found that editing the link, without actually changing anything, caused the software to refresh the connection.

    Now from memory the Sophos answer was that the XG is RFC compliant. Well, whoever wrote the RFC was in a pixie land of theory, not reality, and the XG people who tested this must have gone to the Boeing school of software testing.

    Ian

  • In reply to rfcat_vk:

    Yeah, and I always get to deal with that mythical Sophos Level 1 Support tech who knows all the answers about naff all first up.

    I have three open tickets with them and still going months later - they have zero chance of making Sophos Connect work so I have no optimism over getting this resolved.

     

    I wonder what I can get for my Sophos boxes on eBay.....

  • It's a weird thing....

     

    External access inbound is fine and pinging outside from the CLI is fine, but pinging the outside world from a device behind the local LAN interface by IPv4 address (no DNS required), or from behind the RED interface, doesn't work. Traceroutes stop at the XG and don't pass on to the PPPoE default gateway, almost as if the Sophos XG doesn't know how to handle the route table for external traffic. Even adding a manual unicast route for 8.8.8.8 out of the WAN port doesn't enable local devices to ping that IP. Either the XG is bugging out on the backup interface weight or it is bugging out on the source NAT behind the NATted interface. I imagine this would be quite easy to replicate in-house on a virtual Sophos XG.

  • In reply to M8ey:

    Simply start taking packet dumps and check what is actually going on on this interface.

    You have the option to write the dumps into a file.

    https://community.sophos.com/products/community-chat/f/knowledge-base-article-suggestions/105811/how-to-tcpdump-on-xg

     

    PS: On XG, both interfaces should actually be up and running, unlike SG (where backup interfaces are down and only come up after the master interface fails).

     

    So start two dumps, one on each interface, and write them to files. Perform your testing again and download the files.

    Check in Wireshark what is going on.

    It could be a couple of different issues (NAT is applied but with the wrong IP, XG tries the wrong interface, etc.).

     

  • In reply to LuCar Toni:

    It could be many things; however, such a simple failover that doesn't fail over is a concern.

    I have a ticket with Support to check it out, as this and many other XG failings are becoming tiresome.

    It is also very difficult to reproduce, as it must be done late at night when all users are away.

  • In reply to M8ey:

    Hi,

    what I think you are trying to capture, which you can do during the day, is the PPPoE handshake and keepalive packets. I think the XG ignores the PPPoE packets in preference to its own WAN link management, and that is possibly where the conflict lies.

    Ian

  • In reply to rfcat_vk:

    To be clear.

    XG should have a stable connection on both peers in all situations.

    So basically you should be able to build a test policy: 

    TOP.

    SRC: Your PC.

    DST: WAN

    Service: ANY 

    Use Gateway: Backup Interface. 

    (Do not select Load Balancing or the Main Interface as "backup Interface").

     

    This should use the backup interface while everything still works. 

    Now you can start to debug this. Does this policy work, and can your test client reach the internet?

    Verify the connection is working properly via Conntrack: 

    CLI: conntrack -E | grep PC_IP 

     

    You should see open connections, and you should see the ID of the firewall rule you created above.

     

    If this does not work, there seems to be some issue with the WAN Interface.

    Now perform a tcpdump on the WAN interface (tcpdump -ni PortX)

    Check what is actually going on here while you are testing with your client. 
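
For anyone who wants to automate the conntrack check above off-box, here is a rough sketch. It reads one conntrack line and reports whether source NAT rewrote the client address to the WAN address you expect. It assumes the standard conntrack-tools layout (original tuple first, expected reply tuple second); the XG may print extra fields, and the sample line, the addresses, and the function names are all made up for illustration.

```python
import re

# Hypothetical sample in standard conntrack-tools format (not captured from an XG).
SAMPLE = (
    "[NEW] tcp      6 120 SYN_SENT src=192.168.1.10 dst=8.8.8.8 "
    "sport=51000 dport=443 [UNREPLIED] src=8.8.8.8 dst=203.0.113.7 "
    "sport=443 dport=51000"
)


def nat_source(line):
    """Return (original_src, translated_src) from one conntrack line, or None."""
    srcs = re.findall(r"\bsrc=(\S+)", line)
    dsts = re.findall(r"\bdst=(\S+)", line)
    if len(srcs) < 2 or len(dsts) < 2:
        return None
    # The first tuple is the original direction, the second the expected reply.
    # With source NAT, the reply is addressed to the translated address.
    return srcs[0], dsts[1]


def check_masq(line, expected_wan_ip):
    """Classify the line as 'ok', 'wrong-ip', 'no-nat' or 'unparsed'."""
    pair = nat_source(line)
    if pair is None:
        return "unparsed"
    original, translated = pair
    if translated == original:
        return "no-nat"
    if translated == expected_wan_ip:
        return "ok"
    return "wrong-ip"


if __name__ == "__main__":
    print(check_masq(SAMPLE, "203.0.113.7"))
```

If the translated address turns out to be the primary WAN IP while the backup link is carrying the traffic, that would match the "NAT is applied, but wrong IP" case mentioned earlier in the thread.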

  • In reply to LuCar Toni:

    LuCar Toni wrote: "XG should have a stable connection on both peers in all situations."

     

    So interestingly, if I enable the Backup and give it a weight of 100, I get traffic trying to use this connection instead of the primary WAN.

    As NAT is broken, the traffic fails to pass.

    I have a Support session with Sophos booked for after hours tonight. Let's see what comes from that.

  • In reply to M8ey:

    So I spent 2+ hours with Sophos Support last night trying everything.

    Sophos cannot work out why it's broken, but we did discover that when it fails over, the firewall logs / packet capture show a firewall VIOLATION.

    It's now been escalated to Level 2 and they think it's a bug.

     

    Have you guys actually tested your Failover lately?

     

  • In reply to M8ey:

    Hello,

    I also have a similar issue with failover not working, but mine is slightly strange. I have 2 active-active lines with the same weight. When the high-bandwidth line goes down, we lose internet.

    Strangely, I can ping and browse google.com and search on google.com, but when I click on any search result or browse to any other website it doesn't work.

    Support says it's a problem with the internal DNS, but I don't understand how it then works with the other line???

     

    Thanks,

    Shrikant

  • In reply to Shrikant Patil1:

    Hi,

    How is your DNS set up? Where does it point to? Do you have a DNS firewall rule? Are both your lines with the same network provider?

    Ian

  • In reply to rfcat_vk:

    Hello

    This is my DNS; the last one is the ISP's.

    There is no rule in the firewall for DNS.

    Yes, both lines are from the same provider.

    For the LAN I have an internal DNS server with a local IP, which is not configured anywhere in the firewall, and I think that's normal.

     

    Thanks 

  • In reply to Shrikant Patil1:

    My issue is that the firewall won't allow traffic from LAN to WAN.

    It's not DNS based, as even pinging an IP on the internet fails and the firewall logs just show FW=0 VIOLATION.

    My DNS points to an internal IP for my DNS server, with one entry pointing to 8.8.8.8.

    The issue is with Global Support now and still going nowhere fast.
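
To separate the two failure modes discussed here (plain IP traffic failing versus only name resolution failing), a small client-side probe can help. This is a sketch, not a Sophos tool: the function names are hypothetical, 8.8.8.8 on TCP port 53 is just a convenient public target, and it is meant to be run from a LAN client while the primary link is down.

```python
import socket


def tcp_reachable(ip, port=53, timeout=3.0):
    """True if a TCP connection to a literal IP succeeds (no DNS involved)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def name_resolves(hostname):
    """True if the client's configured resolver returns any address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def diagnose(ip_ok, dns_ok):
    """Map the two probe results to a coarse diagnosis."""
    if not ip_ok:
        # Traffic to a literal IP fails, so DNS cannot be the root cause:
        # look at routing, NAT, or firewall rules instead.
        return "ip-path"
    if not dns_ok:
        # IP traffic passes but names do not resolve: look at the DNS setup.
        return "dns"
    return "ok"
```

On a client this could be called as `diagnose(tcp_reachable("8.8.8.8"), name_resolves("example.com"))`. A result of "ip-path" points away from DNS, while "dns" suggests checking where the internal DNS server forwards its queries.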

  • In reply to Shrikant Patil1:

    Where does your internal DNS get its updates from?

    Also, I would check the response speed of your DNS settings in the XG. I gave up on the Google DNS as being too slow.

    Ian

  • In reply to rfcat_vk:

    Hello All,

     

    Now the internet issue is resolved... I updated the DNS settings in my local DNS server.

    But the issue remains with the RED. The RED goes offline when one ISP line goes down and doesn't reconnect on the secondary line automatically.

     

    Thanks 

    Shrikant