SD-WAN settings not working correctly on reboot

We have two WAN links and have set up SD-WAN policy routing.

We have a rule set up so that all traffic from our server should go over one of the WAN links. This server runs software that pings a variety of external services.

When we reboot the XG Firewall (EAP3 virtual machine), a traceroute shows that most pings are routed through the correct WAN link, but some go out through the other WAN link.

My rule is below

  • I've done some more work on this, although with only limited testing.

    If I stop the software running the pings, restart the XG, and then restart the pings once the XG is back up and running, all the pings route through the correct interface.

    My guess is that the SD-WAN routing takes a little time to kick in, and if a ping is issued before the SD-WAN routing is in effect, it can be routed out through the wrong interface. Once a route is established, the XG keeps it in a table that doesn't get overwritten when SD-WAN comes into operation, so the 'wrong' routing remains in effect.

    As I say, this is only a guess based on the observed behaviour. I really need someone from Sophos to confirm this and, if appropriate, get the problem resolved.

    Equally, I may have something wrong in my rule, in which case I would be grateful if someone could point it out.

  • Hi Jason,

     

    Your understanding is correct: the PBR daemon comes up a little later in the startup process because it depends on other subsystems, and hence interim connection may flow via default GW.

     

    Regards,

    Alok

  • Hi Alok

    Thanks for confirming my observations.

    The question is, what is going to be done to fix it?!

    It wouldn't matter if traffic switched to the correct route once PBR started, but unfortunately the old established flow keeps its original route. Can't you just clear any established routes once the PBR daemon starts? Then they would flow correctly.

  • Sorry, I re-read your answer after posting my reply and can't edit it. There is nothing incorrect in my last reply, but I want to make the problem quite clear.

    You said "interim connection may flow via default GW". On its own, that would not be a problem. The problem is that any "interim" flows established before PBR starts remain in effect, so if the interim connection is wrong, it stays wrong! What should happen is that once PBR starts, it clears all the connections so that new ones are established based on the PBR rules.

  • Hi Jason,

     

    The approach to minimise the impact has not yet been decided, so I won't be able to answer this right away.

     

    As for your suggestion, clearing established routes (the session table) may have a system-wide impact.

     

    Regards,

    Alok

  • Thanks for the reply Alok

    I understand you don't have an answer for this right away, but it is definitely something that needs sorting.

    I understand that clearing the session table may have some impact, but as this would normally happen only when the XG device reboots, I can't see it being an issue, because connectivity would already have been disrupted.

    Is there a CLI command to clear the session table that I can use as a workaround?

  • FYI.

    console> system diagnostics utilities connections count v4 v6
    console> system diagnostics utilities connections v4 delete show
    console> system diagnostics utilities connections v4 delete conn_id dest_ip proto src_ip
    console> system diagnostics utilities connections v4 delete
    Connections are flushing ,Please wait..
    Connections are flushed
    console>
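To illustrate the behaviour discussed in this thread, here is a toy Python model. It is hypothetical and is not how XG/SFOS is implemented internally. It assumes, based on the posts above, that the egress link is chosen once when a flow is first seen and stored in the session table, that flows created before the PBR daemon starts use the default gateway, and that flushing the table forces a fresh lookup. The names `Firewall`, `WAN1`, `WAN2` and the IP addresses are invented for the sketch.

```python
"""Toy model of route pinning in a firewall's connection (session) table.

Hypothetical illustration only, not Sophos XG code. It assumes the egress
link is chosen once, when a flow is first seen, and then reused for every
later packet of that flow until the session table is flushed.
"""
from dataclasses import dataclass, field

DEFAULT_GW_LINK = "WAN2"  # hypothetical: link the default route points at
PBR_LINK = "WAN1"         # hypothetical: link the SD-WAN/PBR rule selects

# Simplified flow key: (source IP, destination IP, protocol).
FlowKey = tuple[str, str, str]


@dataclass
class Firewall:
    pbr_running: bool = False
    sessions: dict[FlowKey, str] = field(default_factory=dict)

    def route(self, key: FlowKey) -> str:
        """Return the egress link for a packet, creating a session if needed."""
        if key not in self.sessions:
            # Route lookup happens only for the first packet of a flow.
            self.sessions[key] = PBR_LINK if self.pbr_running else DEFAULT_GW_LINK
        # Later packets reuse the stored decision, even if PBR has started since.
        return self.sessions[key]

    def flush_sessions(self) -> None:
        """Forget every stored decision (analogous to flushing the connection table)."""
        self.sessions.clear()


if __name__ == "__main__":
    fw = Firewall()
    ping = ("10.0.0.5", "8.8.8.8", "icmp")  # hypothetical server and target

    print("before PBR: ", fw.route(ping))   # WAN2 (default gateway)
    fw.pbr_running = True                   # PBR daemon comes up after boot
    print("after PBR:  ", fw.route(ping))   # still WAN2: the flow is pinned
    print("new flow:   ", fw.route(("10.0.0.5", "1.1.1.1", "icmp")))  # WAN1
    fw.flush_sessions()                     # like "connections v4 delete"
    print("after flush:", fw.route(ping))   # WAN1: fresh lookup uses PBR
```

In this model, stopping the ping software before the reboot corresponds to creating the flow only after `pbr_running` is true, which matches the observation that the pings then all take the correct link.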