Issue in the SDWAN routing engine

Hi,

I'm experiencing a strange issue with the SDWAN routing engine. I have two Sophos XG firewalls connected via route-based IPsec (xfrm interfaces), and I'm using SDWAN rules for the routing decision.

The XG located at the branch office routes traffic, using an SDWAN rule, from the subnet 192.168.112.0/24 to 192.168.111.0/24.

In the SDWAN rule I'm using the "Route only through specified gateways" option.

As you can see, the incoming traffic is routed via the xfrm6 interface.

But sometimes the packets are not routed correctly. Instead of going out through the xfrm tunnel, they are routed to the PPPoE interface.

Disabling and re-enabling the SDWAN rule fixes the issue, at least temporarily.

I'm not able to determine the root cause of the issue. Any ideas?

Thanks



Added TAGs
[edited by: emmosophos at 6:35 PM (GMT -8) on 20 Jan 2023]
  • Could you show us your SD-WAN rule? 

    __________________________________________________________________________________________________________________

  • Of course!

    The rule targets VoIP traffic (SIP and RTP) and WebRTC. I removed the labels because the naming convention wouldn't mean anything to you.

  • So basically you need the following: 

    If you see the problem, go to the advanced shell. 

    Run "conntrack -L | grep IP" (replace IP with the actual IP above).

    Then check the output and look for your 5060 connection. You can also chain multiple | grep filters to narrow the search.

    Within the output there are two fields called pbrid[0]= and pbrid[1]=. These fields describe the SD-WAN route in use. If the value is 0, the SD-WAN route is not hitting. If it is a number other than 0, the SD-WAN route is hitting.

    That tells us where to continue looking.

    __________________________________________________________________________________________________________________
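As a rough sketch of the check described above, the following helper trims each conntrack entry down to the fields that matter for the routing decision. The function name show_route_fields is made up for illustration, the field names are the ones that appear in the output posted in this thread, and it assumes awk is available in the advanced shell.

```shell
# Hypothetical helper: read conntrack entries on stdin and print only the
# routing-related fields of each entry, one entry per line.
show_route_fields() {
    awk '{
        out = ""
        for (i = 1; i <= NF; i++)
            if ($i ~ /^(orig-src|orig-dst|devin|devout|pbrid\[[01]\])=/)
                out = out (out == "" ? "" : " ") $i
        print out
    }'
}

# Assumed usage on the firewall, with 192.168.112.15 as the device IP:
#   conntrack -L | grep 192.168.112.15 | grep 'dport=5060' | show_route_fields
```

Comparing devout and the pbrid values between a working and a failing session should show quickly whether the SD-WAN rule was applied.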

  • The next time the issue occurs I'll check the connection tracking using the conntrack command.

    I also checked the connection list in the Sophos XG GUI: filtered by the device IP, the list was empty.

  • Here is the output:

    conntrack -L | grep 192.168.112.15
    proto=udp proto-no=17 timeout=149 orig-src=192.168.111.250 orig-dst=192.168.112.15 orig-sport=5060 orig-dport=5060 packets=6699 bytes=3223555 reply-src=192.168.112.15 reply-dst=192.168.111.250 reply-sport=5060 reply-dport=5060 packets=17398 bytes=9730930 [ASSURED] mark=0x0 use=1 id=1166308911 masterid=0 devin=xfrm6 devout=Port1.11 nseid=0 ips=0 sslvpnid=0 webfltid=0 appfltid=0 icapid=0 policytype=1 fwid=11 natid=0 fw_action=1 bwid=0 appid=38 appcatid=11 hbappid=0 hbappcatid=0 dpioffload=0x3f sigoffload=0 inzone=5 outzone=8 devinindex=32 devoutindex=24 hb_src=0 hb_dst=0 flags0=0xa0000200000 flags1=0x4012002800000 flagvalues=21,41,43,87,89,101,104,114 catid=0 user=0 luserid=0 usergp=0 hotspotuserid=0 hotspotid=0 dst_mac=45:60:01:d2:af:db src_mac=40:00:3f:11:28:85 startstamp=1673877154 microflow[0]=INVALID microflowid[1]=350 microflowrev[1]=7 hostrev[0]=0 hostrev[1]=65 ipspid=0 diffserv=0 loindex=24 tlsruleid=0 ips_nfqueue=0 sess_verdict=2 gwoff=0 cluster_node=0 current_state[0]=3527 current_state[1]=3527 vlan_id=0 inmark=0x0 brinindex=0 sessionid=1824 sessionidrev=25973 session_update_rev=6 dnat_done=0 upclass=0:0 dnclass=0:0 pbrid[0]=0 pbrid[1]=1 profileid[0]=0 profileid[1]=0 nhop_id[0]=17 nhop_id[1]=65535 nhop_rev[0]=0 nhop_rev[1]=0 saidx[0]=0 saidx[1]=0 saidx_rev[0]=0 saidx_rev[1]=0 atomic_flags=0x0 conn_fp_id=NOT_OFFLOADED

  • This looks odd to me: your primary route is 0 and the backup route is 1. Do you have other SD-WAN routes in place?

    __________________________________________________________________________________________________________________

  • Yes, I have multiple xfrm gateways.

    The Sophos XG has 2 WAN interfaces. On these 2 WANs I've got 4 xfrm interfaces (4 IPsec tunnels, two for each WAN), but the SDWAN rule in question targets a specific gateway, which is why I checked "Route only through specified gateways".

    When the traffic is routed correctly, this is the conntrack output:

    conntrack -L | grep 192.168.112.15
    proto=udp proto-no=17 timeout=149 orig-src=192.168.111.250 orig-dst=192.168.112.15 orig-sport=5060 orig-dport=5060 packets=6699 bytes=3223555 reply-src=192.168.112.15 reply-dst=192.168.111.250 reply-sport=5060 reply-dport=5060 packets=17398 bytes=9730930 [ASSURED] mark=0x0 use=1 id=1166308911 masterid=0 devin=xfrm6 devout=Port1.11 nseid=0 ips=0 sslvpnid=0 webfltid=0 appfltid=0 icapid=0 policytype=1 fwid=11 natid=0 fw_action=1 bwid=0 appid=38 appcatid=11 hbappid=0 hbappcatid=0 dpioffload=0x3f sigoffload=0 inzone=5 outzone=8 devinindex=32 devoutindex=24 hb_src=0 hb_dst=0 flags0=0xa0000200000 flags1=0x4012002800000 flagvalues=21,41,43,87,89,101,104,114 catid=0 user=0 luserid=0 usergp=0 hotspotuserid=0 hotspotid=0 dst_mac=45:60:01:d2:af:db src_mac=40:00:3f:11:28:85 startstamp=1673877154 microflow[0]=INVALID microflowid[1]=350 microflowrev[1]=7 hostrev[0]=0 hostrev[1]=65 ipspid=0 diffserv=0 loindex=24 tlsruleid=0 ips_nfqueue=0 sess_verdict=2 gwoff=0 cluster_node=0 current_state[0]=3527 current_state[1]=3527 vlan_id=0 inmark=0x0 brinindex=0 sessionid=1824 sessionidrev=25973 session_update_rev=6 dnat_done=0 upclass=0:0 dnclass=0:0 pbrid[0]=0 pbrid[1]=1 profileid[0]=0 profileid[1]=0 nhop_id[0]=17 nhop_id[1]=65535 nhop_rev[0]=0 nhop_rev[1]=0 saidx[0]=0 saidx[1]=0 saidx_rev[0]=0 saidx_rev[1]=0 atomic_flags=0x0 conn_fp_id=NOT_OFFLOADED

  • Hi, I actually misinterpreted your output.

    pbrid[0]= means the source and destination networks have a matching SD-WAN rule.

    pbrid[1]= means the reply traffic has a matching SD-WAN rule, in case you split traffic.

    Since pbrid[0]=0 here, the firewall cannot find a matching rule for this traffic.

    And that is correct: 

    orig-src=192.168.111.250 orig-dst=192.168.112.15

    Does not match: 

    __________________________________________________________________________________________________________________
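To make the comparison above concrete, here is a minimal sketch that checks a flow's original source and destination against the rule's networks as described earlier in the thread (192.168.112.0/24 to 192.168.111.0/24). The function names are hypothetical and this is plain subnet arithmetic, not how the firewall evaluates SD-WAN rules internally.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr IP NETWORK/PREFIX: succeed if IP lies inside the network.
in_cidr() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# rule_matches SRC_IP DST_IP RULE_SRC_NET RULE_DST_NET
rule_matches() {
    in_cidr "$1" "$3" && in_cidr "$2" "$4"
}

# The session from the conntrack output, in its original direction:
if rule_matches 192.168.111.250 192.168.112.15 192.168.112.0/24 192.168.111.0/24; then
    echo "original direction matches the rule"
else
    echo "original direction does not match the rule"
fi
```

With the addresses from the conntrack output, the session was opened from 192.168.111.250 towards 192.168.112.15, so only the reply direction falls inside the rule's source and destination networks.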

  • Hi,

    thanks for your reply!

    If I understand correctly pbrid[1]=1 means that reply traffic matches an sd-wan rule.

    In my case the Sophos XG in the branch office sends traffic from 192.168.112.0/24 to 192.168.111.0/24 (as you can see in the screenshot), but I don't have any other sd-wan rules that match traffic from 192.168.111.0/24 to 192.168.112.0/24.

  • Is your network (111.0) directly attached? 

    __________________________________________________________________________________________________________________