Troubleshooting guide for XG

Hi All,

I wrote this article as a single place I can point to when answering your XG Firewall questions, so that we reach a conclusion faster. It should help you understand the product's technical behavior in several areas. Feedback and corrections are most welcome.

NOTE:

A. XG Firewall takes a TOP-DOWN approach when searching for a matching policy or firewall rule. Always remember to place custom rules at the TOP.

B. Refer to the system log format attached below to understand the exact meaning of the back-end logs in XG.
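
To illustrate note A, here is a minimal Python sketch of first-match, top-down rule evaluation. The Rule class, the rule IDs and the addresses are hypothetical and much simpler than a real XG rule; the only point is that a broad rule placed above a custom rule shadows it.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass
class Rule:
    """Hypothetical, simplified rule; real XG rules match on many more fields."""
    rule_id: int
    source: str       # source network in CIDR notation
    destination: str  # destination network in CIDR notation
    action: str       # "accept" or "drop"


def first_match(rules: list[Rule], src: str, dst: str) -> Optional[Rule]:
    """Walk the list from the top and return the first rule that matches."""
    for rule in rules:
        src_ok = ip_address(src) in ip_network(rule.source)
        dst_ok = ip_address(dst) in ip_network(rule.destination)
        if src_ok and dst_ok:
            return rule
    return None  # nothing matched; a real firewall applies its default action


broad_rule = Rule(1, "192.168.1.0/24", "0.0.0.0/0", "accept")
custom_rule = Rule(2, "192.168.1.50/32", "0.0.0.0/0", "drop")

# Custom rule below the broad rule: it is never reached.
shadowed = first_match([broad_rule, custom_rule], "192.168.1.50", "8.8.8.8")
# Custom rule on top: it takes effect.
effective = first_match([custom_rule, broad_rule], "192.168.1.50", "8.8.8.8")
print(shadowed.rule_id, effective.rule_id)
```

With the custom rule at the bottom, the host is matched by rule 1; with the custom rule on top, rule 2 applies.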

Analysis:

#1- Whenever you experience unexpected behavior with routing, the web filter or the application filter, consider two exercises:

  1. With the help of Packet Capture, monitor how the traffic is forwarded and which fw-rule ID forwards it. This gives you a quick insight into what is happening to the packet.
  2. Check the “drop-packet-capture” log and look at the reason and the fw-rule ID that cause the drop.
  • While capturing the drop logs, you can also look at the log-id to verify the cause of the drop. To understand the Log-ID format, refer to the system log file attached to this post.

#2- The following helps in understanding how SF works internally and what happens to any packet received on an SF interface. The packet is handled by these modules, in order:

Fragment Reassembly Module > Strict Policy Checking > Connection Bypass Module > DoS policy > Spoof checking > Connection tracking module for Stateful Inspection > Sequence checking > Policy marking and firewall rule matching > Web Access Policy/AV/AS > IPS.
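
Why the order matters when you read drop logs: the first module that rejects a packet is the one that reports the drop, and later modules never see it. A rough Python sketch, assuming each module is a simple pass/fail check (the check functions and packet fields are hypothetical placeholders, and the sketch ignores that a bypass can skip later stages):

```python
from typing import Callable, Optional

# Stage names taken from the order listed above.
STAGES = [
    "Fragment Reassembly Module",
    "Strict Policy Checking",
    "Connection Bypass Module",
    "DoS policy",
    "Spoof checking",
    "Connection tracking module for Stateful Inspection",
    "Sequence checking",
    "Policy marking and firewall rule matching",
    "Web Access Policy/AV/AS",
    "IPS",
]


def first_dropping_stage(
    packet: dict, checks: dict[str, Callable[[dict], bool]]
) -> Optional[str]:
    """Return the first stage whose check rejects the packet, or None if all pass."""
    for stage in STAGES:
        check = checks.get(stage)  # stages without a check simply pass the packet on
        if check is not None and not check(packet):
            return stage
    return None


# Hypothetical checks: each returns True when the packet may continue.
checks = {
    "Spoof checking": lambda p: not p["spoofed"],
    "IPS": lambda p: not p["malicious"],
}

dropped_at = first_dropping_stage({"spoofed": True, "malicious": True}, checks)
print(dropped_at)
```

The packet fails both checks, but the drop is attributed to spoof checking because that stage comes earlier than IPS.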

#3- Asymmetric routing is an unwanted situation in an IP network. It occurs when packets belonging to the same TCP connection travel over different routes.

In other words, a packet takes one path from source host A across the network to destination host B, but the return packet takes a different path from host B back to host A. For normal data this is not an issue, but in some special cases, such as traffic passing through stateful-inspection firewalls, this routing behavior can cause problems.
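
A toy Python model of why a stateful firewall drops asymmetric traffic. This is only a sketch: real connection tracking also follows sequence numbers, timeouts and more, and the addresses are made up.

```python
class StatefulFirewall:
    """Toy TCP state tracking: a packet is accepted only if the firewall saw the SYN."""

    def __init__(self) -> None:
        self.connections: set[tuple[str, str]] = set()

    def inspect(self, src: str, dst: str, flags: str) -> str:
        if flags == "SYN":
            # A new connection is recorded when its first packet passes through.
            self.connections.add((src, dst))
            return "accept"
        # Anything else must belong to a known connection, in either direction.
        if (src, dst) in self.connections or (dst, src) in self.connections:
            return "accept"
        return "drop"


fw = StatefulFirewall()

# Asymmetric case: the SYN from 10.1.1.20 took another route, so the
# firewall only sees the SYN-ACK coming back and has no state for it.
asymmetric = fw.inspect("192.168.1.10", "10.1.1.20", "SYN-ACK")

# Symmetric case: the firewall sees the SYN first, then the reply.
syn = fw.inspect("10.1.1.20", "192.168.1.10", "SYN")
reply = fw.inspect("192.168.1.10", "10.1.1.20", "SYN-ACK")
print(asymmetric, syn, reply)
```

The first reply is dropped because the firewall never saw the start of that connection, which is the symptom the bypass below works around.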

To overcome asymmetric routing:

Log on to the CLI console via Telnet or SSH and go to option 4. Device Console. Execute:

console> set advanced-firewall bypass-stateful-firewall-config add source_network 10.x.x.0 source_netmask 255.255.255.0 dest_network 192.168.1.0 dest_netmask 255.255.255.0 

       
console> set advanced-firewall bypass-stateful-firewall-config add source_network 192.168.1.0 source_netmask 255.255.255.0 dest_network 10.x.x.0 dest_netmask 255.255.255.0

Verify the entries with:

console> show advanced-firewall

Note: 

  1. It is assumed that XG has all the routing information required to reach the remote subnet 10.x.x.0/24.
  2. If you bypass a specific network from the stateful firewall, scanning and NAT will not apply to that network.
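
Since the bypass has to be added in both directions, a small helper can print the pair of commands so the reverse entry is not forgotten. A minimal sketch that only formats the command shown above; 10.1.1.0/24 stands in for the 10.x.x.0 placeholder and is an example value, not a recommendation.

```python
from ipaddress import IPv4Network, ip_network

PREFIX = "set advanced-firewall bypass-stateful-firewall-config add"


def _one_direction(src: IPv4Network, dst: IPv4Network) -> str:
    """Format one bypass command in the syntax used in this post."""
    return (
        f"{PREFIX} "
        f"source_network {src.network_address} source_netmask {src.netmask} "
        f"dest_network {dst.network_address} dest_netmask {dst.netmask}"
    )


def bypass_commands(net_a: str, net_b: str) -> list[str]:
    """Return the bypass command for A -> B and the matching one for B -> A."""
    a, b = ip_network(net_a), ip_network(net_b)
    return [_one_direction(a, b), _one_direction(b, a)]


commands = bypass_commands("10.1.1.0/24", "192.168.1.0/24")
for command in commands:
    print(f"console> {command}")
```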

#4- If you face slow browsing, low throughput or ping timeouts, execute ifconfig (in the advanced shell) or show network interfaces (in the device console) and check for errors, collisions and dropped counts on the communicating interfaces. Try these steps in sequence to find a resolution.

  1. Confirm that Traffic Shaping (Quality of Service, QoS) is not limiting bandwidth.
  2. If you find errors in the interface output, edit the interface object and, in the 'Advanced Settings' drop-down menu, set the MTU to 1350. If that works, check with your ISP to find the largest setting that works. If it doesn't work, set the MTU back to its original value.
  3. If you find dropped packets, change the Ethernet cable.
  4. If connected to a switch, change the switch port.
  5. If connected to a router or modem, change that device.
  6. On the 'Hardware' tab in 'Interfaces & Routing >> Interfaces', experiment with different fixed speed and duplex settings. Make the same settings on the router/switch/modem to which the interface connects. Before testing the change, reboot both devices to force them to renegotiate their connection speed.
  7. Finally, place an unmanaged switch between the ISP and the firewall.
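
If you want to scan the interface counters quickly, something like this Python sketch can pick out the interfaces with non-zero errors, dropped or collisions. It assumes the classic ifconfig layout ("errors:0 dropped:0 ... collisions:0"); the sample text below is made up, so compare it with what your shell actually prints before relying on it.

```python
import re

# Made-up sample in the classic ifconfig layout (an assumption about the output format).
SAMPLE = """\
Port1     Link encap:Ethernet  HWaddr 00:00:00:00:00:01
          RX packets:1200 errors:0 dropped:0 overruns:0 frame:0
          TX packets:900 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
Port2     Link encap:Ethernet  HWaddr 00:00:00:00:00:02
          RX packets:5000 errors:37 dropped:12 overruns:0 frame:0
          TX packets:4100 errors:0 dropped:0 overruns:0 carrier:0
          collisions:4 txqueuelen:1000
"""

COUNTER = re.compile(r"\b(errors|dropped|collisions):(\d+)")


def problem_counters(ifconfig_output: str) -> dict[str, dict[str, int]]:
    """Sum RX+TX errors, dropped and collisions per interface; keep non-zero ones."""
    totals: dict[str, dict[str, int]] = {}
    current = None
    for line in ifconfig_output.splitlines():
        # An interface block starts with a line that is not indented.
        if line and not line[0].isspace():
            current = line.split()[0]
            totals[current] = {"errors": 0, "dropped": 0, "collisions": 0}
        if current is None:
            continue
        for name, value in COUNTER.findall(line):
            totals[current][name] += int(value)
    return {iface: c for iface, c in totals.items() if any(c.values())}


print(problem_counters(SAMPLE))
```

In the sample, only Port2 is reported, which is the interface you would then take through the steps above.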

#4.1- If no dropped, error or collision counts are observed, the slowness may be related to DNS, DoS, IPS or the ISP.

  1. Check whether DoS settings are enabled under System > System Services > DoS & Spoof Protection. Note: never restrict the TCP Flood flag. If you see dropped packet counts here, set the DoS thresholds to values suitable for your network architecture.
  2. Check the DNS configuration. If an internal DNS server is configured for LAN users and all DNS requests are directed to it, issues with the internal DNS server, or with the external DNS server to which it forwards requests, may result in overall slow browsing.

Resolution: To resolve this issue, contact appropriate administrators or Server vendors.

#4.2- Multiple ISP links are terminated on XG and user systems are configured with a particular ISP’s DNS. In this case, the outgoing DNS traffic gets load balanced, so two possibilities occur:

  1. If a DNS request travels through the ISP link whose DNS is configured on the user’s system, the request is resolved and the turnaround time is good.
  2. If a DNS request travels through another ISP link, the request is dropped because the DNS configured on the user’s system does not match that ISP’s DNS.

As a result, only part of the DNS requests in the network are resolved, which ultimately leads to slow browsing.

Resolution: Configure a Static Route in XG that forwards all DNS Traffic to the ISP Link whose DNS is configured on user’s systems. You can configure Static Routes from Network > Static Route > Unicast.
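
To get a feel for the effect, here is a tiny Python simulation. It assumes plain round-robin balancing across the links, which is a simplification of how XG actually distributes traffic, and the link names are hypothetical.

```python
from itertools import cycle


def resolved_share(links: list[str], client_dns_link: str, requests: int) -> float:
    """Fraction of DNS requests leaving through the link that owns the client's DNS."""
    chooser = cycle(links)  # round-robin: an assumption made for this sketch
    resolved = sum(1 for _ in range(requests) if next(chooser) == client_dns_link)
    return resolved / requests


# Two balanced links: only the requests leaving through ISP1 are resolved.
balanced = resolved_share(["ISP1", "ISP2"], "ISP1", 100)
# With a static route, all DNS traffic uses ISP1.
static_route = resolved_share(["ISP1"], "ISP1", 100)
print(balanced, static_route)
```

Under these assumptions half of the requests fail until the static route pins DNS traffic to the right link.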

If the XG LAN IP is configured as DNS on user systems, issues with the DNS configuration in XG may lead to slow browsing.

Go to System > Diagnostics > Services to check whether the DNS service is running. If the service is stopped, restart it by clicking Start.

XG resolves queries using DNS Servers in a top to bottom order. Hence, compare the response times of each Server and place the Server with the least response time at the top.

Refer to https://community.sophos.com/kb/en-US/123191.
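
Once you have measured a few response times per DNS server (for example with a lookup tool of your choice), ordering them is simple. A small Python sketch; the server addresses are documentation-range examples and the timings are invented.

```python
from statistics import median


def order_dns_servers(samples_ms: dict[str, list[float]]) -> list[str]:
    """Sort servers by median response time, fastest first.

    Servers with no successful samples are placed last.
    """
    def sort_key(server: str) -> float:
        samples = samples_ms[server]
        return median(samples) if samples else float("inf")

    return sorted(samples_ms, key=sort_key)


# Invented measurements in milliseconds.
measurements = {
    "192.0.2.53": [40.0, 42.0, 45.0],
    "198.51.100.53": [12.0, 15.0, 11.0],
    "203.0.113.53": [],  # never answered
}

ordered = order_dns_servers(measurements)
print(ordered)
```

The median is used rather than the average so that a single slow reply does not push an otherwise fast server down the list.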

#5- High CPU: I will not go deep into this topic, but as a best practice, check whether Telnet or SSH access is left open on the WAN under System > Administration > Device Access > WAN. Always disable SSH and Telnet access on the WAN side when it is not required.

Check the IPS settings: connect via Telnet or SSH, go to option 4. Device Console, and execute: show ips-settings. If maxpkts is set to a value higher than 8, set it back to 8, unless support configured it for a particular reason.

#6- If you notice any glitches or misbehavior in the XG GUI, restart the Tomcat service from the advanced shell. The Tomcat service is responsible for generating the opcode for GUI configurations in the backend. Command: service tomcat:restart -ds nosync

#7- High Availability Prerequisites

Both appliances must be the same model, with the same number of ports, the same vendor details and the same version.

  1. HA is not supported if WAN links are configured with DHCP or PPPoE. It means HA is also not supported for wireless models.
  2. HA must be disabled on the auxiliary appliance. Enable HA from the primary appliance, and it will automatically enable HA on the auxiliary appliance.
  3. The dedicated HA port should be in the DMZ zone, and SSH access should be enabled for the DMZ zone on both appliances.
  4. HA peers are physically connected using a crossover cable through this port. The same port must also be used as the HA link port on the peer device.
    For example, if port E is configured as the HA link port on the primary device, use only port E as the HA link port on the auxiliary device. Make sure the IP addresses of the HA link ports on the primary and auxiliary devices are in the same subnet.
  5. No alias or VLAN should be configured on the dedicated HA port.
  6. MTU/MSS and link speed should be left at their defaults on the dedicated HA port. We recommend connecting the dedicated HA link directly between the two appliances.

Models that do not support HA:
 
All wireless models of XG, SG and CR series as well as CR15i.
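
For the HA link port prerequisite (same port on both devices, IP addresses in the same subnet), a quick sanity check could look like this Python sketch. The port names and addresses are hypothetical examples.

```python
from ipaddress import ip_interface


def ha_link_ok(primary_port: str, primary_ip: str,
               auxiliary_port: str, auxiliary_ip: str) -> bool:
    """Check that both devices use the same HA port and addresses in one subnet.

    IP addresses are given with a prefix length, e.g. "172.16.16.1/24".
    """
    same_port = primary_port == auxiliary_port
    same_subnet = ip_interface(primary_ip).network == ip_interface(auxiliary_ip).network
    return same_port and same_subnet


good = ha_link_ok("PortE", "172.16.16.1/24", "PortE", "172.16.16.2/24")
wrong_subnet = ha_link_ok("PortE", "172.16.16.1/24", "PortE", "172.16.17.2/24")
wrong_port = ha_link_ok("PortE", "172.16.16.1/24", "PortF", "172.16.16.2/24")
print(good, wrong_subnet, wrong_port)
```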
 
#8- STAS
 
One of the smallest but most common mistakes when configuring STAS is the TIME. The time difference between the XG firewall and the Domain Controller must not be more than 5 minutes. If the STAS logs contain the user information but the user is not authenticated on the XG firewall and does not appear in the live users list, the cause could be the time settings.
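
The 5-minute rule is easy to check once you have the clock readings from both machines. A minimal Python sketch; how you collect the two timestamps is up to you, and the values below are examples.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=5)  # limit stated above


def clock_skew_ok(firewall_time: datetime, dc_time: datetime) -> bool:
    """Return True if the two clocks differ by no more than 5 minutes."""
    return abs(firewall_time - dc_time) <= MAX_SKEW


# Example readings, both expressed in UTC so time zones do not distort the comparison.
xg_clock = datetime(2017, 1, 1, 12, 0, tzinfo=timezone.utc)
dc_close = datetime(2017, 1, 1, 12, 4, tzinfo=timezone.utc)
dc_far = datetime(2017, 1, 1, 12, 7, tzinfo=timezone.utc)

print(clock_skew_ok(xg_clock, dc_close), clock_skew_ok(xg_clock, dc_far))
```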
 
#9- DNAT/FullNAT
 

If the Mapped Port option is greyed out in the "Forward to" section, verify the service definition against these points:

  1. If no TCP/UDP Service Selected -> Disable mapped port option
  2. If multiple services are selected -> Disable mapped port option
  3. If Service selected is with TCP/UDP combination -> Disable mapped port option
  4. If service group is selected -> Disable mapped port option
  5. If Service selected has only TCP or UDP ports, then it must not exceed 16 ports
  6. Public service cannot have both ports and range
  7. Public service should have a single range of ports
  8. The mapped port should have an equal number of ports as given in public service or mapped port should have a single port.
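
The points above can be sketched as a small Python check. This is my reading of the rules, not the actual GUI logic: in particular I read point 5 as "at most 16 individual ports", and the ServiceEntry model is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServiceEntry:
    """Hypothetical model of one line in a service definition."""
    protocol: str    # "TCP", "UDP", or something else such as "ICMP"
    first_port: int
    last_port: int   # equal to first_port for a single port


def mapped_port_enabled(selected: list[list[ServiceEntry]], is_group: bool = False) -> bool:
    """Return True if the Mapped Port option would be available under this reading."""
    if is_group or len(selected) != 1:          # points 2 and 4
        return False
    entries = selected[0]
    protocols = {entry.protocol for entry in entries}
    if protocols not in ({"TCP"}, {"UDP"}):     # points 1 and 3
        return False
    singles = [e for e in entries if e.first_port == e.last_port]
    ranges = [e for e in entries if e.first_port != e.last_port]
    if singles and ranges:                      # point 6
        return False
    if len(ranges) > 1:                         # point 7
        return False
    if len(singles) > 16:                       # point 5, as interpreted here
        return False
    return True


def mapped_port_count_ok(public_count: int, mapped_count: int) -> bool:
    """Point 8: same number of ports as the public service, or a single port."""
    return mapped_count == 1 or mapped_count == public_count


web = [ServiceEntry("TCP", 80, 80)]
dns = [ServiceEntry("TCP", 53, 53), ServiceEntry("UDP", 53, 53)]
mixed = [ServiceEntry("TCP", 80, 80), ServiceEntry("TCP", 8000, 8010)]

print(mapped_port_enabled([web]), mapped_port_enabled([dns]), mapped_port_enabled([mixed]))
```

A single TCP service passes, while a TCP/UDP combination or a mix of a port and a range leaves the option disabled.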

-End

As mentioned earlier, your feedback and suggestions to correct anything here are most welcome. You can also DM me if I missed any useful and handy information.

I hope that helps. I tried to keep it as simple as Sophos does :) Cheers all.

  • Hi Sachin,

    System Log Format.docx seems to be corrupted when I try to open it after downloading. Could you please provide a working link?

    Thanks!

  • In reply to hjherron6:

    Hi hjherron6,

    Please find the attached copy in PDF 

    Thanks and Regards

    Aditya Patel 

     

    System Log Format.pdf

  • Sachin,

    thanks for sharing the document on how to understand drop-packet filters. I hope that this is a temporary method to troubleshoot drop-packet capture. It is not user-friendly because we need to use "mathematics" to work out which daemon is dropping the traffic and why.

    In v17 we are waiting for a revolution in logging. I like the drop-packet capture command but you always need the document to understand it. Please make sure that XG gets a quick and easy way to troubleshoot, like UTM9. Sophos follows the slogan "Security made simple".

    And most of the time, we also need to launch cat or tail in order to find out more information from the logs.

    Even restarting a service is not as simple as on the Linux command line (most of the services are not even in the GUI).

    Please make sure that the developers understand this big gap in XG.

    Some of my colleagues have already moved to Fortigate because XG is not what it is supposed to be. I do not want to do the same after v17.

    We are here to give you feedback about technical aspects but also about what the market is expecting from XG.

    I hope you listen to us and give us a product that can be a real competitor to the other Leaders in the Magic Quadrant.

  • In reply to lferrara:

    Great job on the troubleshooting guide. Dropped packet capture logging is a great feature and although v16 shows most of those drops in the gui, the log-id feature is indeed very handy.

    Luk, I agree with you on the expectations that we have with v17. v16 has been a good step forward but v15 was lacking in so many areas that v16 mostly fixed the shortcomings of v15. I don't want to go overboard and express all my frustrations in this thread since the goal of the thread is to help people troubleshoot. However your point is still valid. Command line features are a great selling point as long as those features are also available in the gui. For me personally, I like using gui. With different vendors and different cli, I keep forgetting the syntax. Different cli for cisco router, different for juniper, EMC has its own cli, vmware commands are different, microsoft powershell is different, linux syntax is different, ubnt has its own cli, and we still have to remember python and php syntax too. The point is cli is great as an advanced tool and works fine for routers etc where you create routes when you buy the router and then don't look back for years. In a product like XG, GUI is its best selling point. AlanT has repeatedly boasted about the ability of XG to accomplish tasks with one or two clicks and yet for any troubleshooting, we have to go to cli. For me that is unacceptable. I love the UTM9 because you can do everything in the gui and don't have to use the shell if you don't want to. However the cc and full linux shell is there for advanced users if they wish. I hope that this is the philosophy of XG going forward. Everything that XG is capable of should be available in the GUI and cli should be there to complement the GUI... not required for advanced troubleshooting.

    I will not get into the lost sales aspect as I am sure other partners have expressed their feelings on the subject and I hope sophos is listening. I am hoping v17 builds on the foundation of v16 and finishes all the items mentioned here https://community.sophos.com/products/xg-firewall/v16beta/f/sfos-v16-beta-feedback/78908/v16-what-is-still-missing instead of adding other new features and leaving the basic necessities for later.

    Regards

    Bill

  • In reply to lferrara:

    Hi,

    Thanks for going through the guide and for your feedback. If you feel that the guide is missing some needed troubleshooting steps, please let me know via DM. And do refer to my guide when you answer; in most cases it comes in handy to avoid posting redundant answers.

    Talking about v17, I can always add my input to our internal wiki page, where I can let the developers know what customers need and what they expect from the next release. The logging in XG comes from our legacy product Cyberoam, which has done wonderfully, but I always believe that change is a necessity and we must improve. I think this is the time to have a discussion with the developers about what is going to be improved and whether we are looking at logging capabilities in v17. I will keep you guys updated.

    Thanks

  • In reply to sachingurung:

    Sachin,

    Thanks for your time and reply here. Logging is the first thing that has to be improved on XG. V16 has better log management compared to v15 but we are far away from UTM 9 and the competition.

    Send me a PM and we can organize a call about what we are expecting from logging.

    As I wrote many times,  "power is nothing without control."

    In XG control is missing.

    I am looking forward to hearing from you.

    Regards

  • In reply to lferrara:

    I wonder where the UTM OS would be with the effort put into the XG? Hopefully v17 is better than the other "Betas" put forth.

  • In reply to DomusRegis:

    MatthewKing,

    XG is another project, different from and newer than UTM. We love UTM9 (that is more than sure) but XG has a different approach and Sophos is trying to build a better product by integrating the best things of UTM and Cyberoam.

    I have personally criticized XG on version v15, and v16 is still not good enough to match UTM9, but with the next release XG should be ready for big installations. Many features are still missing and some of them are not working properly (like logging, IPS, WAF, Country Blocking, etc.) but if you consider the nice things done in v15 and v16, we are more than sure we will get another great version with v17.

    So let's wait! The v17 beta should arrive in a couple of months.

    Regards

  • In reply to DomusRegis:

    Hi Matthew,

    Change is a necessity and innovation is what we are focused on. We are trying to cultivate a new product combining the UTM and the Cyberoam OS. We hope that v17 satisfies your expectations.

    Thanks

  • In reply to sachingurung:

    Hi all,

     

    Quick question to the community:

    Should we update this Thread to a newer version? 

    Including stuff like: atop, conntrack, drppkt, tcpdump, Console commands? 

     

    XG has some powerful troubleshooting tools onboard to quickly find issues within the current installation. 

  • In reply to LuCar Toni:

    Yes please.  I read through all the information and guide documents I can find.  The more information, the better!

  • In reply to LuCar Toni:


    I'd vote for that!