I have a simple home setup using v18 (WAN to LAN with IPS and web filtering). It seems to work fine, except that a few times a day I lose my internet connection for 20-30 seconds (about 10 consecutive ping drops if I leave a ping running). I know the WAN link itself is not dropping, because I have a device on the WAN side that does not drop pings or lose its connection.
I need some help troubleshooting. I suspect IPS or some other blocking feature may be kicking in and blocking all of my outgoing traffic for a few seconds. Which logs should I look at? Is there a troubleshooting guide for this type of drop?
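To line the outages up with firewall log entries, it helps to know exactly when each one starts and ends. The following is a minimal sketch, assuming the ping results have already been recorded as (timestamp, success) pairs; the function name `find_outages`, the `min_failures` threshold and the sample data are made up for illustration.

```python
from datetime import datetime, timedelta

def find_outages(samples, min_failures=3):
    """Group consecutive failed pings into (first_failure, last_failure, count) tuples."""
    outages = []
    run = []  # timestamps of the current streak of failed pings
    for timestamp, ok in samples:
        if ok:
            if len(run) >= min_failures:
                outages.append((run[0], run[-1], len(run)))
            run = []
        else:
            run.append(timestamp)
    if len(run) >= min_failures:  # streak still open when the data ends
        outages.append((run[0], run[-1], len(run)))
    return outages

# Synthetic data for illustration only: 5 good pings, 10 lost, 5 good.
base = datetime(2020, 1, 1, 12, 0, 0)
results = [True] * 5 + [False] * 10 + [True] * 5
samples = [(base + timedelta(seconds=i), ok) for i, ok in enumerate(results)]

for start, end, count in find_outages(samples):
    print(f"{count} lost pings from {start:%H:%M:%S} to {end:%H:%M:%S}")
```

The start and end times it reports are the window to search for in the firewall and IPS logs.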
It's quite a coincidence that you post this now as I have been planning to post exactly the same issue.
This has been an issue for us since the pre-release version of v18, in home edition, Hyper-V VM and our XG 230. At first I let it go as early teething issues with new software and hoped a fix would come as the product matured a little but the issue is still there.
I too would like suggestions on the best place to start looking at this problem.
It would also be interesting to hear if anybody else is having the same problem.
Thank you for contacting the Sophos Community.
Please connect to the XG following this KB (https://community.sophos.com/kb/en-us/133678)
Once in there press number 4 to land in the console and run the following command:
console > drop-packet-capture 'host X.X.X.X and host 126.96.36.199' (Modify the X.X.X.X to be the Private IP of the computer where you are running the Ping)
If the XG is dropping the traffic you will see something there.
You can also check fwlog.log for the time the issue happens.
In a new PuTTY session/window, go to 5 > 3, then type cd /log and press Enter.
Then, at the # prompt, type less fwlog.log (Shift+G takes you to the last line) and check the entries around the time the issue happens.
In addition, I recommend leaving conntrack running so that it is capturing while the issue is happening:
#conntrack -E -s X.X.X.X
Check for unreplied packets.
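If the conntrack output is saved to a file, the unreplied flows can be picked out afterwards instead of by eye. This is a minimal sketch that assumes the usual conntrack event layout, in which a flow with no return traffic carries an [UNREPLIED] marker between the original and reply directions; the sample lines use documentation addresses and are invented for illustration, not taken from this thread, and `unreplied_flows` is a hypothetical helper name.

```python
import re

FIELD_RE = re.compile(r"(\w+)=(\S+)")

def unreplied_flows(lines):
    """Return the original-direction fields of every line marked [UNREPLIED]."""
    flows = []
    for line in lines:
        if "[UNREPLIED]" not in line:
            continue
        # Everything before the marker describes the original direction.
        original_direction = line.split("[UNREPLIED]", 1)[0]
        flows.append(dict(FIELD_RE.findall(original_direction)))
    return flows

# Invented sample lines (assumed format), for illustration only.
sample = [
    "[NEW] udp 17 30 src=192.0.2.10 dst=198.51.100.53 sport=53621 dport=53 "
    "[UNREPLIED] src=198.51.100.53 dst=192.0.2.10 sport=53 dport=53621",
    "[UPDATE] udp 17 30 src=192.0.2.10 dst=198.51.100.53 sport=53622 dport=53 "
    "src=198.51.100.53 dst=192.0.2.10 sport=53 dport=53622",
]

for flow in unreplied_flows(sample):
    print(flow["src"], "->", flow["dst"], "port", flow["dport"])
```

A new flow is normally unreplied until the first answer arrives, so the interesting entries are likely the ones that never get a follow-up event.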
Finally, check ips.log for anything the XG might be dropping at that time, and while the issue is happening confirm whether the XG itself can ping 188.8.131.52
In reply to emmosophos:
One of the areas that is particularly noticeable for us is DNS resolution failures. We use Cloudflare DNS servers, 184.108.40.206 and 220.127.116.11
The drop-packet-capture started showing results fairly quickly; I had 45 drops in 30 minutes:
drop-packet-capture 'dst host 18.104.22.168'
2020-06-26 00:27:44 0110021 IP 192.168.1.101.53621 > 22.214.171.124.53 : proto UDP: packet len: 54 checksum : 25859
0x0000: 4500 004a da27 0000 7e11 a16b c0a8 fe65 E..J.'..~..k...e
0x0010: 0101 0101 d175 0035 0036 6503 01eb 0100 .....u.5.6e.....
0x0020: 0001 0000 0000 0001 0377 7777 0667 6f6f .........www.goo
0x0030: 676c 6503 636f 6d02 6567 0000 0100 0100 gle.com.eg......
0x0040: 0029 0fa0 0000 0000 0000 .)........
Date=2020-06-26 Time=00:27:44 log_id=0110021 log_type=Firewall log_component=Identity log_subtype=Denied log_status=N/A log_priority=Alert duration=N/A in_dev=Port1 out_dev=Port2 inzone_id=1 outzone_id=2 source_mac=00:88:8b:85:22:f7 dest_mac=7c:5a:55:4d:22:40 bridge_name= l3_protocol=IPv4 source_ip=192.168.1.101 dest_ip=126.96.36.199 l4_protocol=UDP source_port=53621 dest_port=53 fw_rule_id=25 policytype=1 live_userid=0 userid=65535 user_gp=0 ips_id=0 sslvpn_id=0 web_filter_id=16 hotspot_id=0 hotspotuser_id=0 hb_src=0 hb_dst=0 dnat_done=0 icap_id=0 app_filter_id=0 app_category_id=0 app_id=0 category_id=0 bandwidth_id=0 up_classid=0 dn_classid=0 nat_id=0 cluster_node=0 inmark=0x0 nfqueue=0 gateway_offset=0 connid=2612661056 masterid=0 status=256 state=0, flag0=36031545800130560 flags1=8796629893120 pbdid_dir0=0 pbrid_dir1=0
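With 45 drops to go through, it can be easier to split the key=value part of each entry into fields and count which component denied the packet. The sketch below assumes the capture has been saved as text; the helper names `parse_fields` and `summarize` are made up, and the sample line is abridged from the entry above.

```python
import re
from collections import Counter

PAIR_RE = re.compile(r"(\w+)=(\S*)")

def parse_fields(line):
    """Split a log line into a dict of key=value fields."""
    return {key: value.rstrip(",") for key, value in PAIR_RE.findall(line)}

def summarize(lines):
    """Count entries per (log_component, log_subtype) pair."""
    counts = Counter()
    for line in lines:
        fields = parse_fields(line)
        if "log_component" in fields:
            counts[(fields["log_component"], fields.get("log_subtype", "?"))] += 1
    return counts

# Abridged from the drop-packet-capture entry posted in this thread.
sample = (
    "Date=2020-06-26 Time=00:27:44 log_id=0110021 log_type=Firewall "
    "log_component=Identity log_subtype=Denied log_priority=Alert "
    "in_dev=Port1 out_dev=Port2 source_ip=192.168.1.101 l4_protocol=UDP "
    "dest_port=53 fw_rule_id=25 ips_id=0"
)

fields = parse_fields(sample)
print(fields["log_component"], fields["log_subtype"], "rule", fields["fw_rule_id"])
print(summarize([sample]))
```

In this entry the component is Identity and the subtype is Denied, while ips_id is 0, which points away from IPS and is consistent with the identity-probe explanation given later in the thread.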
I don't know if there is something wrong with my fwlog.log but it seems to contain next to nothing, why is this?:
XG230_WP02_SFOS 18.0.1 MR-1-Build396# tail /log/fwlog.log
NOTICE: Netlink socket buffer size has been set to 8388608 bytes.
NOTICE: Netlink socket buffer size has been set to 8388608 bytes.
NOTICE: Netlink socket buffer size has been set to 8388608 bytes.
NOTICE: Netlink socket buffer size has been set to 8388608 bytes.
NOTICE: Netlink socket buffer size has been set to 8388608 bytes.
NOTICE: Netlink socket buffer size has been set to 8388608 bytes.
NOTICE: Netlink socket buffer size has been set to 8388608 bytes.
NOTICE: Netlink socket buffer size has been set to 8388608 bytes.
NOTICE: Netlink socket buffer size has been set to 8388608 bytes.
NOTICE: Netlink socket buffer size has been set to 8388608 bytes.
XG230_WP02_SFOS 18.0.1 MR-1-Build396#
I did check the GUI version of the firewall log and it showed nothing blocked for a destination of 188.8.131.52
I ran 'conntrack -d 184.108.40.206' but there were so many entries that I couldn't find anything useful on my first attempt. I may have another go at this tomorrow, or set up a more specific test than DNS lookups that will produce fewer results.
I did look at ips.log but struggled again with the number of entries and couldn't get grep to work. I did look at the GUI version and that didn't show anything for a destination of 220.127.116.11.
Is there any way to download the log files? Finding info in the console can be a bit of a pain if it's not something you are used to doing.
In reply to JasP:
for your home XG please read the following thread, especially the last entry.
In reply to rfcat_vk:
Do you use STAS?
Seems like your XG is dropping because of Identity probing.
Try to adjust the values here:
Test a smaller number (Maybe 10).
In reply to LuCar Toni:
Yes we use STAS
I've set 'Restrict client traffic during identity probe' to 'No', I haven't changed the timeout. For our environment, identifying the user is not critical, I'm far more interested in stopping these drops.
For my own learning, how did you identify this as a potential issue from the information I supplied?
Also, is there any way to download the logs from an XG rather than just view them in the console?
In STAS / XG, there is a feature called Quarantine for Unauthenticated Users. If a client communicates and a user-based rule exists, XG checks whether that IP is authenticated. If it is not, XG puts the IP into a learning phase and waits for STAS to report the live user. This quarantine lasts 1-120 seconds, and you can configure whether traffic is dropped or only learned during it (Yes / No).
Because XG has a cleanup mechanism, a client sometimes gets kicked out and STAS is not able to recover that IP quickly. XG then drops this client's traffic for 120 seconds, until the client re-authenticates via STAS.
See: https://community.sophos.com/kb/en-us/123156#2-Drop%20timeout%20in%20Learning%20Mode // https://community.sophos.com/kb/en-us/125217
LuCar Toni wrote: "It means, if a client communicates and a user-based rule exists, XG checks if this IP is authenticated. If it's not, it puts this IP into a learning phase and waits to get the live user online from STAS."
XG uses some sort of learning phase for all traffic, regardless of whether an authentication rule or a network rule applies.
While a client (IP) is in this learning phase, XG cannot know whether it is a user or a plain IP. Therefore it cannot determine whether it should apply a network rule or a user-based rule.
Thank you for the reply. I still keep getting the drops; however, it is difficult to catch them with the 'drop-packet-capture' console command because the SSH session times out too quickly. I will still try to catch a drop by keeping an eye on it. (It happens a few times a day, and each time so far the SSH session had already timed out, so I don't have any output yet.)
STAS was enabled, but as far as I understand it is not used in my config (I do not authenticate users). I am going to disable STAS and see if the drops still happen.
Is there any setting to stop SSH from timing out, for troubleshooting purposes?
In reply to Samy Wee:
XG uses an SSH idle timeout.
To work around it, use an SSH client that can send keep-alives.
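For anyone connecting with the OpenSSH command-line client, keep-alives are controlled by the ServerAliveInterval and ServerAliveCountMax options. The sketch below only builds and prints such a command line; the host address, the admin user name and the 30-second interval are placeholder assumptions, `build_ssh_command` is a made-up helper, and whether these messages are enough to keep the XG session from timing out has not been confirmed in this thread.

```python
def build_ssh_command(host, user="admin", interval_s=30, max_missed=3):
    """Build an OpenSSH command line that sends a keep-alive every interval_s seconds."""
    return [
        "ssh",
        "-o", f"ServerAliveInterval={interval_s}",
        "-o", f"ServerAliveCountMax={max_missed}",
        f"{user}@{host}",
    ]

if __name__ == "__main__":
    # 192.0.2.1 is a documentation address; replace it with the firewall's LAN IP.
    print(" ".join(build_ssh_command("192.0.2.1")))
```

The same two options can also be set per host in the OpenSSH client configuration file instead of on the command line.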
Toni, thank you for the tip. PuTTY can send keep-alives; the setting is under Connection. Now I will try to capture some drops. Thank you.