Unable to ping certain IP address intermittently

We have had reports that a certain IP address is available most of the time but occasionally becomes unavailable, and the user cannot ping it during those periods. We have an XG 135 running SFOS 18.5.1 MR-1-Build326.

I don't see any blockages in the firewall logs for the timeframe in which the problem last happened. I would like to try and rule out the XG 135 and would appreciate any suggestions for debugging. It is difficult as it is intermittent.

This thread was automatically locked due to age.

Top Replies

Alan Spark over 2 years ago in reply to Alan Spark +2 verified

Just to close this thread, I did open a ticket and they confirmed that the behaviour was unexpected. They advised updating to SFOS 18.5.3 MR-3-Build408 which I did and since then I have not seen this issue…

LHerzog over 2 years ago

Is this traffic logged when it works? Do you have a custom rule at the bottom that blocks and logs everything? Such a rule is not there by default, so without it you may not see every firewall block.

You could check whether the issues you're reporting overlap with IPS updates. Those are known to cause network disconnects on the small appliances as the Snort services restart. I'd say this is most likely your issue.
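One way to check for that overlap is to compare the times of the reported outages with the times of the pattern updates. The following is only a minimal Python sketch, not a Sophos tool: the timestamps are made-up placeholders that would have to be copied by hand from the user reports and the firewall's update log, and the 5-minute window is an arbitrary assumption.

```python
from datetime import datetime, timedelta

def outages_near_updates(outages, updates, window=timedelta(minutes=5)):
    """Return the outage times that lie within `window` of any update time."""
    return [o for o in outages if any(abs(o - u) <= window for u in updates)]

# Hypothetical example data, not taken from a real appliance.
updates = [datetime(2022, 3, 1, 8, 0), datetime(2022, 3, 1, 10, 0)]
outages = [datetime(2022, 3, 1, 8, 2), datetime(2022, 3, 1, 9, 15)]

# Outages that happened close to an update are the suspicious ones.
suspects = outages_near_updates(outages, updates)
print(suspects)
```

If most outages end up in `suspects`, the update theory looks plausible; if few or none do, it is probably something else.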

read here:

https://community.sophos.com/sophos-xg-firewall/f/discussions/128637/dropped-connections-during-pattern-updates

Alan Spark over 2 years ago in reply to LHerzog

Thanks for your reply.

LHerzog said:
Is this traffic logged when it works? Do you have a custom rule at the bottom that blocks and logs everything? Such a rule is not there by default, so without it you may not see every firewall block.

Yes, it is logged when it works but we did not have a rule blocking and logging everything. I have now added the following:

I did manage to reproduce the problem myself just now after adding the rule. I tried to SSH into the affected IP address and it timed out. I checked the logs but there was nothing blocked. Then I tried again and I could SSH successfully. It is as if the first SSH attempt wakes something up.

LHerzog said:
You could check whether the issues you're reporting overlap with IPS updates. Those are known to cause network disconnects on the small appliances as the Snort services restart. I'd say this is most likely your issue.

This sounds familiar. We have had this issue before (almost 1 year ago) and were advised to disable firewall acceleration (https://community.sophos.com/sophos-xg-firewall/f/discussions/127621/intermittent-vpn-issues). We have not had any further reports of the issue until now, but we cannot be sure whether it was ever actually resolved by this change; I suspect not. We have also had a firmware update since then, and I understood that a fix was going to be in that update. However, I can confirm that firewall acceleration is still disabled at our end.

I will look into the analysis again but this will be quite difficult as I can't control when it happens...
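Since the failures cannot be triggered on demand, one option is to leave a small probe running that records when the server stops and starts answering. Below is a rough Python sketch using only the standard library. It tests a TCP connection (for example to the SSH port) rather than ICMP ping, and the function names `probe` and `watch`, the interval and the count are all illustrative assumptions.

```python
import socket
import time
from datetime import datetime

def probe(host, port, timeout=3.0):
    """Try one TCP connection; return (succeeded, seconds_taken)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        ok = False
    return ok, time.monotonic() - start

def watch(host, port, interval=30.0, count=10):
    """Probe `count` times, `interval` seconds apart, printing state changes."""
    last = None
    for i in range(count):
        ok, took = probe(host, port)
        if ok != last:
            stamp = datetime.now().isoformat(timespec="seconds")
            state = "reachable" if ok else "UNREACHABLE"
            print(f"{stamp} {host}:{port} {state} ({took:.2f}s)")
            last = ok
        if i < count - 1:
            time.sleep(interval)
```

It could be started over the VPN with something like `watch("192.0.2.10", 22, interval=30, count=2880)`, where `192.0.2.10` is a placeholder address. One caveat: if the first connection attempt really does "wake something up", the probe traffic itself might change how often the problem shows, so a longer interval may be worth trying as well.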

Alan Spark over 2 years ago in reply to Alan Spark

Further to this, I was able to reproduce it again and performed a packet capture. The corresponding firewall rule appears to be the built-in "drop all" rule, which is not logged.

In the normal state, when accessing over SSH, I see the items below repeated over and over. I note the different port numbers in the failure (11757) and success (1508) logs.

I ruled out the "automatic pattern update" issue, as I had changed the update interval from every 2 hours to daily before the above and the problem still occurred. I have since restored it to every 2 hours.

LHerzog over 2 years ago in reply to Alan Spark

That's interesting: it seems your first connection goes into violation and the second one works. Some kind of DPI thing. Is that traffic going through TLS inspection? Have you already checked the other logs? Sometimes I focus on the firewall log and miss the events that are shown in the IPS or TLS log section.

Alan Spark over 2 years ago in reply to LHerzog

Thanks, good point about the other logs - I also tend to focus on the firewall. However, I have been through them all including IPS and TLS and don't see any corresponding entries.

LHerzog over 2 years ago in reply to Alan Spark

Do you use heartbeat and Intercept X on the endpoints? A firewall violation can also be caused by a missing heartbeat and a bad health status on a device. Our endpoints get blocked multiple times a week because of a missing heartbeat after the endpoints have updated some components.

To rule out the firewall completely, you will need to create a firewall rule at the top for a single host that is known to have this issue and allow this traffic without any security features enabled.

Alan Spark over 2 years ago in reply to LHerzog

No, we don't use any endpoint features.

We don't currently have any of the listed security features enabled for our existing VPN rules. Is this what you meant?

Alan Spark over 2 years ago in reply to LHerzog

I haven't changed any firewall rules, but I think I have proved that it is definitely something in the UTM that is blocking the traffic. This morning I reproduced the issue and was unable to SSH into the affected server for a few minutes. During that time I could successfully SSH from a machine on the internal network (i.e. bypassing the UTM).

LHerzog over 2 years ago in reply to Alan Spark

Yes, I meant what you shared in the screenshot above.

I understand from your posts that this is VPN access over a WAN connection which sometimes times out, correct?

Can you rule out any connection issues or high latency?

We have sites connected by site-to-site VPN that have poor WAN connections, and while the tunnel is up and fine, we see severe timeouts to these sites throughout the day.

Maybe some of the SAs are temporarily down.

Please describe your environment and VPN topology.

Alan Spark over 2 years ago in reply to LHerzog

Yes, this is a VPN connection over WAN and any attempt to communicate with a specific server during a short window of failure results in a timeout. I am testing with SSH but the affected user is seeing the issue with a proprietary application and ping. Yesterday the failure period seemed to be happening roughly every hour.
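To put a number on "roughly every hour", the gaps between the observed failure times can be worked out. This is just an illustrative Python sketch; the failure times below are invented placeholders, not the actual times from this case.

```python
from datetime import datetime

def gaps_in_minutes(failure_times):
    """Return the minutes between consecutive failure times (sorted first)."""
    ordered = sorted(failure_times)
    return [(b - a).total_seconds() / 60 for a, b in zip(ordered, ordered[1:])]

# Hypothetical failure start times noted by hand.
failures = [
    datetime(2022, 3, 1, 9, 5),
    datetime(2022, 3, 1, 10, 4),
    datetime(2022, 3, 1, 11, 6),
]

gaps = gaps_in_minutes(failures)
print(gaps)
```

Gaps that cluster tightly around one value would point to something running on a timer, whereas widely scattered gaps would suggest the hourly pattern was a coincidence.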

I think connection/latency issues can be ruled out because I can access other servers during the temporary block of the server that I'm debugging. As I said, I can also access the affected server from the internal network (via remote desktop) so it is VPN specific.
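The reasoning here can be made explicit by probing the affected server and a second "control" server in the same round and comparing the results. The Python sketch below only does the comparison step; the reachability results are hypothetical and would have to come from real tests made over the VPN at the same moment.

```python
from collections import Counter

def classify(affected_ok, control_ok):
    """Label one probe round from the two reachability results."""
    if affected_ok and control_ok:
        return "all reachable"
    if not affected_ok and control_ok:
        return "only affected server unreachable"
    if affected_ok and not control_ok:
        return "only control server unreachable"
    return "both unreachable"

# Hypothetical results of four probe rounds: (affected_ok, control_ok).
rounds = [(True, True), (False, True), (False, True), (True, True)]

summary = Counter(classify(a, c) for a, c in rounds)
print(summary)
```

Rounds labelled "only affected server unreachable" fit the picture described above, while "both unreachable" rounds would point more towards the tunnel or the WAN link.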

We use both SSL and IPsec VPN and have equivalent firewall rules set up for each, set to accept from VPN to LAN with the additional settings in my screenshots above. We already had a rule to drop VPN to WAN and since yesterday have had one to drop VPN to any zone. Neither is being triggered or logged. I have reproduced the problem with IPsec whilst the affected user is using SSL.

Here is a diagram of the topology.