There are many posts concerning this same symptom; however, nearly all of them center on proper DNS settings and on ensuring that the Web Policies are configured properly. The issue I am experiencing does not appear to relate to any of those I have found, because my testing uses (1) the recommended DNS settings and (2) a bare-bones config that does not filter anything.
This issue makes the XG unusable in my particular environment and if left unresolved, the XG will be pulled from service.
As mentioned above, I ensured that my testing included the fixes I was able to find in the other posts discussing this symptom. A summary of the pertinent points follows:
Despite all of the above configuration and testing, the slow page load times persist whenever the Web Policy is set in the FW rule. Making the single change of removing the Web Policy from the FW rule immediately restores page load times to what they are when the XG is not in the network at all, i.e. under 3-5 s.
Interesting point: actual throughput is NOT affected, only web page load times. I have performed literally hundreds of throughput tests using Speedtest and DSL Reports, and all of them run great once the page loads, which reinforces the idea that some flow-inspection issue is going on here.
While my link speed is nowhere near as good as yours, I went through a lot of testing to improve my XG performance.
I ended up using the OpenDNS (188.8.131.52, 184.108.40.206 and 220.127.116.11) settings on both the XG and the users' configuration, and that appears to have fixed the performance issue. That is with IPS, proxy, ATP, etc. in place.
Looking at the Firefox status line, I see that a lot of the time is spent negotiating a TLS connection.
I did try using only the ISP DNS on the XG, and that didn't work very well; then I tried OpenDNS on the XG with the users on the ISP DNS, and that didn't work either.
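To see where that time goes outside the browser, the connection phases can be timed from a client behind the XG. This is a minimal Python sketch using only the standard library; the function names are hypothetical, it resolves through the OS resolver, and it times a single HEAD request, so it only approximates what Firefox reports.

```python
import socket
import ssl
import time
from typing import Dict

PHASES = ("start", "dns", "connect", "tls", "first_byte")


def phase_durations(marks: Dict[str, float]) -> Dict[str, float]:
    """Turn cumulative timestamps (seconds) into per-phase durations (ms)."""
    return {
        later: round((marks[later] - marks[earlier]) * 1000, 1)
        for earlier, later in zip(PHASES, PHASES[1:])
    }


def time_https_phases(host: str, port: int = 443,
                      timeout: float = 10.0) -> Dict[str, float]:
    """Time DNS, TCP connect, TLS handshake and first response byte for one request."""
    marks = {"start": time.perf_counter()}

    # DNS via the OS resolver, so any local/XG caching is included.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    marks["dns"] = time.perf_counter()

    with socket.socket(family, socktype, proto) as raw:
        raw.settimeout(timeout)
        raw.connect(addr)  # TCP three-way handshake
        marks["connect"] = time.perf_counter()

        ctx = ssl.create_default_context()
        with ctx.wrap_socket(raw, server_hostname=host,
                             do_handshake_on_connect=False) as tls:
            tls.do_handshake()  # TLS negotiation, timed on its own
            marks["tls"] = time.perf_counter()

            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(request.encode("ascii"))
            tls.recv(1)  # block until the first response byte arrives
            marks["first_byte"] = time.perf_counter()

    return phase_durations(marks)
```

Calling `time_https_phases("www.google.com")` a few times with the Web Policy on the FW rule and a few times without it would show whether the `tls` phase is the one that grows. If the XG were re-signing certificates, `do_handshake()` would likely fail verification unless the XG's CA is trusted on the client, which is itself a useful signal.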
In reply to rfcat_vk:
Yep, that is the most common solution; however, it doesn't resolve the issue I am seeing. I've had the same Google DNS servers you mentioned configured on both the XG and all test machines during the better part of two days of testing, and the issue still persists as long as the Web Policy is enabled.
I suspect there is a defect in how the Web Policy engine is handling changes in DNS servers. For example, despite the DNS setup having changed, the XG may need a reboot because of how those servers are loaded into the policy engine. I'm not sure, but I've seen similar issues where the devs just forgot to make the subsystem check for a given change, so the system kept using the old settings. Something along those lines might be the issue...
In reply to cyberzeus:
Yes, rebooting the XG does seem to be a common theme after some changes are made. I have found in the past that you actually had to power it off, then remove the power lead for a short time, so that when the box restarts the cache is too old and has to be flushed.
I have said this before and undoubtedly will say it many times again: the QA of this product is very poor; just look at the fixes between MR-1 and MR-2.
WOW...ummmm, I mean...WOW...need to actually disconnect power????
Well, yeah - as much as I find this thing to be an awesome device - at least by intent...that kind of frailty is a bit disturbing. But I also keep in mind that it's free...so as the adage goes, you often get what you pay for.
I did reboot but no joy. Maybe later I will give the power disconnect a shot...
In your main post you do not state whether the XG is doing DNS/DHCP for your client systems or whether you are using a system behind your XG for these services. Also, a couple of things:
Since you are adding 127.0.0.1 to your DNS servers, I am going to assume that DHCP/DNS for your local network is being handed out by your XG to provide local DNS lookups for your devices/systems. Have you tried the following:
This is going on the premise that when you make a request to the internet, the XG does a lookup of your client host to resolve it and see what policy(s) might need to be applied.
Also remember to flush the DNS cache on the client device/system after making changes to all the DNS settings mentioned above.
Also, what are your DoS values set to on the XG?
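If the XG really does resolve the client host on each request, a slow or failing PTR (reverse) lookup for client addresses would show up as added latency. A rough check from a machine using the same resolvers could look like the Python sketch below; `time_reverse_lookup` is a hypothetical helper, and the resolver is injectable only so it can be exercised offline.

```python
import socket
import time
from typing import Callable, Optional, Tuple


def time_reverse_lookup(
    ip: str,
    resolver: Callable[[str], tuple] = socket.gethostbyaddr,
) -> Tuple[Optional[str], float]:
    """Return (hostname or None, elapsed seconds) for a reverse lookup of `ip`."""
    start = time.perf_counter()
    try:
        name = resolver(ip)[0]  # gethostbyaddr returns (name, aliases, addresses)
    except (socket.herror, socket.gaierror):
        name = None  # no PTR record, or the lookup failed
    return name, time.perf_counter() - start
```

For example, `time_reverse_lookup("192.168.1.10")` for one of your client addresses (the address is only an example). Several seconds followed by `None` would suggest reverse lookups are timing out rather than failing fast.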
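The flush command differs per operating system. Below is a small Python sketch of the commonly used ones. It assumes systemd-resolved on Linux (setups using nscd or dnsmasq need different commands), and `flush_dns_command` / `flush_dns` are hypothetical helper names.

```python
import subprocess
import sys
from typing import List


def flush_dns_command(platform: str = sys.platform) -> List[List[str]]:
    """Return the command(s) commonly used to flush the local DNS cache."""
    if platform.startswith("win"):
        return [["ipconfig", "/flushdns"]]
    if platform == "darwin":
        # macOS: clear the directory-services cache and restart mDNSResponder.
        return [["sudo", "dscacheutil", "-flushcache"],
                ["sudo", "killall", "-HUP", "mDNSResponder"]]
    if platform.startswith("linux"):
        # Assumes systemd-resolved is the local caching resolver.
        return [["resolvectl", "flush-caches"]]
    raise ValueError(f"no known DNS-flush command for {platform!r}")


def flush_dns(platform: str = sys.platform) -> None:
    """Run the flush command(s) for this platform, stopping on the first failure."""
    for cmd in flush_dns_command(platform):
        subprocess.run(cmd, check=True)
```

Browsers may also keep their own DNS cache, so restarting the browser after flushing keeps the test clean.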
I have a configuration for a Home licensed XG that is using the OpenDNS servers and the systems behind the XG are set to use the XG as their DNS server.
Hope this helps.
In reply to rrosson:
The XG is literally doing one thing only: a single Web Policy is applied, with only the default Allow All rule enabled. All DHCP and DNS is done external to the XG. All DoS/spoof protection is disabled/not applied and therefore not a factor. As for the 127.0.0.1 DNS setting, it was a test step from the Cyberoam troubleshooting document and is not the typical setting, just one of the attempted test vectors.
All DNS queries, whether made directly from the XG or by clients behind the XG, complete fine using Google/Comcast/AT&T DNS servers, with typical lookup times under 1 s.
In observing this a bit more, I am starting to suspect HTTPS, possibly HTTPS scanning. What is strange, however, is that all HTTPS scanning has been disabled both on the FW rule and in Web Protection...
OK, since you are doing DNS/DHCP external to the XG, may I recommend you do the following:
Test your response time for web requests and see if that speeds things up. Looking at my test XG VM while writing this and comparing it to my production UTM 9.x makes me wish that XG would also act as a DNS proxy. With XG, you have to turn on the other features of the rule for inspection.
Hope this helps
I've had this issue when setting up my Sophos XG Firewall.
My personal fix was to put my ISP's DNS in the DNS settings of the Sophos XG.
You may want to check your lookups by going into the Diagnostic menu on your Sophos XG, then going down to Name Lookup.
In reply to Gavin Ramm:
I use Google's DNS servers and the lookup response times are sub-1s both from the XG as well as all clients behind the XG. The only time using the ISP's DNS servers is required is if the ISP happens to block or otherwise handle DNS requests which may be the case in your specific situation. Neither of my providers do that as evidenced by the very low DNS lookup response times.
Understood; however, this is what allows my website response times to be quicker.
EDIT: I just changed them back to 18.104.22.168 and 22.214.171.124 and they seem fine; it must have been a coincidence.
I'm not sure you're completely reading all of the information in my posts. Aside from the "Internal DNS" suggestions, all of the other suggestions you make have already been performed and noted in my posts.
Also, given the very low DNS query times, I am convinced DNS is not the heart of the issue - especially given that there are no actual filtering rules being employed. Furthermore, "Internal DNS" can only mean 1 of 2 things: (1) Setting up actual internal DNS servers - which I know you don't mean; or (2) DNS proxy - which will only serve to make the issue worse - not better.
As an aside, OpenDNS is not a good tool to use for testing...you never want to use a filtering DNS system when using DNS lookups as a testing metric...always use something as wide open as possible (such as Google) when testing DNS.
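To compare resolvers directly rather than through the OS cache, a single A-record query can be timed against a specific server. This is a minimal Python sketch of an RFC 1035 query over UDP. It ignores truncation, TCP fallback and response parsing beyond the ID check, and the helper names are hypothetical.

```python
import random
import socket
import struct
import time


def build_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query: one question, recursion desired, class IN."""
    query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def time_query(server: str, name: str, timeout: float = 3.0) -> float:
    """Send one A query to `server` on UDP port 53 and return the round trip in ms."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(packet, (server, 53))
        reply, _ = sock.recvfrom(512)
        elapsed = (time.perf_counter() - start) * 1000
    if reply[:2] != packet[:2]:
        raise RuntimeError("reply ID does not match query ID")
    return elapsed
```

For instance, `time_query("8.8.8.8", "www.sophos.com")` against Google Public DNS, repeated with your ISP's resolver address, gives a like-for-like comparison that bypasses any local caching.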
Sorry, I must have missed the Sophos XG DNS test part.
Maybe take a step back. You say that throughput is not affected, only website loading time.
Maybe run a test in Chrome and press F12 to bring up the network activity window.
Disable the rule and capture what successful web browsing times look like.
Enable the rule again and capture what unsuccessful web browsing times look like.
This could highlight where the lag is, what it's waiting for, or whatever is failing.
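If each Network panel capture is saved as a HAR file, the two runs can be compared phase by phase. The Python sketch below assumes the HAR 1.2 `timings` fields (`blocked`, `dns`, `connect`, `ssl`, `send`, `wait`, `receive`, in ms, with -1 meaning "not applicable"). The function names and any file names are hypothetical.

```python
import json
from typing import Dict

PHASES = ("blocked", "dns", "connect", "ssl", "send", "wait", "receive")


def phase_totals(har: dict) -> Dict[str, float]:
    """Sum each timing phase (ms) over all entries, skipping -1 (not applicable)."""
    totals = {phase: 0.0 for phase in PHASES}
    for entry in har["log"]["entries"]:
        timings = entry.get("timings", {})
        for phase in PHASES:
            value = timings.get(phase, -1)
            if value is not None and value > 0:
                totals[phase] += value
    return totals


def compare(har_off: dict, har_on: dict) -> Dict[str, float]:
    """Per-phase difference in ms: rule enabled minus rule disabled."""
    off, on = phase_totals(har_off), phase_totals(har_on)
    return {phase: on[phase] - off[phase] for phase in PHASES}


def load_har(path: str) -> dict:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Summing across requests overstates wall-clock time because requests run in parallel. Per the HAR spec, `ssl` is also already included in `connect`, so the deltas say which phase grew, not by how much the page slowed. A large `connect`/`ssl` delta with the rule enabled would match the TLS-negotiation observation from the Firefox status line.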
Well, I think there's more than one issue resulting in the same symptom. Also, with stuff like DNS, because it is a system far removed and one you can't directly monitor, it's really easy to step on your own tail, so to speak, when testing. This is especially true when more than one issue may be causing the same symptom.
I believe the need to have reliable DNS on the actual XG is, in part at least, so that the XG can reliably communicate with the Sophos cloud for things like the Cyberoam data used for the web rules. I have a fairly extensive post elsewhere on this site discussing how the actual data is compiled and managed for all of the web/app filtering subsystems, and from that discussion I have gathered that a lot of communication occurs behind the scenes between the XG and the mother-ship. Reliable DNS would, of course, play a central role in ensuring that this back-office communication takes place. I'm not 100% sure yet, as the discussion is still evolving, but that's what it seems like to me anyway.
So, just to confirm your experience: say, first thing in the morning after your machine has been idle for a while or even turned off, your web page load times are snappy? And again, I'm not referring to actual throughput (that is fine); I mean only the page load time...
My personal experience is that my web pages are always snappy (apart from when my internet link is being soaked), from first thing in the morning onward.
Good test, but already done: I used Wireshark. When possible, it's always good to use test tools that are not on the DUT. Nothing in the captures jumped out immediately; however, I haven't yet had the time to do the packet-by-packet, timing, and TCP state-machine analysis that it requires. Probably this weekend.
However, your suggestion caused me to come up with 4 or so additional tests/captures.
Will report back with what I find...
Thanks for the "muse" action... :=)
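For the packet-by-packet pass, one timing worth pulling out of the Wireshark captures is the gap between each TLS ClientHello and the matching ServerHello per TCP stream, with and without the Web Policy. A Python sketch that drives `tshark` follows. It assumes a Wireshark version that uses the `tls.` field prefix (older releases used `ssl.`), and the helper names and capture path are hypothetical.

```python
import subprocess
from typing import Dict, List

CLIENT_HELLO, SERVER_HELLO = "1", "2"  # TLS handshake message type codes


def tshark_lines(pcap: str) -> List[str]:
    """Export stream index, relative time and handshake types for Hello frames."""
    cmd = ["tshark", "-r", pcap,
           "-Y", "tls.handshake.type == 1 || tls.handshake.type == 2",
           "-T", "fields",
           "-e", "tcp.stream", "-e", "frame.time_relative",
           "-e", "tls.handshake.type"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return out.splitlines()


def hello_gaps(lines: List[str]) -> Dict[int, float]:
    """Map TCP stream index to ms between its ClientHello and first ServerHello."""
    client_at: Dict[int, float] = {}
    gaps: Dict[int, float] = {}
    for line in lines:
        fields = line.split("\t")
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or incomplete rows
        stream, t = int(fields[0]), float(fields[1])
        types = fields[2].split(",")  # one frame may carry several handshake messages
        if CLIENT_HELLO in types and stream not in client_at:
            client_at[stream] = t
        elif SERVER_HELLO in types and stream in client_at and stream not in gaps:
            gaps[stream] = round((t - client_at[stream]) * 1000, 1)
    return gaps
```

Running `hello_gaps(tshark_lines("policy_on.pcapng"))` and the same for a capture taken without the Web Policy would show whether the extra time sits between ClientHello and ServerHello, i.e. inside the XG's TLS handling, rather than in DNS or TCP setup.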