I am wondering how Sophos generates/manages the web content filters used in their products. I am specifically interested in the process for the XG firewall and endpoint products like Sophos Home.
My guess is that Sophos has a dedicated team that researches and manages all content-related aspects of the Internet and then uses that data to manage all of the web content filters. However, I have no idea if this is true or even close to accurate.
Thanks in advance for your assistance and feedback.
Different products work differently. I know the XG, UTM, and SWA quite well - and the web part of all three of those products is developed by the same team. Sophos Home and other endpoints are done by other teams, but I'm familiar with some of them.
By "web content related filters" do you mean the categorization of websites? Or do you mean the functionality that we provide so that administrators can build policies?
Feel free to ask questions and I'll try to answer.
Hi Michael - thanks for replying...
So what I'm referring to is the backend data that defines the stuff in Protect --> Web --> Categories. As you know, the Categories are what make up the different User Activities which are in turn used to craft Policies. But the heart of this definitely seems to be Categories and given the scope of the Internet coupled with how important it is to get the Category content right, I would think it takes a lot of work and effort to stay on top of what goes into them.
Is there anything that shows exactly how each Category is constructed to filter/apply to a given set of work or action - something akin to regex, Snort policy rules, or other script (FYI - I know Snort is for IPS but just using it as an example here)? What is the process within Sophos that makes all this happen? Is there a process to request modifications or even new Categories?
Hi Michael - EXCELLENT write-up. Thanks so much for this and it definitely does help. I do have a few immediate questions re: the categorization lookups:
Thanks again for your time on this - very much appreciated.
Traffic that originates from the XG itself does not go "through" the firewall and therefore does not follow firewall rules.
This includes:
- Categorization (uses HTTPS)
- DNS, NTP, DHCP, Active Directory, and possibly several other services that are needed for general networking
- Up2Date checking for and downloading product revisions (like an MR), e.g. code (Backup and Firmware, Firmware)
- Up2Date checking for virus definitions, application definitions, etc., e.g. data (Backup and Firmware, Pattern Updates)
Possibly licensing, not sure how that works.
Off the top of my head I cannot think of anything else but there could be.
There are a few other automatic things that are always allowed through the web proxy, mostly around allowing Sophos Endpoint traffic, but also may allow XG behind another XG. This is under the Sophos Services web exception. There might be other more hidden exceptions as well (I work on multiple products, I cannot recall XG specifically).
There is an added complexity if you need to talk to an upstream web proxy in order to talk to the internet (Routing, Upstream Proxy). In that case, all web traffic the system itself needs to send is routed through the on-board web proxy with a special Allow All rule, which forwards it to the upstream proxy.
I don't happen to know what needs to be opened up if you have another firewall, aside from the above. Possibly this is documented in a KB, I assume Sales and Support would know.
I recall seeing something where in an upcoming release (sorry I have no details) they were looking at supporting "air gap" environments which have no connection to the internet.
Hi Michael, great step by step explanation. It cleared a few doubts I had about the process.
I wanted to follow up with a very specific scenario. In a setup with a really limited data quota (satellite connections), is there any way to force the download (caching) of the full set of web categorizations, or at least those for the most popular sites, so the XG won't spend that limited quota constantly and can do all the updates only when connected to an unlimited WAN?
Thanks in advance for your answer.
No.
In v17 the web categorization is (AFAIK) fairly efficient in caching. I haven't seen any stats for the new SXL4 method that XG uses, but with the SXL3.1 method that UTM uses, less than 1% of web requests require cloud lookups for categorization. Because it is caching, larger installations with more users get a better cache hit rate, while a small office with only a dozen people would have a lower cache hit rate.
UTM also has a little-used and generally avoided way of downloading a local copy of the categorization database, which then gets updates a few times a day. AFAIK some of the most common users of this are sites with satellite uplinks that have poor round-trip times but reasonable total bandwidth caps (AFAIK this method actually uses more MB/month).
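To illustrate the caching point, here is a minimal sketch of a category cache sitting in front of a cloud lookup. It is not how SXL3.1 or SXL4 is actually implemented (I have no details on that); the `CategoryCache` class, the `fake_cloud_lookup` function, the LRU eviction policy, and the hostnames are all assumptions made up for the example. It only shows why more requests sharing one cache give a better hit rate.

```python
from collections import OrderedDict


class CategoryCache:
    """Hypothetical LRU cache in front of a cloud categorization lookup."""

    def __init__(self, capacity, cloud_lookup):
        self.capacity = capacity
        self.cloud_lookup = cloud_lookup  # called only on a cache miss
        self.entries = OrderedDict()      # host -> category, oldest first
        self.hits = 0
        self.misses = 0

    def categorize(self, host):
        if host in self.entries:
            # Cache hit: no traffic leaves the box.
            self.entries.move_to_end(host)
            self.hits += 1
            return self.entries[host]
        # Cache miss: one cloud lookup, then remember the answer.
        self.misses += 1
        category = self.cloud_lookup(host)
        self.entries[host] = category
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return category

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def fake_cloud_lookup(host):
    # Stand-in for the real cloud service; the categories are made up.
    return "news" if host.startswith("news") else "uncategorized"


def simulate(requests, capacity=1000):
    """Run a list of hostnames through a fresh cache and return the cache."""
    cache = CategoryCache(capacity, fake_cloud_lookup)
    for host in requests:
        cache.categorize(host)
    return cache


SITES = ["news.example", "mail.example", "docs.example"]

if __name__ == "__main__":
    small_office = simulate(SITES * 4)    # 12 requests in total
    large_site = simulate(SITES * 400)    # 1200 requests in total
    print(small_office.hit_rate(), large_site.hit_rate())
```

In both runs only the first request for each of the three sites goes to the cloud; every later request is answered locally, so the run with more requests ends up with the higher hit rate.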
Another possible option is using a firewall rule with Allow All (or at least with no categorization rules) while you are on satellite. Under the right circumstances if categorization is not needed then it is not performed. However you would still get malware scanning on your web traffic.
I understand. This client in particular needs to cut the XG-originated traffic to the minimum, so I wanted to know if he could download the full database (or most of it) when he is on a cable modem connection, and then disable categorization updates while the devices are travelling under satellite connections until they are back on cable.
The main reason they need the device is actually for web filtering and traffic shaping based on web and app categories, so the allow all option wouldn't be useful.
Unfortunately, from what you said, I assume the UTM method is not available for XG, and therefore there is no specific solution for this case.
Thanks for your help anyway.