UTM Tweaking Guide 2.0

==WORK IN PROGRESS - THIS GUIDE ISN'T COMPLETED YET==

More than 3 years after my first "ASG tweaking" guide (found under Tweaking Guide 1.0), it's time to completely rework the guide to reflect the newest "best practice" experience. The UTM is a living product: in the meantime it changed its color from black / orange to white / blue and its name from Astaro ASG to Sophos UTM, and new features, internal rework and optimizations in the product sometimes lead to different behaviour when it comes to tweaking. Some past tweaks may have changed, and I hope to extend and rework the guide further based on my field experience.

I should mention that my recommendations do not necessarily correspond with Sophos' own recommendations and have also sparked discussions in the past, as my philosophy doesn't necessarily match everyone else's. So give these tweaks a try if you think something is wrong with your UTM's performance, or if you want to test whether there are some hidden powers in your UTM to unleash. These tweaks don't necessarily fit every environment and in some cases may have adverse effects instead of the expected ones, so first golden rule: a backup before any changes will help you easily restore broken things to their original state ;o)

1.0 Basic networking settings:

1.1 TCP Window Scaling
Basically I recommend making sure that the "TCP Window Scaling" feature under "Network Security / Packetfilter / Advanced" or "Network Protection / Firewall / Advanced" is checked.
==> TCP window scale option - Wikipedia, the free encyclopedia

1.2 Network interface duplex settings
Check that autosensing between the UTM and switches and/or routers negotiates speed and duplex settings correctly. For some strange reason, some switches, routers or cable modems tend to hand out a 100FDX link instead of a 1G FDX link, which leads to effective Internet bandwidth maxing out around ~80MBit/s. If that happens, setting the interface speed manually to 1G FDX will avoid that bottleneck.

1.3 DNS Proxy / DNS Order
There are three things to follow, which have proven to give the best results, as DNS is essential to UTM proxy performance:
a) DNS order should always be: client uses the internal DC/DNS server (if available), the internal DC/DNS server uses the UTM as DNS forwarder, the UTM uses reliable public DNS servers.
b) Do not put the internet DNS servers directly into the DNS forwarder field of the UTM (the UTM then does round robin, which may lead to strange behaviour if one of the servers fails or becomes lame for whatever reason). Instead create an availability group "PUBLIC_DNS_SERVERS" containing your ISP's DNS and, as fallback, some open DNS servers such as Google (8.8.8.8 and 8.8.4.4) and/or OpenDNS (208.67.220.220 and 208.67.222.222). Availability group checks should be done against UDP 53. I personally check every 15s and fail after 3x, which has proven to be a good setting in most cases.
c) NEVER EVER mix internal and external DNS servers in the UTM DNS forwarder tab. This may lead to strange (mis)behaviour and a crappy web browsing experience, probably because non-resolving DNS servers try to resolve via root DNS lookups after a few secs, which makes everything reeeeeeeeeeeally slow or leaves websites randomly not loading (already seen in some customer setups, where it makes even the fastest SG appliances laggy as hell).
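
The check described under b) (probe UDP 53 every 15s, count the server as down after 3 failed probes in a row) can be sketched roughly like this. This is only an illustration in Python and not how the UTM implements availability groups internally; the names build_dns_query and AvailabilityTracker and the probed hostname are assumptions made up for this example.

```python
import struct

CHECK_INTERVAL_S = 15   # probe interval suggested in the text
MAX_FAILURES = 3        # "fail after 3x"


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS A-record query that could be sent to UDP port 53."""
    # Header: ID, flags (recursion desired), 1 question, 0 answer/authority/additional
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: every label is prefixed with its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A record), QCLASS = 1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


class AvailabilityTracker:
    """Counts a server as down after max_failures consecutive failed probes."""

    def __init__(self, max_failures: int = MAX_FAILURES) -> None:
        self.max_failures = max_failures
        self.failures = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result; return True while the server counts as up."""
        self.failures = 0 if probe_ok else self.failures + 1
        return self.failures < self.max_failures
```

With these values a dead server drops out of the group after roughly 3 x 15s = 45s, and a single successful probe brings it back in this sketch.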

1.4 MTU on interfaces
Usually Ethernet networks use an MTU of 1500 (default setting in the UTM for ethernet interfaces). If you're using PPPoE/PPPoA (xDSL) for your Internet uplink, make sure that the MTU value is set lower (usually 1492 for PPPoE, but depending on the provider I have also heard of other values such as 1460, 1454 etc.). This should be set on the PPPoE interface of the UTM if an xDSL bridge is used, but also on the corresponding standard ethernet interface if an xDSL router sits in front of the UTM and does the PPPoE dial-in.
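
Where the 1492 comes from: PPPoE adds 8 bytes (6 bytes PPPoE header plus 2 bytes PPP protocol field) to every frame, so 1500 - 8 = 1492. A small Python sketch of that arithmetic follows; the function names are made up for illustration, and the TCP MSS shown assumes IPv4 and TCP headers without options.

```python
ETHERNET_MTU = 1500
PPPOE_OVERHEAD = 8   # 6 bytes PPPoE header + 2 bytes PPP protocol field
IPV4_HEADER = 20     # assumption: no IP options
TCP_HEADER = 20      # assumption: no TCP options


def pppoe_mtu(ethernet_mtu: int = ETHERNET_MTU) -> int:
    """MTU left for IP packets once the PPPoE overhead is subtracted."""
    return ethernet_mtu - PPPOE_OVERHEAD


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits into one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


# Plain ethernet, standard PPPoE and the provider-specific values from the text
for mtu in (ETHERNET_MTU, pppoe_mtu(), 1460, 1454):
    print(f"MTU {mtu}: TCP MSS {tcp_mss(mtu)}")
```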

1.5 Quality of Service (QoS):

QoS general considerations
QoS can be a nice and helpful thing, but you can also easily mess up your UTM's performance, depending on your traffic usage. In many cases you can live very well with the automatic tuning options and only rare use of manual bandwidth pools / traffic selectors (for S2S VPNs or RED traffic, for example). Also do not try to slow down all "non productive traffic" if not explicitly required, but rather prioritize / guarantee bandwidth for productive traffic according to your needs. A few simple QoS rules often work better and more predictably than a complex, huge QoS ruleset.

In the past the following (mis)configurations created a lot of headache for nothing, so if you want to use QoS, follow the two golden rules below:
a) First verify the QoS interface speeds. The UTM defaults QoS interfaces to 102400 kbit/sec, which corresponds to 100MBit interfaces, and the "Limit uplink" checkbox is active by default. Technically this limits your interface to 100MBit, even if it's a 1G or 10G interface, which leads to an effective throughput of around 80MBit/s. This is also a common fault if you have a fast Internet connection >100MBit, but your perftests top out around those 80MBit or lower. So for 1G interfaces add another trailing zero (1024000), for 10G two trailing zeros (10240000), provided the link actually achieves line speed, as with internal networks connected to switches.
b) For your Internet uplinks or connections to routers of managed VPN solutions, MPLS etc. you have to set the uplink and downstream speeds to the real speeds instead of the theoretical interface link speeds, otherwise QoS isn't able to calculate bandwidths correctly.
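
The numbers from a) follow a simple pattern (MBit x 1024), so the values for the QoS interface fields can be worked out like this. A minimal Python sketch; qos_kbit and is_bottleneck are hypothetical helper names for this example, nothing that exists on the UTM.

```python
QOS_DEFAULT_KBIT = 102400  # UTM default according to the text (100MBit x 1024)


def qos_kbit(link_mbit: int) -> int:
    """Convert a real link speed in MBit/s to the kbit/s value for the QoS fields."""
    return link_mbit * 1024


def is_bottleneck(configured_kbit: int, real_mbit: int) -> bool:
    """True if the configured QoS speed is lower than what the link can really do."""
    return configured_kbit < qos_kbit(real_mbit)


print(qos_kbit(1000))                         # 1G interface
print(qos_kbit(10000))                        # 10G interface
print(is_bottleneck(QOS_DEFAULT_KBIT, 1000))  # default value left on a 1G link
```

For an Internet uplink as in b), you would feed in the real line speeds instead, for example qos_kbit(100) for the downstream and qos_kbit(10) for the upstream of a 100/10 connection.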

Unlike in my old tweaking guide, I do not list preferred QoS settings for interfaces here, as the behaviour seemed to be quite dynamic in the past. Depending on internal reworks / optimizations in the UTM, especially in the web proxy, I changed those automatic settings quite often until the behaviour satisfied me. So simply test which options give the best behaviour in your environment.

A short read on the 2 automatic options "Download Equalizer" and "Upload Optimizer"

Description copied from the online help:
"Download Equalizer: If enabled, Stochastic Fairness Queuing (SFQ) and Random Early Detection (RED) queuing algorithms will avoid network congestion. In case the configured downlink speed is reached, packets from the most downlink consuming stream will be dropped.

Upload Optimizer: If enabled, this option will automatically prioritize outgoing TCP connection establishments (TCP packets with SYN flag set), acknowledgment packets of TCP connections (TCP packets with ACK flag set and a packet length between 40 and 60 bytes) and DNS lookups (UDP packets on port 53)."

However - I've noted over the past years that, due to the Random Early Detection settings, the Download Equalizer starts dropping packets of the most consuming streams at around 90% on downstream and around 80% on upstream. While this feature effectively helps to distribute the available bandwidth fairly across multiple users and keeps the surfing experience snappier for all concurrent users even under load, it also prevents a single connection from saturating the full available bandwidth. So don't be surprised if your next online speedtest shows figures below your available bandwidth: a 100/10 connection should measure ~90/8 - that's intended behaviour ;o)
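
To get a feeling for what a speedtest should show with the Download Equalizer active, the rough factors from my observation above can be applied to any line speed. A tiny Python sketch; the 0.90 / 0.80 factors are just those observed values and not documented thresholds, and expected_speedtest is a name made up for this example.

```python
DOWNSTREAM_FACTOR = 0.90  # observed: drops start at around 90% of the downlink
UPSTREAM_FACTOR = 0.80    # observed: drops start at around 80% of the uplink


def expected_speedtest(down_mbit: float, up_mbit: float) -> tuple[float, float]:
    """Rough single-connection speedtest result with the Download Equalizer active."""
    return (
        round(down_mbit * DOWNSTREAM_FACTOR, 1),
        round(up_mbit * UPSTREAM_FACTOR, 1),
    )


print(expected_speedtest(100, 10))  # the 100/10 example from the text
```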

2.0 Web Proxy

2.1 Web Proxy Caching
In general I like the idea of caching web content to (in theory) save bandwidth. However, excessive caching should be avoided, as sooner or later it may saturate the storage subsystem I/O (especially on normal single SATA disks), which leads to I/O waits, which leads to active processes waiting to read from or write to disk, which leads to lagginess, which leads to unhappy users (Whoa - that was a long one ;o)). While SSDs (or RAID systems with a R/W cache) can handle much more storage I/O load and in many daily-use scenarios avoid the issues seen with single disks, the optimizations below may be helpful anyway, as any avoided I/O gives you spare resources for the future (and, as a side effect, should also help prolong any SSD's life span).

Because of the nature of today's websites I came to the conclusion that web caching isn't effective in many cases, as most websites are really dynamic and often change a lot of content within a short time (just think about your preferred news site with all its live tickers, changing top articles, web ads etc.). This means that the web proxy may write gigabytes of daily web traffic to the disk cache, while in reality you often get only a few cache hits. So that's a lot of wasted storage I/O, maybe also slowing down your surfing experience for nothing. So my general conclusions are:
a) Activating web caching without tweaking will also cache a lot of unused content (never a cache hit) and usually isn't recommended unless you have a really slow and saturated internet connection (let's say as an unscientific rule of thumb "a downstream of 50...100MBit web caching may start again to bring lagginess into the web surfing experience, most likely due to high storage I/O on single mechanical SATA disks. I have not had the possibility up to now to test how far r/w cached RAID systems or SSD equipped appliances could go, so I'll recommend the same figures here until I get more feedback from such systems.
c) Depending on appliance type / base hardware I recommend generally disabling web caching in environments with higher internet bandwidths >100MBit, as it will often lead to lagginess sooner or later. This is also one of the usual first recommendations from Sophos support if lagginess occurs in such environments.

For the caching exceptions mentioned under b), generally excluding the following file types from web caching should optimize caching behaviour by lowering disk I/O and lead to a better cache hit ratio due to fewer cache misses. Simply create a web proxy exception to skip caching for "Matching these URLs" and import the following regex into it:

\.(gif|jpg|jpeg|png|html|htm|pdf|webp|crl)?$
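
If you want to check in advance which URLs such an exception would catch, you can try the regex offline. The sketch below uses Python's re module, which is an assumption on my side: the UTM's own regex engine may behave slightly differently, and the sample URLs and the skips_cache name are made up.

```python
import re

# The exception regex from above, unchanged
CACHE_SKIP_PATTERN = re.compile(r"\.(gif|jpg|jpeg|png|html|htm|pdf|webp|crl)?$")


def skips_cache(url: str) -> bool:
    """True if the URL would match the 'skip caching' exception."""
    return CACHE_SKIP_PATTERN.search(url) is not None


samples = [
    "http://example.com/logo.png",
    "http://example.com/setup.exe",
    "http://example.com/index.html?x=1",
    "http://example.com/dir.",
]
for url in samples:
    print(url, "->", "skip caching" if skips_cache(url) else "cache")
```

Two things to keep in mind, at least with Python's engine: because the group is followed by "?", a URL ending in a bare dot matches as well, and a URL with a query string after the extension does not match, since the "$" anchors the pattern to the end of the URL.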


2.2 Web proxy AV
Sophos UTM offers you 2 AV engines for scanning your web content for malicious stuff. From an AV point of view I have 2 recommendations here:
a) If performance / latency has priority in your environment over the small security gain from using both AV engines, use single scan only.
b) Unless you have a 2 AV policy (different AV on the gateway than on the clients) and also run Sophos Endpoint on your clients, you should set the primary AV scanner under "Management / System Settings / Scan Settings" to "Sophos". Simple reason: since UTM 9.2 the Sophos AV engine in the web proxy supports live lookups, which means that the UTM no longer scans everything locally, but sends hash values of the scanned content to Sophos and immediately gets back a result such as infected, clean, unknown etc. So only unknown files or outdated scans still get scanned locally on the box, which saves resources (especially on underpowered or highly loaded appliances) and is often also faster than a local scan.

2.3 URL categorization
The UTM supports different ways of doing URL categorization.
a) Online CFFDB lookups (default, if UTM Endpoint feature is not in use). This setting makes sense, if an undersized UTM (equipped with 1000 users. Apart from a few issues in the past with its stability, this offers the best performance boost due to fast categorization, but requires a lot of memory. Only recommended for appliances with >1GB of FREE memory and nearly no swapping. Due to past issues I would also recommend that one at best only on really high-volume traffic sites with >1000 users.
d) Online SXL lookups (default, if UTM Endpoint feature is in use). This more advanced method offers the benefits of both worlds: more efficient online lookups and, unlike CFFDB, caching of the lookups. With a growing number of users this method may in optimal scenarios reach nearly local CFFDB performance, but without the memory footprint of a full local db. This is also my recommended scenario for all appliances with >=4GB of memory (or also >=2GB appliances, if memory usage is below 80% with nearly no swapping). The UTM can be forced to use SXL categorization by simply enabling the UTM Endpoint feature (you don't have to install an Endpoint, activating the feature is sufficient), or via the console command "cc set http use_sxl_urid 1" (Note: Since UTM 9.3, SXL is set as the default categorization service.)

2.4 Block Advertisers and Web Trackers
It's kind of a hobby of mine to block those annoyances with all the possibilities that a UTM and additional anti trackers and ad blockers offer me. While I generally understand the desire of marketing people to track user behaviour or to place ads to support a free service, the last years became a disaster in terms of user tracking and ad-flooded websites, where you have to search for the intended content in between tons of blinking, flashing, popping up and otherwise annoying ads. I'm not generally against modest user tracking to refine a free web service, or a single, unobtrusive ad on a website to support a free service. But as this is the case nearly nowhere, I fight against data mining and ad flooding in any way I can. A nice side effect of blocking trackers and advertisers is that you save bandwidth, and such overloaded websites usually load way faster without those annoyances. Especially lame tracking services may massively slow down your surfing experience.

One warning beforehand: Blocking such annoyances may break some websites / web services, so the recommendations below should be a good starting point on your way to a less annoying web surfing experience, but may require some fine-tuning in your environment.

This part is still being reworked, so until then use my older "Speeding up webbrowsing by blocking advertisers and trackers" thread found here:
community.sophos.com/.../46207

3.0 Intrusion Prevention / Intrusion Detection

3.1 IPS Pattern general
As the UTM offers a simple "one page" selection for protected targets such as OSes and services such as databases, webservers etc., do not simply activate IDS/IPS generally with all targets checked. More active patterns == more resource usage == slower throughput & more false positives. Choose only targets that are actually active in your networks (and that also receive traffic via the UTM). For example, it doesn't make a lot of sense to check for database attacks if no database traffic is going through the UTM...

3.2 Protected Networks
Generally avoid placing "ANY" or external networks in the protected networks field. Only enter INTERNAL networks such as LAN, DMZ, WIFI etc. into that field. This will scan incoming traffic to, and outgoing traffic from, those networks.

3.3 IDS/IPS Performance Tuning options
If you know exactly where in your network HTTP servers, DNS servers, SMTP servers and SQL (database) servers are running, enter the corresponding servers into the fields of the "Performance Tuning" part under "Network Protection / Intrusion Prevention / Advanced". This will restrict specific tests exclusively to those targets, effectively narrowing the ruleset for other traffic and minimizing false positives. However: you have to think about this twice, as for example webservers or SMTP gateways are sometimes also running in places you wouldn't think of at first, such as network printers, switches etc. If you're not completely sure where your services are running, keep these fields empty, so all checks will be done on all traffic and you don't miss possibly vulnerable systems...

3.4 IPS pattern timestamps
With UTM 9.2, timestamps for attack patterns were introduced. This also helps to maintain good throughput and responsiveness and lowers false positives, as you can choose how old the patterns used for specific targets may be. This can be set under "Network Protection / Intrusion Prevention / Attack Patterns". Pre 9.2 UTMs always used all patterns back to the oldest, unpatched Windows NT times - who's still using such systems? So you can fit the pattern age to your updating behaviour and the systems in your environment. If you know that all your (up2date) Windows systems are always patched at least every 6 months, then also set this to 6 months. If you run old Linux systems or NAS etc., where maintenance isn't always up2date, maybe set this higher to 12...24 months etc. This should be done based on how you maintain your internal systems. But generally shorter timestamps == fewer patterns == less resource usage and fewer false positives. The one and only attack pattern group I usually leave unlimited ("no timeframe") is "Protocol Anomaly", as this is a generic one and often also unpatchable: there are a lot of flaws in the design of the different internet protocols covered, which technically are completely RFC conform, but maybe not used as intended.

3.5 The "one click booster"
In the "Network Protection / Intrusion Prevention / Advanced" tab you will also find the "Activate file related patterns" checkbox. Unchecking this checkbox may also improve the responsiveness of your UTM, as those file related checks are resource intensive. Many of the covered file related attacks should also be mitigated by the UTM proxy services, but some covered attacks will not be mitigated by any of the proxies. So how strict you want to go there is a decision for your personal security policy. I personally disable this feature (it's disabled by default anyway in new UTM 9.2 installations, but enabled in updated, older UTM installations).

3.6 Flooding Protection
That's a classic one. Activating flooding protection with default values may already trigger in nearly every setup today and start dropping packets. Simply set the packet values high enough that you no longer see "flood" messages in the ips.log, which might require some fine-tuning over some weeks (or the ability to calculate a reasonable expected packet rate for your network). Or disable the feature completely if not required.
Also refer to Sophos UTM: How to configure the Intrusion Prevention System (IPS)
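
For a first idea of the order of magnitude of a "reasonable expected packet rate", a back-of-the-envelope calculation from bandwidth and average packet size may help. A rough Python sketch; the average packet size of 500 bytes and the headroom factor of 2 are pure assumptions you have to adapt, the function names are made up, and the result is only an upper bound for the whole link, not a recommended value for any specific flood protection field.

```python
import math


def packets_per_second(bandwidth_mbit: float, avg_packet_bytes: int) -> float:
    """Packets per second needed to fill the given bandwidth."""
    return bandwidth_mbit * 1_000_000 / (avg_packet_bytes * 8)


def flood_threshold(bandwidth_mbit: float,
                    avg_packet_bytes: int = 500,   # assumed average packet size
                    headroom: float = 2.0) -> int:  # assumed safety margin
    """Packet rate with some headroom, as a starting point for fine-tuning."""
    return math.ceil(packets_per_second(bandwidth_mbit, avg_packet_bytes) * headroom)


print(round(packets_per_second(100, 1500)))  # 100MBit filled with full-size packets
print(flood_threshold(100))                  # 100MBit, 500 byte average, 2x headroom
```

Smaller average packets mean a much higher packet rate on the same line, so watching the ips.log after any change is still necessary.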


More to follow - work in progress...  

Edit:

29-07-2019 Somehow the layout was f***ed up in my browsers, restored its content as text only without formatting



This thread was automatically locked due to age.
[unlocked by: BAlfson at 8:34 PM (GMT -7) on 7 Oct 2020]
  • Hi, APT isn't the only concern; Windows DNS (and Bind) have been vulnerable to several DNS poisoning attacks; the best mitigation is to have an up-to-date DNS server in front (e.g. the UTM).

    Barry
  • 1.1 TCP Window Scaling
    Basically I recommend to make sure, that the "TCP Window Scaling" feature under "Network Security / Packetfilter / Advanced" is checked. 


    On firmware 9.208-8 and above (presumably) this is located here:

    "Network Protection> Firewall> Advanced" 

    and was turned on by default for me. 

    Thanks
  • Nice guide. My only question is about the DNS and availability groups. It was mentioned that if you put IPs into the forwarders directly, the UTM does a "round robin" over those IPs, as opposed to putting in a DNS availability group, which tries all servers within the first availability group and will then jump to the next availability group if there is no response? Is this correct?

    Are there any other areas that would benefit from using availability groups? And do you check those groups with the protocol that they use rather than a simple ping?

  • Louis-M said:

    Are there any other areas that would benefit from using availability groups?

    AD Domain Controllers.
