Sophos UTM IPSEC Slow performance - Site to Site (SSL / IPSEC / RED UTM)

Hi all,

Firstly, I've seen many other posts describing similar issues, but none with a real resolution to the slow performance over S2S links. Please correct me if there is one!

 

My Setup:

1 x Sophos UTM 120 (9.506-2) (200/12mbps) (Virgin Media Modem Mode) (Remote Site 1) (Intel Corporation 82583V) (MTU 1500)

1 x Sophos UTM on HP Microserver (N54L) (9.506-2) (200/200) (FTTP) (Primary Site) (Intel Corporation 82571EB) (MTU 1500)

1 x Sophos UTM on HP Microserver (N40L) (9.506-2) (200/12) (Virgin Media Modem Mode) (Remote Site 2) (Broadcom Corporation NetXtreme BCM5723) (MTU 1500)

 

IPSEC is set up as follows:

All sites currently use these settings: AES-128 (AES-256 was enabled previously; I changed it after reading comments about AES-256 issues) | Strict Routing OFF | Support Path MTU Discovery ON | PFS OFF

Primary site connected to Remote Site 1 & 2

Remote site 1 connected to Remote Site 2 and Primary Site

Remote Site 2 connected to Remote Site 1 and Primary Site

-----------------------

Before the Primary site link was upgraded, it was a BT Infinity 2 connection getting 76/19mbps. The connection has been in place since 2014, and I never noticed a problem because transfers simply maxed out the upload speed.

The problem now: whilst the connection is solid, I cannot achieve more than 2.4MB/s (19.2mbps) to ANY of the remote sites via Windows SMB or HTTP/S.

If I connect to a remote site (2) and initiate an HTTP/S download from the primary site via its external address (avoiding the IPSEC S2S tunnel), I see around 150mbps download. The bottleneck then becomes the receiving server's CPU, which has Sophos AV installed; if I disable AV, the speed goes up slightly more. So it appears that the primary and remote connections can sustain higher speeds without issue.

 

-----------------------

I had a look at the "Rulz" and went through all the steps:

IPS is disabled on all UTMs (it was already disabled because of other issues)
No QoS rules enabled on any interface on any UTM
MTU on the FTTP connection is 1500 (also tried 1492, 1472, 1460, 1432)
MTU on the Virgin Media connections is 1500 at the UTM
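
As a rough reference for the MTU values tried above, the sketch below computes the largest TCP payload (MSS) that fits in one packet at each MTU. It assumes plain IPv4 and TCP headers of 20 bytes each with no options, and the function name is just for illustration. It deliberately does not model IPSEC overhead, which depends on the cipher and on NAT traversal and reduces the usable size inside the tunnel further.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options (timestamps would take a further 12)


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in a single packet of size `mtu`."""
    return mtu - IPV4_HEADER - TCP_HEADER


# MTU values tried on the FTTP interface
for mtu in (1500, 1492, 1472, 1460, 1432):
    print(f"MTU {mtu} -> MSS {tcp_mss(mtu)}")
```

Any MSS clamping applied to tunnel traffic would need to sit below these figures by at least the IPSEC overhead, which is not calculated here.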


-----------------------

The firewall rules between tunnels are set to allow all traffic with no restriction; only traffic outbound to the internet is locked down.

I've performed a CLI speedtest on the UTMs and the results are:

Primary Site: 199.98mbps down / 188.90mbps up

Remote Site 1: 161.50mbps down / 12.03mbps up

Remote Site 2: 182.69mbps down / 11.89mbps up

Ping times between UTMs via the tunnel are around 8ms

Ping times between UTMs over the internet are no different
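
For what it's worth, the numbers above can be put into a quick bandwidth-delay calculation. This is only a back-of-envelope sketch: it assumes a single TCP stream whose throughput is limited by how much data is in flight per round trip, and it uses the 8ms ping time as the round-trip time. The function names are made up for illustration.

```python
def mbytes_to_mbits(mbytes_per_s: float) -> float:
    """Convert megabytes per second to megabits per second."""
    return mbytes_per_s * 8


def bytes_in_flight(mbits_per_s: float, rtt_ms: float) -> float:
    """Data that must be in flight to sustain a rate over a given round-trip time."""
    bits_per_s = mbits_per_s * 1_000_000
    return bits_per_s * (rtt_ms / 1000) / 8


RTT_MS = 8  # ping time through the tunnel, used here as the round-trip time

observed_mbits = mbytes_to_mbits(2.4)  # the 2.4MB/s seen over the tunnel
for label, rate in (("observed", observed_mbits), ("line rate", 200.0)):
    print(f"{label}: {rate:.1f} mbps needs ~{bytes_in_flight(rate, RTT_MS):,.0f} bytes in flight")
```

Under these assumptions, the observed rate corresponds to only about 19,200 bytes in flight per round trip, whereas filling 200mbps at 8ms would need around 200,000 bytes. That does not identify a cause, but it gives a sense of how small the effective window through the tunnel appears to be.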

-----------------------

I've seen in other posts that some IPSEC performance issues can be related to CPU power. However, each time I've initiated a transfer, CPU usage does go up, but nothing related to the transfer shows in the top 20 CPU processes. Or I don't know what I'm looking for, which could also be true.

These are the top 20 on the remote site 1 UTM (ps auxf output; columns are USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND):

root 3807 50.0 0.0 4912 1004 pts/0 R+ 10:14 0:00 \_ ps auxf
root 3788 20.6 0.0 0 0 ? Z 10:14 0:01 \_ [confd.plx] <defunct>
root 3793 13.0 0.9 70708 18800 ? S 10:14 0:00 \_ confd [worker:prpc:system]
wwwrun 2278 6.6 4.0 94728 82228 ? S 10:08 0:25 | \_ /var/webadmin/webadmin.plx
wwwrun 2312 5.6 3.7 88048 75856 ? S 10:08 0:20 | \_ /var/webadmin/webadmin.plx
root 2805 5.2 1.6 81784 33016 ? S 10:11 0:09 \_ confd [worker:prpc:webadmin]
root 2397 4.7 1.7 83184 36128 ? R 10:09 0:14 \_ confd [worker:prpc:webadmin]
root 5421 3.6 0.0 26372 1360 ? Ssl Jan16 819:45 ./ctipd.bin -l /usr/lib/ctipd
root 3959 1.8 0.1 14148 3516 ? S Jan16 425:14 \_ /usr/local/bin/selfmonng.plx
root 5624 1.2 1.6 77396 33512 ? S Jan16 285:00 /usr/sbin/acc-agent.plx --verbose=2 --daemon
root 16823 1.2 0.0 33668 1068 ? S<sl Jan26 110:10 /usr/sbin/ulogd -c /etc/ulogd.conf -d
root 28890 1.0 0.0 4304 580 ? S 09:35 0:23 \_ /usr/local/bin/reporter/waf-reporter
810 6033 0.9 14.0 1023256 286656 ? Ssl Jan16 222:06 /var/chroot-http/usr/bin/httpproxy -f -c /var/chroot-http -u httpproxy
root 5481 0.5 17.1 591336 351764 ? Ssl Jan16 112:58 /usr/bin/cssd -d
postgres 3601 0.5 0.2 579216 6064 ? Ss 07:00 0:58 \_ postgres: smtp smtp 127.0.0.1(51262) idle
root 5485 0.4 0.5 70544 11844 ? Ss Jan16 96:11 smtpd [master]
root 4973 0.4 0.3 14484 6688 ? Ss Jan16 101:07 dns-resolver.plx
root 5015 0.3 0.6 35540 13484 ? Ss Jan16 78:55 awed [master]
root 5004 0.3 5.4 158904 111092 ? Ssl Jan16 80:40 /usr/sbin/named -4
root 3563 0.3 0.9 70708 19236 ? S 10:12 0:00 \_ confd [worker:prpc:acc-agent]

 

Top 20 from Primary Site during transfer:

root 29560 9.6 0.0 0 0 ? Z 10:29 0:00 \_ [confd.plx] <defunct>
root 29570 5.4 0.3 73520 30064 ? S 10:29 0:00 \_ confd [worker:prpc:system]
root 20103 3.8 0.5 87680 45864 ? S 10:06 0:54 \_ confd [worker:prpc:webadmin]
root 5444 3.7 0.0 25940 4472 ? Ssl Jan27 267:19 ./ctipd.bin -l /usr/lib/ctipd
wwwrun 20075 3.4 1.1 96212 91864 ? S 10:05 0:48 | \_ /var/webadmin/webadmin.plx
root 4892 3.2 0.0 34692 2804 ? S<sl Jan27 227:59 /usr/sbin/ulogd -c /etc/ulogd.conf -d
810 18791 1.0 18.6 1850284 1520164 ? Ssl Jan27 73:10 /var/chroot-http/usr/bin/httpproxy -f -c /var/chroot-http -u httpproxy
afcd 12607 0.9 0.4 57420 33580 ? S<sl 09:42 0:27 /usr/sbin/afcd
root 3933 0.8 0.0 14188 5468 ? S Jan27 58:49 \_ /usr/local/bin/selfmonng.plx
root 5602 0.6 0.9 85368 74476 ? S Jan27 45:20 /usr/sbin/acc-agent.plx --verbose=2 --daemon
root 12325 0.6 0.0 4548 1144 ? S 09:41 0:19 \_ /usr/local/bin/reporter/waf-reporter
root 11662 0.4 0.1 18596 11028 ? Ss Jan31 5:08 /usr/local/bin/red_server.plc
root 11 0.4 0.0 0 0 ? S Jan27 31:35 \_ [ksoftirqd/1]
nobody 31497 0.4 0.4 475080 36304 ? Sl Jan29 15:29 \_ /usr/apache/bin/httpd -k start
root 5479 0.3 6.3 609564 518076 ? Ssl Jan27 22:03 /usr/bin/cssd -d
postgres 4897 0.3 0.9 1113396 77572 ? Ss Jan27 21:43 \_ postgres: reporting reporting [local] idle
root 5715 0.2 0.4 70824 34704 ? Ss Jan27 16:44 smtpd [master]
root 5634 0.2 0.3 31924 26044 ? S Jan27 16:59 /usr/local/bin/epp_client.plx
root 4634 0.2 1.5 160408 127152 ? Ssl Jan27 17:27 /usr/sbin/named -4
root 4551 0.2 0.0 14728 8072 ? Ss Jan27 14:10 dns-resolver.plx
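
One caveat on the process lists above: on Linux, IPSEC encryption is typically done inside the kernel, so its CPU cost may not be attributed to any user-space process in `ps` output and tends to show up as system or softirq time instead. The sketch below is one possible way to check, assuming the standard Linux `/proc/stat` format. It takes two snapshots of the aggregate `cpu` line and reports the share of time per category between them. The function names and the sample figures are made up for illustration and are not taken from these UTMs.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text: str) -> dict:
    """Return the cumulative tick counters from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_shares(before: str, after: str) -> dict:
    """Percentage of elapsed CPU time spent in each category between two snapshots."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    delta = {name: b[name] - a[name] for name in a}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("snapshots are identical or out of order")
    return {name: 100.0 * ticks / total for name, ticks in delta.items()}


# Hypothetical snapshots; on a live system, read /proc/stat twice,
# a few seconds apart, while a transfer is running.
SAMPLE_BEFORE = "cpu  1000 0 500 8000 100 0 400 0 0 0"
SAMPLE_AFTER = "cpu  1100 0 600 8500 100 0 700 0 0 0"

shares = cpu_shares(SAMPLE_BEFORE, SAMPLE_AFTER)
for name in ("user", "system", "softirq", "idle"):
    print(f"{name}: {shares[name]:.1f}%")
```

A high softirq or system share during a transfer, with little showing in the process list, would point towards kernel-side packet processing as the CPU consumer. A low share would suggest the limit lies elsewhere.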

-----------------------

I then decided to switch the tunnel types around and the results of these are:

Changing to SSL VPN: no change in speed, still around 20mbps

Changing to UTM > UTM RED setup: Worse, speed would not go above 10mbps!

I also changed the IPSEC security settings to NO encryption, still no real change.

-----------------------

A few people on the forums mentioned that switching to pfSense resolved the issue for them, so I tried it in a VM at the Primary site (pfSense > UTM > Internet). The UTM still controls internet access, with the appropriate ports forwarded to the pfSense box. I set up the IPSEC connection policies and initiated the connection.

I saw a huge improvement:

Primary Site to Remote Site 2 (N40L) now sees 150-170mbps transfer speeds.

Primary Site to Remote Site 1 (UTM 120) now sees around 40-60mbps transfer speeds. I think the UTM 120's CPU is the bottleneck at that point, but its licence expires in a few months, so it will be replaced.

 

So it would seem to me that the WAN configurations are correct and the Primary site UTM is the problem. I don't believe it's the hardware, as it's the fastest of the three.

 

I'll also add that if I establish an L2TP/IPSEC VPN connection from a remote PC to the Primary site UTM, I see maximum speeds via this method.

 

Is there anything else I can do to improve the speed, or anything I've not looked at yet? If further information is required, please ask.

 

Any assistance appreciated!

 

Thanks!

 

 



  • I don't see any way to avoid having Sophos Support look at your setup.

    What do you get from the following at the command line in the Primary Site?

    ping -I {IP of Primary Internal interface} {IP of Site 1 Internal Interface} -s 1500 -M do

    If you got "ping: local error: Message too long, mtu=1406" as a response, what happens if you set the MTU to that on the device you're copying from?

    Cheers Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Hi Bob, 

    Thanks for your reply.

    I performed the test you suggested and it came back with 1410 as the max, so I changed the MTU on both systems where the data was being copied from/to. 

    This didn't change anything I'm afraid.

     

    I was up until the early hours of the morning messing with different configurations and even swapped the hardware for a Dell R210, but there was no change. This led me to change other aspects of the network. When the FTTP was first installed, the ISP provided their own ONT hybrid router, which was removed in favour of the UTM.

    Their own router is rather poor in terms of capability, and due to the way it's set up, it had to be in a double NAT setup. I plugged this back in, connected the UTM to it, configured it and then let the UTM establish the IPSEC connections...it was back up to full speed! So I'm truly confused now...

    Problem is, whilst this fixes one issue it breaks more things.

    I reverted back to the original (altered) setup, pfSense IPSEC connections came back up and were at top speeds. IPSEC connections handled by the UTM still remain at around 20mbps.

    I've seen another post you commented on: https://community.sophos.com/products/unified-threat-management/f/general-discussion/89663/issue-of-mss-on-ipsec-vpn

    It mentions the use of a command to change the MSS value, I tried this command but again saw no difference.

     

    So I have two working scenarios:

    1: pfSense for IPSEC > Sophos UTM > WAN

    2: Sophos UTM > Hybrid FTTP Router > WAN (but is double NAT and a number of other services stop working...)

     

    I may indeed need to speak with Sophos...

     

    Thanks!

     

     

     

  • I have the same issue.  Out of curiosity, can you test something for me?  Do a speed test from a PC that uses the web proxy.  Next, add that PC to the transparent skip list for the web proxy and try the speed test again.  I will bet you a beer the speed will be limited to the same speed your VPN traffic is limited to (if the speed test even works).

    I have the same problem and it seems to have existed since 9.4.  It is definitely still there in 9.5.  Based on my testing, the problem doesn't exist until you create the first site to site VPN in version 9.4.  In 9.5, this rate limit on nonproxied traffic starts after the first virus definition download.

  • Where do I collect this beer?

     

    Made a few changes to test this...

    I've got a Windows 10 machine that goes via the Web proxy, I can confirm this as certain sites / web ads are blocked. Speed test returns 200/195 via speedtest.net with it on.

    If I place it into the skip list I see 204/198, which is only slightly higher...but the 20mbps speeds I've seen over IPSEC between the gateways remain. pfSense is working fine, however!

     

    If you want me to try any specific settings then let me know.