Slow large file copies via file share, but fast over HTTP (both over an IPsec VPN)

I am getting poor performance using Windows Server 2016 to copy backup files across an IPsec VPN. The files I'm copying are large (12 GB) database backups.

When I copy the files through a Windows shared folder, the transfer rate bounces sporadically between zero and 7+ MB/s, averaging around 2 MB/s. When I copy the same files over HTTP, I get a fairly steady 7+ MB/s.

Does anyone have any suggestions for getting Windows file-copy performance up to something reasonable?
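
For a sense of scale, here is a rough Python sketch of the copy time for one 12 GB backup at each average rate. It assumes the rate stays constant and treats 1 GB as 1024 MB, both simplifications:

```python
def copy_hours(size_gb: float, rate_mb_s: float) -> float:
    """Hours to move size_gb gigabytes at a steady rate of rate_mb_s megabytes/second."""
    seconds = size_gb * 1024 / rate_mb_s  # 1 GB treated as 1024 MB
    return seconds / 3600


def mb_s_to_mbit_s(rate_mb_s: float) -> float:
    """Megabytes/second to megabits/second (8 bits per byte)."""
    return rate_mb_s * 8


if __name__ == "__main__":
    for rate in (2.0, 7.0):  # observed SMB average vs. HTTP rate
        print(f"{rate} MB/s (~{mb_s_to_mbit_s(rate):.0f} Mbit/s): "
              f"{copy_hours(12, rate):.2f} h per 12 GB backup")
```

At the ~2 MB/s SMB average that works out to about 1.7 hours per file, versus roughly half an hour at the 7 MB/s HTTP rate (about 56 Mbit/s).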

Environment:

"Source"

Physical Windows Server 2008 R2 (6-core Xeon, 12 GB RAM, Intel gigabit NIC) behind Sophos SG-230 hardware.

"Destination"

VMware vCloud VM (hosted by a commercial data center) running Windows Server 2016 (2x 6-core CPUs, 16 GB RAM, SSD over fiber), behind a VM running Sophos UTM 9 installed from the 64-bit ISO (8-core CPU, 12 GB RAM, VMXNET3 NIC on the LAN port, Intel E1000 NIC on the WAN port)

--- Swapped out the Intel NIC on the LAN port as a test; no difference...

Sometimes it sits at a zero transfer rate long enough that the copy fails with this error:

"An unexpected error is keeping you from copying the file.  If you continue to receive this error, you can use the error code to search for help with this problem.  Error 0x8007003B: An unexpected network error occurred."

I've gone through BAlfson's Rulz, which I've seen referenced in other threads about similar problems, and found no smoking gun, though I'm not the best at reading the firewall logs. I verified that I'm not doing anything that violates the non-log rules.

Any suggestions appreciated.

Shad



  • Run a packet sniffer and see whether you are getting a lot of fragments. The VPN adds some overhead to each packet, which can push the packet size past the MTU.
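
    In a capture, a fragment is an IPv4 packet with the More Fragments (MF) flag set or a non-zero fragment offset. A minimal Python sketch of that check on a raw IPv4 header, assuming you already have the header bytes from whatever sniffer you use (the capture step itself is not shown, and the sample header below is hand-built for illustration):

```python
import struct


def ipv4_fragment_info(header: bytes) -> dict:
    """Decode the fragmentation fields of a raw IPv4 header (first 20+ bytes)."""
    if len(header) < 20 or header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    (flags_frag,) = struct.unpack("!H", header[6:8])  # bytes 6-7: flags + offset
    dont_fragment = bool(flags_frag & 0x4000)   # DF bit
    more_fragments = bool(flags_frag & 0x2000)  # MF bit
    offset_bytes = (flags_frag & 0x1FFF) * 8    # offset is stored in 8-byte units
    return {
        "df": dont_fragment,
        "mf": more_fragments,
        "offset": offset_bytes,
        # MF set or a non-zero offset means this packet is part of a fragmented datagram.
        "is_fragment": more_fragments or offset_bytes > 0,
    }


if __name__ == "__main__":
    # Hand-built header: first fragment of a 1500-byte ESP (protocol 50) datagram.
    # The checksum field is left at 0 because it is not needed for this check.
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 1500, 1, 0x2000, 64, 50, 0,
                      bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    print(ipv4_fragment_info(hdr))
```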

  • Darrellr,

    Both servers, the Sophos SG-230, and the Sophos UTM 9 VM are all configured with an MTU of 1500.

    I did some testing with PING at various packet sizes between the two machines through the VPN, and found that 1394 was the largest size that could get through without fragmentation. So I'm assuming you are correct, and that a Windows copy easily fills packets beyond that size, causing fragmentation and resends.
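
    One detail worth checking: on Windows, `ping -f -l <size>` sets the Don't Fragment flag, and `<size>` is the ICMP data payload, not the whole packet. The IPv4 header (20 bytes, no options) and the ICMP echo header (8 bytes) come on top, so if the test was done that way, a largest unfragmented payload of 1394 corresponds to an IP-level MTU of 1422. A small Python sketch of that arithmetic and of the binary search a manual ping sweep performs; the `fits` callback is a hypothetical stand-in for actually running the ping:

```python
IPV4_HEADER = 20      # bytes, IPv4 header without options
ICMP_ECHO_HEADER = 8  # bytes, ICMP echo request header


def mtu_from_ping_payload(payload: int) -> int:
    """IP-level packet size of an ICMP echo carrying `payload` data bytes."""
    return payload + IPV4_HEADER + ICMP_ECHO_HEADER


def largest_payload(fits, low: int = 0, high: int = 1472) -> int:
    """Binary-search the largest payload for which fits(payload) is True.

    `fits` stands in for "a ping with Don't Fragment set got a reply"; it must
    be True at `low` and monotone (True up to some size, False above it).
    1472 is the largest payload that fits in a 1500-byte MTU.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if fits(mid):
            low = mid
        else:
            high = mid - 1
    return low


if __name__ == "__main__":
    # Hypothetical path that passes IP packets of up to 1422 bytes.
    path_mtu = 1422
    best = largest_payload(lambda p: mtu_from_ping_payload(p) <= path_mtu)
    print(best, mtu_from_ping_payload(best))  # 1394 1422
```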

    As an experiment, I set both endpoint machines to that MTU, rebooted them, verified that the adapters showed an MTU of 1394, and started a moderately sized copy (500 MB). It held fairly steady at a 355 KB/s transfer rate: much worse than before, though it no longer dropped to zero the way it did with the default MTU of 1500 on the machines. At that rate our large backups would take 11+ hours to copy, instead of an hour or two with the occasional stall. Looking at the adapter properties on the servers, I found "Jumbo Packets" set to "Standard 1500", which seemed odd to me, but the only other option in the UI was "Jumbo 9000", so I'm guessing that setting doesn't correspond directly to the MTU I can view and set with NETSH on the command line.

    Is it possible instead to increase the MTU on the IPsec tunnel, so that 1500-byte packets would pass even with the VPN overhead? Or would that just swell the pipe and cause fragmentation on the routers/switches between the VPN endpoints (I'm not sure how or where packets get broken up)?

    Thanks,

    Shad

  • Your MTU is a hard limit set by the connection to the ISP. It would be unusual for the ISP connection to have an MTU larger than 1500 on the physical side; if you increase your MTU, the ISP will fragment the packets anyway. The ONLY place you would change your MTU is the VPN endpoint, though, as your switches should slice the packets appropriately. Also, make sure you are not doing any inspection on the traffic traversing the VPN, although that should not slow it down that much.
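
    To make the overhead point concrete: a full-size 1500-byte packet from a server, once wrapped in ESP, no longer fits in a 1500-byte WAN MTU, so it ends up split into two IPv4 fragments. A rough Python sketch of the fragment count; the 60-byte ESP overhead is an assumed round number, since the real figure varies with the cipher, integrity algorithm, and NAT traversal:

```python
import math

IPV4_HEADER = 20  # bytes, IPv4 header without options


def fragment_count(datagram_len: int, link_mtu: int) -> int:
    """Number of IPv4 fragments needed to carry a datagram of `datagram_len`
    bytes (header included) over a link with the given MTU."""
    if datagram_len <= link_mtu:
        return 1
    data = datagram_len - IPV4_HEADER
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (link_mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(data / per_fragment)


if __name__ == "__main__":
    inner = 1500        # full-size packet from the server
    esp_overhead = 60   # ASSUMED figure; real ESP overhead depends on the tunnel settings
    print(fragment_count(inner + esp_overhead, 1500))  # 2
```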

    There was an MTU-related change made in more recent versions of the UTM code. What version are you running? Can you verify what MTU your WAN interface is set to?

  • All of the NICs on both the UTM 9 VM and the Sophos SG-230 show MTU 1500. I've tried the "Support path MTU discovery" checkbox both ways on the IPsec gateway definitions; it seems to make no difference.

    Running firmware version 9.411-3 on the UTM VM (the destination side, for this exercise), and 9.410-6 on the source-side SG-230 (I'll update it tonight).

    I have firewall rules defined to limit types of traffic between the different zones over the VPN.

    I'm running the standard Intrusion Prevention suite on the zone where the destination server sits, behind the UTM 9 VM.

    I don't understand what you're suggesting. It sounds like you might be suggesting changing the MTU on the firewalls' WAN interfaces, but then you say that raising it will likely result in fragmentation by the ISP?

  • The "bug" was that the interfaces defaulted to an MTU of 576 if the ISP did not provide one properly via DHCP. If you raise the MTU on your own WAN interface above what the ISP equipment is set to, it will force fragmentation. If the interface is already reporting a true value of 1500, then you are good. I don't think your versions are affected by this bug.
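
    For illustration only (this is not Sophos's code), a fallback like the one described amounts to reading the Interface MTU option (code 26 in RFC 2132) from the ISP's DHCP reply and using 576 when that option is missing or malformed. The function below and its input format (the raw options bytes that follow the DHCP magic cookie) are hypothetical:

```python
DEFAULT_MTU = 576                                 # fallback value described in the post
OPT_PAD, OPT_END, OPT_INTERFACE_MTU = 0, 255, 26  # RFC 2132 option codes


def interface_mtu(options: bytes, default: int = DEFAULT_MTU) -> int:
    """Return the Interface MTU (option 26) from a DHCP options field,
    or `default` if the server did not supply a usable value."""
    i = 0
    while i < len(options):
        code = options[i]
        if code == OPT_PAD:                  # pad option has no length byte
            i += 1
            continue
        if code == OPT_END or i + 1 >= len(options):
            break
        length = options[i + 1]
        value = options[i + 2 : i + 2 + length]
        if code == OPT_INTERFACE_MTU and length == 2 and len(value) == 2:
            mtu = int.from_bytes(value, "big")
            if mtu >= 68:                    # RFC 2132: minimum legal value is 68
                return mtu
        i += 2 + length
    return default


if __name__ == "__main__":
    with_mtu = bytes([1, 4, 255, 255, 255, 0, 26, 2, 0x05, 0xDC, 255])
    without_mtu = bytes([1, 4, 255, 255, 255, 0, 255])
    print(interface_mtu(with_mtu), interface_mtu(without_mtu))  # 1500 576
```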

  • Any other suggestions on what to look for or change?

  • Not off the top of my head.  Sorry.

  • Shad, as a mod, I can see your IP, and I don't believe the 576 MTU bug affected your area/ISP.

    From the above, it appears that you didn't find a smoking gun in the Intrusion Prevention or Firewall log.  The one thing that I wondered about was "VMXNET3 NIC on LAN port, Intel E1000 NIC on WAN."  Have you tried a VMXNET3 virtual NIC for the WAN port?  The E1000 virtual NIC is known to be an occasional source of similar problems.

    On both sides of the VPN, have you selected 'Support path MTU discovery' in the Remote Gateway definition?

    Any luck with either of those?

    Cheers - Bob

    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Bob,

    Thanks for the reply. You probably saw my Comcast IP from the Denver area. I work remotely from Denver, but the networks in question are in Dallas, TX and Suwanee, GA (both at QTS data centers).

    Good call. I'll swap out the remaining E1000 on the Sophos VM and see if that makes a difference.

    As a side note, for anyone else seeing similar problems with large Windows file copies over SMB shared folders: we saw a huge improvement by switching from Copy to RoboCopy. We used no extra command-line switches, just swapped the commands, and got more consistent throughput and none of the failures we were having with "copy". We're still only seeing about 50 Mbit/s of throughput, where it looks like we should be able to get closer to 200 Mbit/s (fat pipes at both data centers), so we're still chasing it.
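
    One plausible reason RoboCopy copes better is that, by default, it retries a failed file copy (its /R and /W defaults are 1,000,000 retries with 30 seconds between them) instead of giving up on the first network error. A minimal Python sketch of the same idea, combined with resuming from the bytes already written (similar in spirit to RoboCopy's /Z restartable mode, not how RoboCopy is implemented); the function name, chunk size, and retry settings are illustrative:

```python
import os
import time


def copy_with_resume(src: str, dst: str, chunk_size: int = 8 * 1024 * 1024,
                     retries: int = 5, wait_s: float = 30.0) -> None:
    """Copy src to dst in chunks; on an I/O error, wait and resume from the
    bytes already written instead of restarting the whole file."""
    open(dst, "wb").close()  # start from an empty destination file
    failures = 0
    while True:
        done = os.path.getsize(dst)  # bytes that made it across so far
        try:
            with open(src, "rb") as fin, open(dst, "ab") as fout:
                fin.seek(done)  # skip what the destination already has
                while chunk := fin.read(chunk_size):
                    fout.write(chunk)
            return
        except OSError:
            failures += 1
            if failures > retries:
                raise
            time.sleep(wait_s)  # give the network a chance to recover
```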

  • Hey Shad.

    Bob's VMXNET3 suggestion should help you there.

    I would also consider using a RED tunnel between the UTMs, if at all possible. I find it performs much better than IPsec.

    You could also try tweaking some network interface settings on your servers, especially the virtual one. Disabling TCP Chimney, offloading, and RSS normally helps me in cases where you just know the network transfer should be faster.
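
    A sketch of what that could look like, written in Python as a thin wrapper that only builds (and by default just prints) the commands. It assumes the built-in NetAdapter PowerShell cmdlets on Windows Server 2012 and later (such as the 2016 destination) and the global `netsh int tcp set global` switches on older systems like the 2008 R2 source; the adapter name "Ethernet0" is a placeholder. On 2008 R2, LSO and checksum offload are usually changed in the adapter's Advanced properties instead, and changing these settings may briefly reset the network connection:

```python
import subprocess
from typing import List


def offload_commands(adapter: str, legacy: bool) -> List[List[str]]:
    """Build (but do not run) commands that disable RSS and offloads.

    legacy=True targets older systems such as Windows Server 2008 R2, where
    TCP Chimney and RSS are global netsh settings; otherwise the NetAdapter
    PowerShell cmdlets (Windows Server 2012 and later) are used per adapter.
    """
    if legacy:
        return [
            ["netsh", "int", "tcp", "set", "global", "chimney=disabled"],
            ["netsh", "int", "tcp", "set", "global", "rss=disabled"],
        ]
    cmdlets = ("Disable-NetAdapterRss", "Disable-NetAdapterLso",
               "Disable-NetAdapterChecksumOffload")
    return [["powershell", "-NoProfile", "-Command", f'{c} -Name "{adapter}"']
            for c in cmdlets]


def apply(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; only execute it when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "Ethernet0" is a placeholder; check the real name with Get-NetAdapter.
    apply(offload_commands("Ethernet0", legacy=False))
```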

    Regards,

    Giovani