Connection lost when downloading file from specific FTP (weird!)

Hello,

I have a very, very weird problem:

When I download a large file from one specific FTP server, my uplink is lost after a few minutes.
(When I download the same file from another FTP server at the same speed, the problem doesn't occur!)
When the error occurs, the UTM reports "State: Up" and "Uplink: Error". My modem (connected via Ethernet) shows a normal connection (all LEDs are lit as usual).

If I do nothing, the connection comes back after 5 to 20 minutes.

While I'm disconnected, I can see DHCPACKs in the system log, but not a single server from outside can be reached.

When I restart the firewall, the line takes about the same time to come back.
When I restart my modem, I have a connection after the modem's initialization.

This led me to the conclusion that the problem must lie beyond my firewall.

I then contacted my ISP and they observed my modem while I forced the disconnect with the download.

The interesting thing was: I was calling over VoIP. When I lost the connection, I could still hear my ISP for a minute or so, but he couldn't hear me. So packets seemed to reach me, but none could be sent.
And: while I was disconnected, my ISP could open the web interface of my modem normally and could ping servers from there, while I was unable to ping any server from my UTM.

So now the conclusion seems to be that the problem is on my UTM. What the heck is going on here? :-)
I found this thread and I had hopes, but the ethtool call didn't help: https://community.sophos.com/products/unified-threat-management/f/hardware-installation-up2date-licensing/30148/9-312-intel-82572ei-e1000e-hardware-unit-hang

I can't say for sure, and it may be coincidence, but the problems may have started when I updated my UTM from 9.504-1 to 9.605-1.
I have a single internet line with no load balancing.

Thanks in advance for any help.
This problem is driving me mad.



This thread was automatically locked due to age.
  • If the FTP server is public, please post the URL.

    Are you calling from inside the LAN, or is the phone connected to the ISP router?

    Try pinging 8.8.8.8 continuously during the download and while the connection is broken. What is the result?

    Try pinging 8.8.8.8 from the SG (Support / Tools) while the connection is broken.

    If you have an ISP router with multiple LAN ports, connect a PC or notebook to it and try to access the internet while the connection is broken.


    Dirk

    Systema Gesellschaft für angewandte Datentechnik mbH  // Sophos Platinum Partner
    Sophos Solution Partner since 2003
    If a post solves your question, click the 'Verify Answer' link at this post.
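
The continuous-ping test suggested above is easier to read if reachability changes are timestamped. The following is only a minimal sketch, assuming a Linux client with Python 3 and the iputils `ping` command (its `-c` and `-W` options); the function names `ping_once`, `find_outages` and `monitor` are made up for this illustration and are not part of the UTM.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo with the system `ping`; True if a reply came back."""
    # -c 1: one packet; -W: reply timeout in seconds (Linux iputils syntax, an assumption).
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_outages(samples):
    """Turn [(timestamp, ok), ...] into a list of (start, end) outage spans.

    An outage starts at the first failed sample and ends at the next
    successful one; an outage still open at the end has end=None.
    """
    outages = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            outages.append((start, ts))
            start = None
    if start is not None:
        outages.append((start, None))
    return outages


def monitor(host: str = "8.8.8.8", interval_s: float = 1.0) -> None:
    """Ping forever and print a line whenever reachability changes."""
    last = None
    while True:
        ok = ping_once(host)
        if ok != last:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {host} {'reachable' if ok else 'UNREACHABLE'}")
            last = ok
        time.sleep(interval_s)
```

Calling `monitor()` on a LAN client during the download, and again on a notebook plugged directly into the ISP router, would show whether both lose 8.8.8.8 at the same moment.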

  • Thanks for your answer.

    It is a private FTP.

    I phone from inside the LAN.

    I tried the ping - directly from the UTM (via SSH): When the (partial) disconnect is happening, I cannot ping any outside server. But at this time the connected modem CAN ping external servers normally (as tested by my ISP).

    I want to test your last suggestion with an old notebook, but I currently have none at hand. And I don't want to connect my main PC to the internet without a firewall. I will try to get an old notebook for this.

  • I tried your last suggestion and the notebook was also disconnected. So the modem seemed to be the culprit.

    But: I phoned my ISP and they swapped my modem for another model from another manufacturer, and the problem is still there!?

    The only difference now is that the disconnect lasts only a few seconds (2 to 3 seconds). The new modem seems to reset something much faster than the old one.

    When the short disconnect happens, I can see this in the system log of the UTM:

    2019:09:20-16:33:51 home ntpd[1145]: Deleting interface #19 eth0, 192.168.xxx.xxx#123, interface stats: received=0, sent=0, dropped=0, active_time=155 secs
    2019:09:20-16:33:51 home ntpd[1145]: Deleting interface #20 eth0, xxxx::xxx:xxxx:xxxx:7a44%2#123, interface stats: received=0, sent=0, dropped=0, active_time=155 secs
    2019:09:20-16:33:53 home ntpd[1145]: Listen normally on 21 eth0 192.168.xxx.xxx:123
    2019:09:20-16:33:53 home ntpd[1145]: Listen normally on 22 eth0 [xxxx::xxx:xxxx:xxxx:7a44%2]:123
    2019:09:20-16:33:53 home ntpd[1145]: new interface(s) found: waking up resolver

    With these new facts I would say that the problem is in my UTM.

    Is there something I can do to test it further?
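
One way to quantify these short drops is to pair each "Deleting interface" line with the next "Listen normally" line for the same interface and measure the gap. The sketch below assumes only the two ntpd message shapes and the timestamp format visible in the excerpt above; `interface_gaps` is a hypothetical helper name, and other log formats would need a different pattern.

```python
import re
from datetime import datetime

# Matches the two ntpd message shapes shown in the log excerpt (assumed format).
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) \S+ ntpd\[\d+\]: "
    r"(?P<event>Deleting interface #\d+|Listen normally on \d+) "
    r"(?P<iface>\w+)"
)


def interface_gaps(lines):
    """Return [(iface, went_down, came_back, seconds)] for each delete -> listen pair."""
    down_since = {}
    gaps = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%Y:%m:%d-%H:%M:%S")
        iface = m["iface"]
        if m["event"].startswith("Deleting"):
            # Keep the earliest delete time if several addresses are removed.
            down_since.setdefault(iface, ts)
        elif iface in down_since:
            start = down_since.pop(iface)
            gaps.append((iface, start, ts, (ts - start).total_seconds()))
    return gaps
```

Fed the five lines quoted above, this reports a single gap on eth0 between 16:33:51 and 16:33:53. Run over a whole day of logs, it would show how often the interface drops and whether the drops line up with downloads.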

  • By the way: eth0 is my internal LAN. I wonder why the UTM is deleting the internal interface eth0 for a short moment...?

    Currently I would bet that the bug is in the network driver of the UTM, because my UTM setup hasn't changed in years and in the past I downloaded many large files from this FTP without any problem.

    Some more info about my UTM: CPU is always at 5% to 15%, RAM always around 30%. Log disk 1%, Data disk 11%.

    Currently only the firewall and the network visibility are enabled.

  • If the directly connected notebook is disconnected too ... how can the UTM be the problem?

    ntpd[1145]: Deleting interface #19 eth0, 192.168.xxx.xxx#123 ... This looks like the interface is going down (the NTP daemon removes it from its list of usable interfaces). Possibly the modem/router is rebooting or reinitializing its interfaces.

     


    Dirk


  • eth0 is my internal LAN, not my WAN.

    After a lot of thinking I came to the conclusion that I had 2 different problems.

    Problem 1 was (maybe) some incompatibility between my modem and the LAN adapter on my UTM. This caused the long disconnects.

    This is solved now. And this is very good :-)

    But I had these short disconnects before.

    And this seems to be problem number 2. And this problem happens on my internal LAN adapter (eth0) and somehow causes the short disconnects. eth0 is directly connected to my internal switch.

    Is there something I can do to identify the problem, besides trying a new switch?
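
One thing that can be checked before swapping hardware is the kernel's per-interface counters, which Linux exposes under `/sys/class/net/<iface>/`. The sketch below is a rough idea rather than a UTM feature: it assumes Python 3 is available on the box being inspected, and it tolerates `carrier_changes` being absent because older kernels do not provide that file. The helper names are invented for this example.

```python
from pathlib import Path

# Standard Linux sysfs statistics files; rising error counters often point at cabling or NIC trouble.
COUNTERS = ("rx_errors", "rx_crc_errors", "rx_dropped", "tx_errors", "tx_dropped")


def read_int(path: Path):
    """Return the integer stored in a sysfs file, or None if it is missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def link_health(iface: str, sysfs_root: str = "/sys/class/net") -> dict:
    """Snapshot link-flap and error counters for one interface."""
    base = Path(sysfs_root) / iface
    report = {"carrier_changes": read_int(base / "carrier_changes")}
    for name in COUNTERS:
        report[name] = read_int(base / "statistics" / name)
    return report


def changed(before: dict, after: dict) -> dict:
    """Counters that increased between two snapshots, with the size of the increase."""
    return {
        key: after[key] - before[key]
        for key in after
        if before.get(key) is not None
        and after[key] is not None
        and after[key] > before[key]
    }
```

Taking `link_health("eth0")` before and after a large download and passing both snapshots to `changed` would show whether the link itself flapped or whether packets were dropped while the link stayed up.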

  • OMG, I found the solution for problem #2: A bad cable between my switch and the UTM, simple as that. (very annoying: it wasn't a cheap cable)

    Now eth0 isn't resetting anymore (new cable) and eth1 (WAN) no longer has these persistent disconnects (new modem, cable unchanged).

    So I really had 2 different problems (I'm lucky!) and both are fixed now.

    Thanks for your help.

  • Others that see this thread will want to follow #7 in Rulz (last updated 2019-04-17).

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • In case someone ever has a similar problem, here is my update:

    It wasn't the cable. It was just coincidence that it worked for the first hour. The problems came back after that.

    But I had a new theory: I had used the same hardware (an Atom D510) for my private firewall for years, because at home I don't need all UTM features and my bandwidth wasn't that high.

    But over the years my private internet connection got faster and faster and the UTM got more and more features, but my hardware stayed the same.

    In the past, while downloading a file, I saw the firewall's CPU jump to 23% - 25% and thought: OK, it can't be a CPU problem.

    But I forgot one critical thing (and I blame myself for not thinking of it earlier):
    The CPU has 4 threads, and most network-related tasks can't be parallelized; they must run on a single thread. With 4 threads, one fully loaded thread shows up as 25% CPU. Ugh.

    So maybe the CPU sometimes can't keep up with the full load and the network buffer fills up. And maybe this resets the interface.

    So I thought it would be a very good idea to update my hardware after all these years :-)

    And yes, it was a good idea: after transferring the UTM config to the new machine, there are no more resets on the internal interface.

    The new machine also has 4 threads, but is 15x faster. CPU is at 5% while downloading a large file.

    If I extrapolate this to the old CPU, 5% x 15 would be 75% of total CPU, but a single thread on the old machine is capped at 25%.

    So yes: the old CPU couldn't handle the full load.
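
The extrapolation above can be written out as a few lines of arithmetic. This is a back-of-the-envelope sketch only: it assumes the "15x faster" figure applies per thread and that the download load runs on a single thread, and the function names are invented for this example.

```python
def single_thread_ceiling(threads: int) -> float:
    """Highest total-CPU percentage that one fully busy thread can produce."""
    return 100.0 / threads


def projected_total_load(new_load_pct: float, speedup: float) -> float:
    """Total-CPU percentage the same work would need on the slower machine."""
    return new_load_pct * speedup


ceiling = single_thread_ceiling(4)          # both machines have 4 threads
needed = projected_total_load(5.0, 15.0)    # 5% on the new box, old box 15x slower
overload = needed / ceiling

print(f"needs {needed:.0f}% of total CPU, one thread tops out at {ceiling:.0f}% "
      f"({overload:.1f}x over)")
# prints: needs 75% of total CPU, one thread tops out at 25% (3.0x over)
```

Under these assumptions the old machine would have needed about three times what one of its threads could deliver, which fits the 23% - 25% reading seen during downloads.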

    tl;dr: Always check if a thread of the CPU can handle the full network load. If a thread (not the CPU!) is constantly at 100%: Houston, we have a problem.

  • Hallo,

    Thanks for closing the loop on this issue!

    For others who see this thread: in top, you can see the load on the individual CPUs by pressing 1.

    Cheers - Bob
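
The same per-CPU view can be scripted by sampling `/proc/stat` twice and comparing the counters. The sketch below assumes a Linux system with Python 3 and the standard `/proc/stat` field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are made up for this illustration.

```python
import time


def parse_proc_stat(text: str) -> dict:
    """Map 'cpu0', 'cpu1', ... to (busy_jiffies, total_jiffies)."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and everything that is not a per-core line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        total = sum(values[:8])  # guest time is already counted inside user/nice
        cores[fields[0]] = (total - idle, total)
    return cores


def per_core_busy(before: str, after: str) -> dict:
    """Percentage of time each core was busy between two /proc/stat snapshots."""
    first, second = parse_proc_stat(before), parse_proc_stat(after)
    usage = {}
    for core, (busy_after, total_after) in second.items():
        busy_before, total_before = first.get(core, (0, 0))
        delta_total = total_after - total_before
        if delta_total > 0:
            usage[core] = 100.0 * (busy_after - busy_before) / delta_total
    return usage


def sample(interval_s: float = 1.0) -> dict:
    """Read /proc/stat twice, interval_s apart, and return per-core busy percentages."""
    with open("/proc/stat") as f:
        first = f.read()
    time.sleep(interval_s)
    with open("/proc/stat") as f:
        second = f.read()
    return per_core_busy(first, second)
```

If `sample()` shows one core near 100% during a download while the others are mostly idle, the overall CPU figure is hiding a single-thread bottleneck.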