
9.7 killed eth0 for me... It had to be 'taken out'.

Over the past two or three weeks, I began to notice that about every two hours my internet connection would drop off and pause for about 30 seconds.  I thought it was my ISP being dumb again and mostly ignored it, but I poked around casually to make sure it wasn't something in my network.  Last night I think I found the problem: UTM 9.7.

Long story short, I found the issue in the Kernel Messages log.  Eth0 quit unexpectedly and was then reset.  Constantly.  As in, all the time.  Unfortunately I could not capture the log files for this event, because the UTM then became completely unreachable, from both inside and outside the network.  And this was at about 10pm my local time.  Not a good night.

Luckily, I had 9.603-1 and a backup from that version available.  I ended up having to completely reinstall my UTM and restore the backup.  I then applied the two Up2Date packages to bring it to 9.605.

Not one problem.  None since I went back to 9.6, and I tested the hell out of it.  I am also no longer seeing the eth0 reset message.  Installing 9.7 was also about when I started blaming my ISP for something they really had nothing to do with, and while I love blaming them, I'm really glad I never called.  I've been watching that log ever since and am not seeing what I had going on there at all.  :D  I run my UTM on a 1U SuperMicro system with dual NIC and an Intel quad port card (all hardware I bought earlier in the year when I built a nice big server rack in my house).

I've noticed a couple of other people describing similar issues, so if you are on 9.7, check that log file and see whether you are getting the same thing.  As for me, I won't be going back to that version.  I might even try going back to *choking noise and gagging* XG if I have to, and try to learn that hot mess (for me).  However, 9.7 seemed to be my culprit, so beware of issues with this version.  And yes, I do have an Intel-based chipset card (Intel Corporation Ethernet Connection I217-LM).
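
For anyone who wants to check an exported kernel log quickly, here is a minimal Python sketch that counts the two e1000e messages quoted further down in this thread ("Detected Hardware Unit Hang" and "Reset adapter unexpectedly") per interface. The function name `count_nic_events` and the event labels are made up for illustration, and where the log file lives on your box is not something this thread states, so pass in the lines from wherever you exported them.

```python
import re
from collections import Counter

# Message fragments copied from the e1000e log lines quoted in this thread.
EVENTS = {
    "hang": "Detected Hardware Unit Hang",
    "reset": "Reset adapter unexpectedly",
}

# Interface names of the form eth0, eth1, ... (an assumption about naming).
IFACE_RE = re.compile(r"\b(eth\d+)\b")


def count_nic_events(lines):
    """Count hang/reset messages per interface in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for label, needle in EVENTS.items():
            if needle in line:
                match = IFACE_RE.search(line)
                iface = match.group(1) if match else "unknown"
                counts[(iface, label)] += 1
    return counts
```

To use it, open your exported log and pass the file object, for example `count_nic_events(open(path))`, where `path` is a placeholder for your own file. A steadily growing "hang" count on one interface would match what is described above.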



  • Hi  

    Do you have any information about the errors you saw in the kernel logs? I haven't yet seen any trend of interface-related issues after the 9.7 update. I'll go through the UTM forum to see if there's anything there or if we can find any patterns.

    Regards

    Jaydeep

  • No, unfortunately.  As I said, I couldn't get to the log because the UTM went silent and I couldn't even get back into it.  I do remember the eth0 interface resetting about every other second.

    I went back to 9.603-1 and everything worked great. (EDIT:  Also updated to .603-4 and -5, no problems there either).  I checked the same logs and did not see the same messages.  There was actually very little there compared to what 9.7 was pushing.

    OPNSense 64-bit | Intel Xeon 4-core v3 1225 3.20Ghz
    16GB Memory | 500GB SSD HDD | ATT Fiber 1GB
    (Former Sophos UTM Veteran, Former XG Rookie)

  • Interesting.  Earlier this year, I had a client with unexplained slowness related to eth0, so we had Sophos RMA their appliance after we demonstrated that everything worked perfectly if we just put the LAN on eth5.  After the new unit arrived, we switched back to eth0 from eth5 after restoring the configuration.  We were amazed to see the same problem, so we moved the LAN back to eth5 and everything was fine, so we left it there.  I suspect it's a combination of an obscure bug and a particular combination of settings.

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • We noticed it most about every two hours, almost on the dot, even though eth0 was resetting itself so quickly.  I am somewhat tempted to try to reproduce the error, but at the same time, not - haha.


    Supermicro SuperServer LGA1150 350W 1U Rackmount Server Barebone System, Black SYS-5018D-MF is what I use for my UTM, with an Intel quadport card (currently not used).  I just use the onboard dual NIC.


  • Something began happening this afternoon.  I searched for the eth0 error, and it seems to be not so much a hardware issue as a Linux/Debian/Ubuntu issue.  I think this might be related to what I was seeing with 9.7, just in a different form.  I found this after we experienced the same issue we had when 9.7 was installed.

    https://duckduckgo.com/?q=eth0%3A+Detected+Hardware+Unit+Hang%3A&t=ffcm&atb=v158-1&ia=web

     I also found this issue reported on these forums, at least for version 9.

    https://community.sophos.com/products/unified-threat-management/f/hardware-installation-up2date-licensing/29557/interface-eth0-hangs?pi2353=2

    https://community.sophos.com/products/unified-threat-management/astaroorg/f/utm-9-2-beta/65588/9-194-5-bug-intel-nic-crashes-under-load

     

     I wonder whether the XG version of Sophos would have the same issue, if this is apparently related to the Linux flavor?

     

     

    2019:11:21-14:58:28 amodin kernel: [1048959.819318] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
    2019:11:21-14:58:28 amodin kernel: [1048959.819318]   TDH                  <2e>
    2019:11:21-14:58:28 amodin kernel: [1048959.819318]   TDT                  <75>
    2019:11:21-14:58:28 amodin kernel: [1048959.819318]   next_to_use          <75>
    2019:11:21-14:58:28 amodin kernel: [1048959.819318]   next_to_clean        <2d>
    2019:11:21-14:58:28 amodin kernel: [1048959.819318] buffer_info[next_to_clean]:
    2019:11:21-14:58:28 amodin kernel: [1048959.819318]   time_stamp           <10f9a75f9>
    2019:11:21-14:58:28 amodin kernel: [1048959.819318]   next_to_watch        <2f>
    2019:11:21-14:58:28 amodin kernel: [1048959.819318]   jiffies              <10f9a78c6>
    2019:11:21-14:58:28 amodin kernel: [1048959.819318]   next_to_watch.status <0>
    2019:11:21-14:58:28 amodin kernel: [1048959.819318] MAC Status             <40080083>
    2019:11:21-14:58:28 amodin kernel: [1048959.819318] PHY Status             <796d>
    2019:11:21-14:58:28 amodin kernel: [1048959.819318] PHY 1000BASE-T Status  <38ff>
    2019:11:21-14:58:28 amodin kernel: [1048959.819318] PHY Extended Status    <3000>
    2019:11:21-14:58:28 amodin kernel: [1048959.819318] PCI Status             <10>
    2019:11:21-14:58:30 amodin kernel: [1048961.822349] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
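
To make a dump like this easier to read, here is a rough Python sketch that pulls the `<...>` values out of the lines and derives two numbers: how many transmit descriptors sit between TDH (head) and TDT (tail), and how many jiffies passed between `time_stamp` and `jiffies`. It assumes the bracketed values are hexadecimal (values like `<2e>` suggest so) and that the TX ring holds 256 descriptors, which is an assumption about this box and not something the log states. The names `parse_hang_dump` and `summarize` are hypothetical.

```python
import re

# Matches "  NAME   <hexvalue>" in the text after the kernel timestamp.
FIELD_RE = re.compile(r"^\s*(.+?)\s*<([0-9a-fA-F]+)>\s*$")


def parse_hang_dump(lines):
    """Return {field name: integer value} for one hang dump."""
    fields = {}
    for line in lines:
        # Drop everything up to the "[seconds.micros] " kernel timestamp.
        _, _, rest = line.partition("] ")
        match = FIELD_RE.match(rest)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def summarize(fields, ring_size=256):
    """Return (descriptors pending between head and tail, jiffies stalled).

    ring_size=256 is an assumed TX ring size, not taken from the log.
    """
    pending = (fields["TDT"] - fields["TDH"]) % ring_size
    stalled = fields["jiffies"] - fields["time_stamp"]
    return pending, stalled
```

For the dump above this gives 71 pending descriptors and a gap of 717 jiffies. Converting jiffies to seconds depends on the kernel's tick rate, which is not shown in the log, so treat that figure as relative only.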



  • So I updated to 9.7, and here we are again with the same issue.  I managed to get the Kernel Messages log loaded, and this is what repeats almost non-stop:

     

    2019:11:22-18:04:15 amodin kernel: [34136.708014] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
    2019:11:22-18:04:15 amodin kernel: [34136.708014]   TDH                  <26>
    2019:11:22-18:04:15 amodin kernel: [34136.708014]   TDT                  <74>
    2019:11:22-18:04:15 amodin kernel: [34136.708014]   next_to_use          <74>
    2019:11:22-18:04:15 amodin kernel: [34136.708014]   next_to_clean        <24>
    2019:11:22-18:04:15 amodin kernel: [34136.708014] buffer_info[next_to_clean]:
    2019:11:22-18:04:15 amodin kernel: [34136.708014]   time_stamp           <10080dd1c>
    2019:11:22-18:04:15 amodin kernel: [34136.708014]   next_to_watch        <26>
    2019:11:22-18:04:15 amodin kernel: [34136.708014]   jiffies              <10080e2f1>
    2019:11:22-18:04:15 amodin kernel: [34136.708014]   next_to_watch.status <0>
    2019:11:22-18:04:15 amodin kernel: [34136.708014] MAC Status             <40080083>
    2019:11:22-18:04:15 amodin kernel: [34136.708014] PHY Status             <796d>
    2019:11:22-18:04:15 amodin kernel: [34136.708014] PHY 1000BASE-T Status  <3800>
    2019:11:22-18:04:15 amodin kernel: [34136.708014] PHY Extended Status    <3000>
    2019:11:22-18:04:15 amodin kernel: [34136.708014] PCI Status             <10>
    2019:11:22-18:04:15 amodin kernel: [34136.719680] e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
    2019:11:22-18:04:19 amodin kernel: [34140.443550] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
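
To put a number on each interruption, here is a small Python sketch that measures the time from the first "Detected Hardware Unit Hang" line to the next "NIC Link is Up" line. It assumes every relevant line starts with a timestamp in the `2019:11:22-18:04:15` form seen above; the function name `outage_durations` is made up for illustration.

```python
from datetime import datetime

# Timestamp layout as it appears at the start of the log lines in this thread.
TS_FORMAT = "%Y:%m:%d-%H:%M:%S"


def outage_durations(lines):
    """Return seconds from each first hang message to the next link-up message."""
    durations = []
    hang_start = None
    for line in lines:
        parts = line.split(None, 1)
        if not parts:
            continue  # blank line
        try:
            stamp = datetime.strptime(parts[0], TS_FORMAT)
        except ValueError:
            continue  # line does not begin with a timestamp
        if "Detected Hardware Unit Hang" in line:
            if hang_start is None:
                hang_start = stamp
        elif "NIC Link is Up" in line and hang_start is not None:
            durations.append((stamp - hang_start).total_seconds())
            hang_start = None
    return durations
```

For the excerpt above this yields a single interval of 4 seconds (18:04:15 to 18:04:19). The log timestamps only have one-second resolution, and traffic may have stalled before the driver logged the hang, so the real interruption could be longer.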


  • So I guess no one figured this out, since I haven't heard from anyone.  My guess is that this is the typical driver issue you guys seem to have with the e1000 card, which has been a problem for years and which you seem to avoid rather than address.  I really don't have the option of swapping it out, because it's part of the SuperMicro board I have.

    There's literally nothing wrong with the NIC.  It's only a problem with using a Sophos product.
