Help us enhance your Sophos Community experience. Share your thoughts in our Sophos Community survey.

[9.171][BUG] Intel NIC hang

Hi,

I was doing some local LAN network load testing with Iperf on my new hardware, and Iperf suddenly got a Network Error.

I looked in the UTM's kernel log and found a corresponding error there for eth1.

eth1 is an Intel PCIe card, with Intel 82572EI chip, in a PCIe x16 slot.

Motherboard is a Gigabyte GA-Z87N-WiFi with Intel Z87 chip.

Iperf was running on separate machines, not on the firewall.

IPS and ATP were running, all other features were disabled.

2013:11:28-19:39:07 fw kernel: [ 2245.063734] e1000e 0000:01:00.0 eth1: Detected Hardware Unit Hang:
2013:11:28-19:39:07 fw kernel: [ 2245.063734]   TDH                  
2013:11:28-19:39:07 fw kernel: [ 2245.063734]   TDT                  
2013:11:28-19:39:07 fw kernel: [ 2245.063734]   next_to_use          
2013:11:28-19:39:07 fw kernel: [ 2245.063734]   next_to_clean        
2013:11:28-19:39:07 fw kernel: [ 2245.063734] buffer_info[next_to_clean]:
2013:11:28-19:39:07 fw kernel: [ 2245.063734]   time_stamp           
2013:11:28-19:39:07 fw kernel: [ 2245.063734]   next_to_watch        
2013:11:28-19:39:07 fw kernel: [ 2245.063734]   jiffies              
2013:11:28-19:39:07 fw kernel: [ 2245.063734]   next_to_watch.status 
2013:11:28-19:39:07 fw kernel: [ 2245.063734] MAC Status             
2013:11:28-19:39:07 fw kernel: [ 2245.063734] PHY Status             
2013:11:28-19:39:07 fw kernel: [ 2245.063734] PHY 1000BASE-T Status  
2013:11:28-19:39:07 fw kernel: [ 2245.063734] PHY Extended Status    
2013:11:28-19:39:07 fw kernel: [ 2245.063734] PCI Status             
2013:11:28-19:39:11 fw kernel: [ 2249.059583] e1000e 0000:01:00.0 eth1: Detected Hardware Unit Hang:
2013:11:28-19:39:11 fw kernel: [ 2249.059583]   TDH                  
2013:11:28-19:39:11 fw kernel: [ 2249.059583]   TDT                  
2013:11:28-19:39:11 fw kernel: [ 2249.059583]   next_to_use          
2013:11:28-19:39:11 fw kernel: [ 2249.059583]   next_to_clean        
2013:11:28-19:39:11 fw kernel: [ 2249.059583] buffer_info[next_to_clean]:
2013:11:28-19:39:11 fw kernel: [ 2249.059583]   time_stamp           
2013:11:28-19:39:11 fw kernel: [ 2249.059583]   next_to_watch        
2013:11:28-19:39:11 fw kernel: [ 2249.059583]   jiffies              
2013:11:28-19:39:11 fw kernel: [ 2249.059583]   next_to_watch.status 
2013:11:28-19:39:11 fw kernel: [ 2249.059583] MAC Status             
2013:11:28-19:39:11 fw kernel: [ 2249.059583] PHY Status             
2013:11:28-19:39:11 fw kernel: [ 2249.059583] PHY 1000BASE-T Status  
2013:11:28-19:39:11 fw kernel: [ 2249.059583] PHY Extended Status    
2013:11:28-19:39:11 fw kernel: [ 2249.059583] PCI Status             
2013:11:28-19:39:13 fw kernel: [ 2251.068899] ------------[ cut here ]------------
2013:11:28-19:39:13 fw kernel: [ 2251.068904] WARNING: at net/sched/sch_generic.c:254 dev_watchdog+0xe7/0x182()
2013:11:28-19:39:13 fw kernel: [ 2251.068905] Hardware name: Z87N-WIFI
2013:11:28-19:39:13 fw kernel: [ 2251.068906] NETDEV WATCHDOG: eth1 (e1000e): transmit queue 0 timed out
2013:11:28-19:39:13 fw kernel: [ 2251.068906] Modules linked in: sr_mod cdrom ipt_MASQUERADE xt_policy xt_hashlimit xt_connlabel xt_NFQUEUE xt_connmark xt_mark xt_tcpudp xt_set xt_multiport xt_addrtype ip_set_hash_ip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_ftp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_ftp ip_set_hash_net nfnetlink_queue ebtable_filter ebtables redv2_netlink af_packet ip6table_ips ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_ips iptable_mangle iptable_nat nf_nat_ipv4 nf_nat xt_NFLOG xt_condition(O) xt_logmark xt_confirmed xt_owner ip6t_REJECT ipt_REJECT xt_state ip_set red2 ip_scheduler red nfnetlink_log nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack iptable_filter iptable_raw xt_CT nf_conntrack_netlink nfnetlink nf_conntrack ip6_tables ip_tables x_tables ipv6 loop acpi_cpufreq mperf crc32c_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 aes_generic xts gf128mul coretemp i2c_i801 pcspkr e1000e(O) microcode evdev rtc_cmos ehci_pci ehci_hcd button sg sd_mod xhci_hcd thermal fan processor thermal_sys hwmon edd ahci libahci libata scsi_mod hid_generic usbhid
2013:11:28-19:39:13 fw kernel: [ 2251.068938] Pid: 0, comm: swapper/0 Tainted: G           O 3.8.13.6-18.g9aea9e6-smp64 #1
2013:11:28-19:39:13 fw kernel: [ 2251.068939] Call Trace:
2013:11:28-19:39:13 fw kernel: [ 2251.068939]    [] ? dev_watchdog+0xe7/0x182
2013:11:28-19:39:13 fw kernel: [ 2251.068943]  [] ? warn_slowpath_common+0x78/0x8d
2013:11:28-19:39:13 fw kernel: [ 2251.068945]  [] ? netif_tx_lock+0x7e/0x7e
2013:11:28-19:39:13 fw kernel: [ 2251.068946]  [] ? warn_slowpath_fmt+0x45/0x4a
2013:11:28-19:39:13 fw kernel: [ 2251.068948]  [] ? netif_tx_lock+0x43/0x7e
2013:11:28-19:39:13 fw kernel: [ 2251.068950]  [] ? dev_watchdog+0xe7/0x182
2013:11:28-19:39:13 fw kernel: [ 2251.068952]  [] ? call_timer_fn+0x1b/0x6e
2013:11:28-19:39:13 fw kernel: [ 2251.068953]  [] ? run_timer_softirq+0x16c/0x1b3
2013:11:28-19:39:13 fw kernel: [ 2251.068955]  [] ? __do_softirq+0x9d/0x15f
2013:11:28-19:39:13 fw kernel: [ 2251.068959]  [] ? disable_cpuidle+0xb/0xb
2013:11:28-19:39:13 fw kernel: [ 2251.068960]  [] ? call_softirq+0x1c/0x30
2013:11:28-19:39:13 fw kernel: [ 2251.068962]  [] ? do_softirq+0x3f/0x79
2013:11:28-19:39:13 fw kernel: [ 2251.068965]  [] ? irq_exit+0x43/0xb1
2013:11:28-19:39:13 fw kernel: [ 2251.068966]  [] ? disable_cpuidle+0xb/0xb
2013:11:28-19:39:13 fw kernel: [ 2251.068967]  [] ? reschedule_interrupt+0x6d/0x80
2013:11:28-19:39:13 fw kernel: [ 2251.068968]    [] ? __hrtimer_start_range_ns+0x271/0x284
2013:11:28-19:39:13 fw kernel: [ 2251.068972]  [] ? cpuidle_wrap_enter+0x3c/0x71
2013:11:28-19:39:13 fw kernel: [ 2251.068973]  [] ? cpuidle_wrap_enter+0x32/0x71
2013:11:28-19:39:13 fw kernel: [ 2251.068975]  [] ? cpuidle_enter_state+0xa/0x33
2013:11:28-19:39:13 fw kernel: [ 2251.068976]  [] ? cpuidle_idle_call+0x9e/0xcc
2013:11:28-19:39:13 fw kernel: [ 2251.068977]  [] ? cpu_idle+0x61/0xa9
2013:11:28-19:39:13 fw kernel: [ 2251.068979]  [] ? early_idt_handlers+0x120/0x120
2013:11:28-19:39:13 fw kernel: [ 2251.068980]  [] ? start_kernel+0x372/0x37e
2013:11:28-19:39:13 fw kernel: [ 2251.068982]  [] ? repair_env_string+0x5d/0x5d
2013:11:28-19:39:13 fw kernel: [ 2251.068983]  [] ? x86_64_start_kernel+0x102/0x10f
2013:11:28-19:39:13 fw kernel: [ 2251.068984] ---[ end trace 918e7f6542337fab ]---
2013:11:28-19:39:13 fw kernel: [ 2251.068987] e1000e 0000:01:00.0 eth1: Reset adapter unexpectedly
2013:11:28-19:39:16 fw kernel: [ 2254.158485] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx


I also checked the logs on the Fedora Iperf client; there were no hardware or driver errors.

Please let me know if any other information is needed.

Barry
  • Hi Barry,

    Thanks for reporting. The response I got from developer was that this "might" be fixed by the driver update in 9.180. 
    What kind of traffic were you sending (bitrate and packet size)?
    Will get more information hopefully next week.

    Best regards and have a nice weekend!
    Bianca
  • Hi,

    I was using a single Iperf instance with the default packet size, tcp.
    Bitrate was around 300mbps.

    Thanks,
    Barry
  • Please download the kernel available at Index of /heitzenberger/v9/driver-updates/v9.20x.  It contains some driver updates, including the one you are using.

    Please install it by doing

     $ rpm --force -i $RPM

    Then do a reboot and check if the problem is solved.
  • Is this the same fix used in 9.107?

    CTO, Convergent Information Security Solutions, LLC

    https://www.convergesecurity.com

    Sophos Platinum Partner

    --------------------------------------

    Advice given as posted on this forum does not construe a support relationship or other relationship with Convergent Information Security Solutions, LLC or its subsidiaries.  Use the advice given at your own risk.

  • Hi, I installed the new kernel and am back online; will try to load test again tonight.

    Thanks!
    Barry


    fw:/root # rpm --force -ivh /tmp/kernel-smp64-3.8.13.6-4.g9916619.x86_64.rpm
    Preparing...                ########################################### [100%]
       1:kernel-smp64           ########################################### [100%]

    Kernel image:   /boot/vmlinuz-3.8.13.6-4.g9916619-smp64
    Initrd image:   /boot/initrd-3.8.13.6-4.g9916619-smp64
    Root device:    /dev/disk/by-id/ata-Hitachi_HTS721010G9SA00_MPCZN7Y0HJBWYL-part6 (/dev/sda6) (mounted on / as ext4)
    cp: cannot stat `/etc/netconfig': No such file or directory
    find: `/lib/modules/3.8.13.6-4.g9916619-smp64/kernel/drivers/scsi/device_handler': No such file or directory
    modprobe: Module usbcore not found.
    WARNING: no dependencies for kernel module 'usbcore' found.
    Kernel Modules: hwmon thermal_sys thermal processor fan usbhid hid-generic scsi_mod libata libahci ahci edd ohci-hcd uhci-hcd ehci-hcd xhci-hcd sd_mod
    which: no sg_inq in (/sbin:/usr/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin)
    cp: cannot stat `': No such file or directory
    Features:       acpi block usb
    Bootsplash:     default (1024x768)
    ldd: ./: not regular file
    17514 blocks
  • Hi, I ran Iperf (default settings) through the UTM overnight; after about 2 hours, the NIC crashed again:

    2013:12:03-02:10:04 fw kernel: [41335.227898] e1000e 0000:01:00.0 eth1: Detected Hardware Unit Hang:
    2013:12:03-02:10:04 fw kernel: [41335.227898]   TDH                  
    2013:12:03-02:10:04 fw kernel: [41335.227898]   TDT                  
    2013:12:03-02:10:04 fw kernel: [41335.227898]   next_to_use          
    2013:12:03-02:10:04 fw kernel: [41335.227898]   next_to_clean        
    2013:12:03-02:10:04 fw kernel: [41335.227898] buffer_info[next_to_clean]:
    2013:12:03-02:10:04 fw kernel: [41335.227898]   time_stamp           
    2013:12:03-02:10:04 fw kernel: [41335.227898]   next_to_watch        
    2013:12:03-02:10:04 fw kernel: [41335.227898]   jiffies              
    2013:12:03-02:10:04 fw kernel: [41335.227898]   next_to_watch.status 
    2013:12:03-02:10:04 fw kernel: [41335.227898] MAC Status             
    2013:12:03-02:10:04 fw kernel: [41335.227898] PHY Status             
    2013:12:03-02:10:04 fw kernel: [41335.227898] PHY 1000BASE-T Status  
    2013:12:03-02:10:04 fw kernel: [41335.227898] PHY Extended Status    
    2013:12:03-02:10:04 fw kernel: [41335.227898] PCI Status             
    2013:12:03-02:10:08 fw kernel: [41339.225486] e1000e 0000:01:00.0 eth1: Detected Hardware Unit Hang:
    2013:12:03-02:10:08 fw kernel: [41339.225486]   TDH                  
    2013:12:03-02:10:08 fw kernel: [41339.225486]   TDT                  
    2013:12:03-02:10:08 fw kernel: [41339.225486]   next_to_use          
    2013:12:03-02:10:08 fw kernel: [41339.225486]   next_to_clean        
    2013:12:03-02:10:08 fw kernel: [41339.225486] buffer_info[next_to_clean]:
    2013:12:03-02:10:08 fw kernel: [41339.225486]   time_stamp           
    2013:12:03-02:10:08 fw kernel: [41339.225486]   next_to_watch        
    2013:12:03-02:10:08 fw kernel: [41339.225486]   jiffies              
    2013:12:03-02:10:08 fw kernel: [41339.225486]   next_to_watch.status 
    2013:12:03-02:10:08 fw kernel: [41339.225486] MAC Status             
    2013:12:03-02:10:08 fw kernel: [41339.225486] PHY Status             
    2013:12:03-02:10:08 fw kernel: [41339.225486] PHY 1000BASE-T Status  
    2013:12:03-02:10:08 fw kernel: [41339.225486] PHY Extended Status    
    2013:12:03-02:10:08 fw kernel: [41339.225486] PCI Status             
    2013:12:03-02:10:09 fw kernel: [41340.228370] ------------[ cut here ]------------
    2013:12:03-02:10:09 fw kernel: [41340.228386] WARNING: at net/sched/sch_generic.c:254 dev_watchdog+0xe7/0x182()
    2013:12:03-02:10:09 fw kernel: [41340.228390] Hardware name: Z87N-WIFI
    2013:12:03-02:10:09 fw kernel: [41340.228394] NETDEV WATCHDOG: eth1 (e1000e): transmit queue 0 timed out
    2013:12:03-02:10:09 fw kernel: [41340.228397] Modules linked in: sr_mod cdrom ipt_MASQUERADE xt_policy xt_hashlimit xt_connlabel xt_NFQUEUE xt_connmark xt_mark xt_limit xt_tcpudp xt_set xt_multiport xt_psd(O) xt_addrtype ip_set_hash_ip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_ftp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_ftp ip_set_hash_net nfnetlink_queue ebtable_filter ebtables redv2_netlink ip6table_ips ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_ips iptable_mangle iptable_nat nf_nat_ipv4 nf_nat xt_NFLOG xt_condition(O) xt_logmark xt_confirmed xt_owner ip6t_REJECT ipt_REJECT xt_state ip_set af_packet red2 ip_scheduler red nfnetlink_log nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack iptable_filter iptable_raw xt_CT nf_conntrack_netlink nfnetlink nf_conntrack ip6_tables ip_tables x_tables ipv6 loop acpi_cpufreq mperf crc32c_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 aes_generic ehci_pci ehci_hcd e1000e(O) i2c_i801 xts gf128mul coretemp microcode pcspkr evdev rtc_cmos button sg sd_mod xhci_hcd thermal fan processor thermal_sys hwmon edd ahci libahci libata scsi_mod hid_generic usbhid
    2013:12:03-02:10:09 fw kernel: [41340.228522] Pid: 0, comm: swapper/2 Tainted: G           O 3.8.13.6-4.g9916619-smp64 #1
    2013:12:03-02:10:09 fw kernel: [41340.228525] Call Trace:
    2013:12:03-02:10:09 fw kernel: [41340.228528]    [] ? dev_watchdog+0xe7/0x182
    2013:12:03-02:10:09 fw kernel: [41340.228544]  [] ? warn_slowpath_common+0x78/0x8d
    2013:12:03-02:10:09 fw kernel: [41340.228552]  [] ? netif_tx_lock+0x7e/0x7e
    2013:12:03-02:10:09 fw kernel: [41340.228558]  [] ? warn_slowpath_fmt+0x45/0x4a
    2013:12:03-02:10:09 fw kernel: [41340.228565]  [] ? netif_tx_lock+0x43/0x7e
    2013:12:03-02:10:09 fw kernel: [41340.228576]  [] ? dev_watchdog+0xe7/0x182
    2013:12:03-02:10:09 fw kernel: [41340.228583]  [] ? call_timer_fn+0x1b/0x6e
    2013:12:03-02:10:09 fw kernel: [41340.228588]  [] ? run_timer_softirq+0x16c/0x1b3
    2013:12:03-02:10:09 fw kernel: [41340.228598]  [] ? timekeeping_get_ns+0x12/0x35
    2013:12:03-02:10:09 fw kernel: [41340.228606]  [] ? __do_softirq+0x9d/0x15f
    2013:12:03-02:10:09 fw kernel: [41340.228613]  [] ? clockevents_program_event+0x9a/0xb9
    2013:12:03-02:10:09 fw kernel: [41340.228626]  [] ? disable_cpuidle+0xb/0xb
    2013:12:03-02:10:09 fw kernel: [41340.228633]  [] ? call_softirq+0x1c/0x30
    2013:12:03-02:10:09 fw kernel: [41340.228640]  [] ? do_softirq+0x3f/0x79
    2013:12:03-02:10:09 fw kernel: [41340.228647]  [] ? irq_exit+0x43/0xb1
    2013:12:03-02:10:09 fw kernel: [41340.228655]  [] ? smp_apic_timer_interrupt+0x85/0x93
    2013:12:03-02:10:09 fw kernel: [41340.228664]  [] ? apic_timer_interrupt+0x6d/0x80
    2013:12:03-02:10:09 fw kernel: [41340.228667]    [] ? __hrtimer_start_range_ns+0x271/0x284
    2013:12:03-02:10:09 fw kernel: [41340.228684]  [] ? cpuidle_wrap_enter+0x3c/0x71
    2013:12:03-02:10:09 fw kernel: [41340.228690]  [] ? cpuidle_wrap_enter+0x32/0x71
    2013:12:03-02:10:09 fw kernel: [41340.228697]  [] ? cpuidle_enter_state+0xa/0x33
    2013:12:03-02:10:09 fw kernel: [41340.228703]  [] ? cpuidle_idle_call+0x9e/0xcc
    2013:12:03-02:10:09 fw kernel: [41340.228709]  [] ? cpu_idle+0x61/0xa9
    2013:12:03-02:10:09 fw kernel: [41340.228714] ---[ end trace e3ad26570058072e ]---
    2013:12:03-02:10:09 fw kernel: [41340.228734] e1000e 0000:01:00.0 eth1: Reset adapter unexpectedly
    2013:12:03-02:10:12 fw kernel: [41343.567366] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx


    IPS was not running this time; Iperf throughput was about 500mbps.

    Thanks,
    Barry
  • Hi, does the 9.180 update have this kernel?

    If not, do you want me to do more testing before upgrading to 9.180?

    Thanks,
    Barry
  • Hi, does the 9.180 update have this kernel?

    Hi Barry,
    First of all many thanks for testing.
    The Kernel used in 9.180 is kernel-smp-3.8.13.6-18.g9aea9e6.

    We created an ID for kernel update to 3.8.13.13 and update Intel drivers to latest stable, it should fix the crash occurred with your type of NIC. This is targeted for release 9.190.
    Please feel free to update to 9.180. [:D]

    Best regards, 
    Bianca
  • Thanks for reporting. We are now tracking this as Mantis ID #29775
  • The Mantis ID #29775 is now under investigation. We are planning to release a fix for this issue in Version 9.190.