
Regular Kernel Segfaults, abundant cores from many processes, and regular restarts for unknown reasons

Summary: Sophos UTM 9.509-3 is segfaulting (primarily in libperl.so), creating numerous cores per day, and regularly restarting for a variety of reasons. I'm hoping for some guidance on how to diagnose and fix the problem(s).

I've been using Sophos UTM for a few years, and it has largely worked well (once I learned to configure it, anyway), but over the last several months it has become increasingly unstable. I have gone through the checklists I found in the forum, reinstalled, replaced the hardware it runs on, disabled services I wanted to use (DynDNS, SMTP, web applications, and others I can't recall right now), and placed a cheap router in front of it to handle DHCP from my provider, since dhclient seemed to crash the most at first.

Previous hardware:

  • Zotac ZBOX C1323nano
  • Intel N3150 quad-core
  • 8 GB RAM
  • 120 GB SSD
  • Onboard Broadcom dual NIC

New hardware:

  • Protectli Micro FW appliance
  • Intel Celeron E3865U
  • 8 GB RAM
  • 64 GB SSD
  • 6x Intel Gigabit NICs onboard

Load/Use:

  • Minimal: family/home use with an 8 Mb/s cap on the connection
  • Inline as the router
  • Most addresses served via static host definitions
  • HTML5 VPN Portal
  • SSL VPN
  • Multiple DHCP pools (VLANs and VPN)
  • DNS service
  • 3 VLAN LANs on a single port
  • 1 WAN on a single port
  • No web application control
  • No server protection
  • No endpoint protection
  • No wireless protection
  • No RED

kernel.log sampling:

2018:08:04-08:40:16 MASKED kernel: [ 583.051247] confd.plx[8428]: segfault at 18 ip 00000000f7238bee sp 00000000ff91b220 error 4 in libperl.so[f71a9000+14d000]
2018:08:04-08:40:29 MASKED kernel: [ 595.879461] confd.plx[8436]: segfault at 18 ip 00000000f7238bee sp 00000000ff91b220 error 4 in libperl.so[f71a9000+14d000]
2018:08:04-08:41:16 MASKED kernel: [ 643.099311] confd.plx[8446]: segfault at 18 ip 00000000f7238bee sp 00000000ff91b220 error 4 in libperl.so[f71a9000+14d000]
2018:08:04-08:41:30 MASKED kernel: [ 657.091345] confd.plx[8454]: segfault at 18 ip 00000000f7238bee sp 00000000ff91b220 error 4 in libperl.so[f71a9000+14d000]
2018:08:04-08:42:16 MASKED kernel: [ 703.146993] confd.plx[8464]: segfault at 18 ip 00000000f7238bee sp 00000000ff91b220 error 4 in libperl.so[f71a9000+14d000]
2018:08:04-08:42:27 MASKED kernel: [ 713.394333] confd.plx[8479]: segfault at 18 ip 00000000f7238bee sp 00000000ff91b220 error 4 in libperl.so[f71a9000+14d000]
2018:08:04-08:42:31 MASKED kernel: [ 717.488466] confd.plx[8495]: segfault at 18 ip 00000000f7238bee sp 00000000ff91b220 error 4 in libperl.so[f71a9000+14d000]
2018:08:04-08:43:16 MASKED kernel: [ 763.198280] confd.plx[8507]: segfault at 18 ip 00000000f7238bee sp 00000000ff91b220 error 4 in libperl.so[f71a9000+14d000]
2018:08:04-08:43:37 MASKED kernel: [ 783.424557] confd.plx[8515]: segfault at 18 ip 00000000f7238bee sp 00000000ff91b220 error 4 in libperl.so[f71a9000+14d000]
2018:08:04-08:44:16 MASKED kernel: [ 823.249377] confd.plx[8525]: segfault at 18 ip 00000000f7238bee sp 00000000ff91b220 error 4 in libperl.so[f71a9000+14d000]
2018:08:04-08:44:37 MASKED kernel: [ 843.823989] confd.plx[8533]: segfault at 18 ip 00000000f7238bee sp 00000000ff91b220 error 4 in libperl.so[f71a9000+14d000]
2018:08:04-08:45:16 MASKED kernel: [ 883.296138] confd.plx[8623]: segfault at 18 ip 00000000f7238bee sp 00000000ff91b220 error 4 in libperl.so[f71a9000+14d000]
2018:08:04-08:45:37 MASKED kernel: [ 904.319037] confd.plx[8631]: segfault at 18 ip 00000000f7238bee sp 00000000ff91b220 error 4 in libperl.so[f71a9000+14d000]
2018:08:04-08:45:38 MASKED kernel: [ 904.600099] confd.plx[8632]: segfault at 18 ip 00000000f7238bee sp 00000000ff91b220 error 4 in libperl.so[f71a9000+14d000]
2018:08:04-13:48:41 MASKED kernel: [ 54.462473] hwinfo: vm86 mode not supported on 64 bit kernel
2018:08:04-08:48:45 MASKED kernel: [ 58.942814] tun: Universal TUN/TAP device driver, 1.6
2018:08:04-08:48:45 MASKED kernel: [ 58.942818] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
... <snipped>
2018:08:04-08:49:21 MASKED kernel: [ 94.746232] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
2018:08:04-12:37:29 MASKED kernel: [13784.974861] confd.plx[15595]: segfault at 1173d79c ip 00000000f75e5187 sp 00000000ffb09950 error 4 in libc-2.11.3.so[f7573000+16c000]
2018:08:04-13:07:50 MASKED kernel: [15606.350081] confd.plx[16454]: segfault at 18 ip 00000000f72a7bf7 sp 00000000ffb09bb0 error 4 in libperl.so[f7218000+14d000]
2018:08:04-14:12:24 MASKED kernel: [19481.241819] named[7044]: segfault at 40090d ip 000000000811fa98 sp 00000000f7403790 error 4 in named[8048000+2df000]
2018:08:04-14:14:23 MASKED kernel: [19600.299025] confd.plx[19009]: segfault at 133b9114 ip 00000000f75e5187 sp 00000000ffb09af0 error 4 in libc-2.11.3.so[f7573000+16c000]
2018:08:04-14:14:23 MASKED kernel: [19600.615152] confd.plx[18990]: segfault at 7bc525a4 ip 00000000f75e5187 sp 00000000ffb09950 error 4 in libc-2.11.3.so[f7573000+16c000]
2018:08:04-14:14:23 MASKED kernel: [19600.685685] confd.plx[19008]: segfault at 1004 ip 00000000f723fa00 sp 00000000ffb09910 error 4 in libperl.so[f7218000+14d000]
2018:08:04-14:24:33 MASKED kernel: [20210.238828] confd.plx[19469]: segfault at 6b5357bc ip 00000000f75e5187 sp 00000000ffb09af0 error 4 in libc-2.11.3.so[f7573000+16c000]
2018:08:04-14:25:33 MASKED kernel: [20271.089005] confd.plx[19580]: segfault at 24 ip 00000000f75e52cc sp 00000000ffb09440 error 4 in libc-2.11.3.so[f7573000+16c000]
2018:08:04-14:40:02 MASKED kernel: [21139.663219] confd.plx[20080]: segfault at 11e7580c ip 00000000f75e5187 sp 00000000ffb09950 error 4 in libc-2.11.3.so[f7573000+16c000]
2018:08:04-14:47:50 MASKED kernel: [21607.467326] confd.plx[20472]: segfault at 1004 ip 00000000f72cd98e sp 00000000ffb09760 error 4 in libperl.so[f7218000+14d000]
2018:08:04-14:53:55 MASKED kernel: [21972.610510] confd.plx[20632]: segfault at 24 ip 00000000f75e5353 sp 00000000ffb09a40 error 4 in libc-2.11.3.so[f7573000+16c000]
2018:08:04-15:07:43 MASKED kernel: [22801.241870] confd.plx[21159]: segfault at c ip 00000000f72b1f81 sp 00000000ffb09950 error 4 in libperl.so[f7218000+14d000]
2018:08:04-15:10:02 MASKED kernel: [22939.658273] confd.plx[21266]: segfault at 1066a694 ip 00000000f75e5187 sp 00000000ffb09af0 error 4 in libc-2.11.3.so[f7573000+16c000]
2018:08:04-15:13:01 MASKED kernel: [23119.132675] confd.plx[21362]: segfault at 1004 ip 00000000f725900b sp 00000000ffb09cd0 error 4 in libperl.so[f7218000+14d000]
2018:08:04-15:35:40 MASKED kernel: [24478.449187] confd.plx[22503]: segfault at 11418df4 ip 00000000f75e51f0 sp 00000000ffb09950 error 4 in libc-2.11.3.so[f7573000+16c000]
2018:08:04-20:49:13 MASKED kernel: [ 56.544738] nf_conntrack: automatic helper assignment is deprecated and it will be removed soon. Use the iptables CT target to attach helpers instead.
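For anyone triaging similar logs: below is a minimal Python sketch (my own, not a Sophos tool) that pulls the fields out of segfault lines in the format pasted above, decodes the low three bits of the x86 page-fault error code, computes the faulting instruction's offset inside the library, and tallies crashes per process and library. It assumes the exact line layout shown here; parse_segfault, decode_error, and summarize are hypothetical helper names.

```python
import re
from collections import Counter
from typing import Optional

# Matches kernel segfault lines in the layout pasted above, e.g.
# "confd.plx[8428]: segfault at 18 ip 00000000f7238bee sp ... error 4 in libperl.so[f71a9000+14d000]"
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_error(code: int) -> str:
    """Decode the low three bits of the x86 page-fault error code."""
    cause = "protection violation" if code & 1 else "page not present"
    access = "write" if code & 2 else "read"
    mode = "user" if code & 4 else "kernel"
    return f"{mode}-mode {access}, {cause}"


def parse_segfault(line: str) -> Optional[dict]:
    """Return the interesting fields of one segfault line, or None if it is not one."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip, base = int(m["ip"], 16), int(m["base"], 16)
    return {
        "proc": m["proc"],
        "obj": m["obj"],
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction inside the mapped library/binary.
        "offset": ip - base,
        # The kernel prints the error code in hex.
        "error": decode_error(int(m["err"], 16)),
    }


def summarize(lines) -> Counter:
    """Count segfaults per (process, library) pair."""
    counts = Counter()
    for line in lines:
        info = parse_segfault(line)
        if info is not None:
            counts[(info["proc"], info["obj"])] += 1
    return counts


if __name__ == "__main__":
    sample = ("2018:08:04-08:40:16 MASKED kernel: [ 583.051247] confd.plx[8428]: segfault at 18 "
              "ip 00000000f7238bee sp 00000000ff91b220 error 4 in libperl.so[f71a9000+14d000]")
    info = parse_segfault(sample)
    print(hex(info["offset"]), info["error"])  # 0x8fbee user-mode read, page not present
```

Decoded this way, every libperl.so crash in the first burst is a user-mode read of address 0x18 from the same instruction offset (0x8fbee), while the later libc-2.11.3.so crashes fault on widely varying addresses.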


Output of ls -l /var/storage/cores/ after 2 days:

total 9828608
-rw-r--r-- 1 root root 28114944 Aug 3 23:11 admin-reporter..15035
-rw-r--r-- 1 root root 23814144 Aug 3 13:14 admin-reporter..15735
-rw-r--r-- 1 root root 11530240 Aug 3 05:11 admin-reporter..18842
-rw-r--r-- 1 root root 23543808 Aug 3 07:11 admin-reporter..24216
-rw-r--r-- 1 root root 28258304 Aug 3 21:25 admin-reporter..8378
-rw-r--r-- 1 root root 32071680 Aug 3 15:14 afcd.17741
-rw-r--r-- 1 root root 32067584 Aug 2 05:03 afcd.8740
-rw-r--r-- 1 root root 40325120 Aug 2 09:04 afcd.afcd!256.23078
-rw-r--r-- 1 root root 21032960 Aug 3 10:58 audld.plx.11321
-rw-r--r-- 1 root root 20889600 Aug 3 11:28 audld.plx.12455
-rw-r--r-- 1 root root 21032960 Aug 4 12:40 audld.plx.15685
-rw-r--r-- 1 root root 18448384 Aug 3 13:58 audld.plx.16995
-rw-r--r-- 1 root root 21032960 Aug 3 15:28 audld.plx.20666
-rw-r--r-- 1 root root 15732736 Aug 3 11:14 auisys.plx.11818
-rw-r--r-- 1 root root 12099584 Aug 3 03:11 auisys.plx.14428
-rw-r--r-- 1 root root 12234752 Aug 3 15:14 auisys.plx.19951
-rw-r--r-- 1 root root 12099584 Aug 3 15:43 auisys.plx.21185
-rw-r--r-- 1 root root 43560960 Aug 3 21:25 COMMAND.confd.plx.8394
-rw-r--r-- 1 root root 7032832 Aug 2 11:07 confd-client.pl.2632
-rw-r--r-- 1 root root 42528768 Aug 4 15:10 confd.plx.21266
-rw-r--r-- 1 root root 41721856 Aug 4 15:13 confd.plx.21362
-rw-r--r-- 1 root root 43704320 Aug 4 15:20 confd.plx.21894
-rw-r--r-- 1 root root 43724800 Aug 4 15:35 confd.plx.22503
-rw-r--r-- 1 root root 43704320 Aug 4 15:35 confd.plx.22511
-rw-r--r-- 1 root root 49852416 Aug 3 15:32 gen_inline_repo.20853
-rw-r--r-- 1 root root 48627712 Aug 3 06:47 gen_inline_repo.23232
-rw-r--r-- 1 root root 49684480 Aug 2 08:47 gen_inline_repo.25754
-rw-r--r-- 1 root root 49246208 Aug 2 17:47 gen_inline_repo.4894
-rw-r--r-- 1 root root 48799744 Aug 3 19:17 gen_inline_repo.9935
-rw-r--r-- 1 root root 778498048 Aug 2 11:07 httpproxy.8014
-rw-r--r-- 1 root root 930258944 Aug 2 02:21 httpproxy.EpollWorker_21.5358
-rw-r--r-- 1 root root 2826240 Aug 2 02:35 iptables-restor.18606
-rw-r--r-- 1 root root 146923520 Aug 2 11:57 mdw.plx.6310
-rw-r--r-- 1 root root 149790720 Aug 2 11:56 mdw.plx.6348
-rw-r--r-- 1 root root 143745024 Aug 2 11:58 mdw.plx.6859
-rw-r--r-- 1 root root 178814976 Aug 2 11:58 mdw.plx.7419
-rw-r--r-- 1 root root 109846528 Aug 2 12:13 mdw.plx.7954
-rw-r--r-- 1 root root 153108480 Aug 2 03:13 named.4735
-rw-r--r-- 1 root root 143044608 Aug 3 21:06 named.6806
-rw-r--r-- 1 root root 152023040 Aug 3 23:11 named.6830
-rw-r--r-- 1 root root 153604096 Aug 2 11:03 named.6917
-rw-r--r-- 1 root root 154099712 Aug 4 14:12 named.7043
-rw-r--r-- 1 root root 10940416 Aug 2 08:04 notifier.plx.23116
-rw-r--r-- 1 root root 1144180736 Aug 3 15:02 postgres.19595
-rw-r--r-- 1 root root 1217232896 Aug 3 05:56 postgres.20778
-rw-r--r-- 1 root root 1156878336 Aug 3 06:40 postgres.22927
-rw-r--r-- 1 root root 1217187840 Aug 3 08:53 postgres.28319
-rw-r--r-- 1 root root 1143132160 Aug 3 00:00 postgres.7288
-rw-r--r-- 1 root root 7749632 Aug 3 12:47 reverse-dns.plx.14950
-rw-r--r-- 1 root root 7892992 Aug 3 23:47 reverse-dns.plx.17095
-rw-r--r-- 1 root root 7749632 Aug 3 14:17 reverse-dns.plx.17900
-rw-r--r-- 1 root root 7749632 Aug 3 05:32 reverse-dns.plx.19826
-rw-r--r-- 1 root root 0 Aug 4 08:47 reverse-dns.plx.8751
-rw-r--r-- 1 root root 35102720 Aug 2 05:03 smtpd.bin.7544
-rw-r--r-- 1 root root 4980736 Aug 3 10:43 syslog-ng.7241
-rw-r--r-- 1 root root 9756672 Aug 2 23:30 system-reporter.28987
-rw-r--r-- 1 root root 10813440 Aug 2 18:20 system-reporter.7120
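The same kind of triage works for the core directory. Here is a minimal Python sketch that groups pasted ls -l output by process and sums the core sizes. It assumes the core files are named <process>.<pid>, as they appear to be in the listing above (that naming rule is my inference, not documented UTM behavior); tally_cores and process_name are hypothetical helper names.

```python
import re
from collections import defaultdict

# One "ls -l" row: mode, links, owner, group, size, month, day, time, file name.
LS_ROW = re.compile(
    r"^\S+\s+\d+\s+\S+\s+\S+\s+(?P<size>\d+)\s+\S+\s+\d+\s+\S+\s+(?P<name>\S+)$"
)


def process_name(core_file: str) -> str:
    """Strip a trailing '.<pid>' (and any leftover dots) from a core file name."""
    base, _, pid = core_file.rpartition(".")
    return base.rstrip(".") if base and pid.isdigit() else core_file


def tally_cores(ls_output: str):
    """Return per-process core counts and total bytes from pasted ls -l output."""
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for row in ls_output.splitlines():
        m = LS_ROW.match(row.strip())
        if m is None:
            continue  # skips the "total ..." line and blank lines
        name = process_name(m["name"])
        counts[name] += 1
        sizes[name] += int(m["size"])
    return counts, sizes


if __name__ == "__main__":
    listing = """total 9828608
-rw-r--r-- 1 root root 42528768 Aug 4 15:10 confd.plx.21266
-rw-r--r-- 1 root root 41721856 Aug 4 15:13 confd.plx.21362
-rw-r--r-- 1 root root 1144180736 Aug 3 15:02 postgres.19595
"""
    counts, sizes = tally_cores(listing)
    # Largest total core size first.
    for name in sorted(sizes, key=sizes.get, reverse=True):
        print(f"{name:20} {counts[name]:3} cores {sizes[name] / 2**20:8.1f} MiB")
```

Run over the full listing, this shows, for example, that the five postgres cores alone add up to about 5.5 GiB.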

 



This thread was automatically locked due to age.
  • This is an unusual problem, so I suspect hardware.  The one hint I see in your logs is:

    2018:08:04-13:48:41 MASKED kernel: [ 54.462473] hwinfo: vm86 mode not supported on 64 bit kernel

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Sorry for the long absence, and thanks for your post. I think you were correct that it was a hardware problem, but not because of the log entry you pointed to. That entry seems to appear (always?) on a 64-bit install; it is still there on my non-segfaulting replacement.

    I replaced the hardware with a Dell OptiPlex i5 with an HP Intel quad-port NIC. It works great without segfaulting; however, I have other issues that seem to have persisted since some unidentified update (none of the three systems could remain stable on the WAN connection unless I put a cheap Linksys router between the firewall and the Cambria Canopy modem).

    I had thought that the WAN instability was related to the other issues (segfaults and periodic system lockups), but the replacement, which does not have those issues, still cannot establish a stable WAN connection. I've gone through the Rulz and tried every manual setting, to no avail. The lease is obtained and the service_monitor validates the link with some pings; then, every 30 seconds or so, the link flip-flops between connected and no connection. The interface state remains up, but the link flip-flops constantly.

    With a $50 Linksys router in between, it stays up 24/7 without issue; connected directly, it flip-flops constantly. This behavior existed on the previous two systems as well, though possibly not at the same rate. The first system had dual Realtek Gigabit LAN, the second had a 6-port Intel Gigabit LAN, and the current system has an Intel Corporation 82579LM Gigabit Network Connection (Lewisville) for the WAN and an Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) for the LAN ports.

    I've tried enabling and disabling Uplink Monitoring, with both automatic and manual test addresses; the flip-flopping does not change.

    Does anyone have ideas about the WAN stability? I have seen many similar complaints, but most focus on DHCP. I think this is different, since the logs show the lease succeeding.

    Speaking of logs: I don't have them, because the internet is unusable while the firewall is set up in the configuration that generates them. I will try to find a lull in my family's needs to remove the Linksys and capture the logs. I posted this now in hopes that someone can point me in the right direction from the description.

    Thank you.

    Sam

  • Sam, did you try #7.7 in Rulz?

    Cheers - Bob

     
  • Bob, I did, as far as I can go without access to the modem settings. I've tried every setting on my end and rebooted the modem via the only control I have: the PoE injector.

    Right now, I'm still relying on the cheap Linksys router sitting in between. It bothers me to need the additional device when something that should theoretically be far superior can't do what an old consumer router does with ease.

    Maybe I will try again to enable DHCP on the external interface and connect it directly to the modem (when I don't need reliable internet for a while).

    Thanks!

    Sam