This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

confd segfault error 4 in libperl.so

On Sophos UTM 9.2, running on a quemu/libvirt Virtual Machine.

Question:

I'd like to know if restarting confd will resolve my issue or not, or if restarting confd will cause more issues.

/etc/rc.d/confd restart

The high-level error I am getting is in webadmin for perl, after kernel.log records a segfault in libperl.so:

hash- or arrayref expected (not a simple scalar, use allow_nonref to allow this) at /</var/webadmin/webadmin.plx>core/modules/core_tools.pm line 711.

I would like to resolve this without rebooting the Sophos UTM. I cannot get a maintenance window scheduled soon enough, and need to add configuration to this Sophos UTM.

Summary:

The Sophos UTM has been running without issue for a long time. Then on Sept 25 there was a segfault event recorded in kernel.log. On Sept 26, confd [listener] appears to have started new (restarted). The parent confd [master] process has been running since system boot. Since Sept 28, backups fail to get created. Since Oct 31, logins to webadmin WebUI hang after successful authentication.

 

 

These are two issues are the problems I am experiencing in result of the confd.plx error.

(1)

A cron job makes nightly backups at 9:00pm. Sept 27 9:00pm was the last backup I received. 

Since Sept 28, I cannot create a new backups. I get the following error:

# backup.plx -o "cfg_20181101_085100.abf"
$VAR1 = [];
$VAR1 = [];
$VAR1 = [];
$VAR1 = [];
$VAR1 = [];
Could not get new backup from storage.

(2)

Since Oct 31, I cannot login to webadmin. I get the following error:

==> /var/log/webadmin.log <==
2018:11:01-08:26:34 sophos-01 webadmin[30244]: |=========================================================================
2018:11:01-08:26:34 sophos-01 webadmin[30244]: I JSON request processing started
2018:11:01-08:26:34 sophos-01 webadmin[30244]:
2018:11:01-08:26:34 sophos-01 webadmin[30244]: 1. main::top-level:184() asg.plx

==> /var/log/confd.log <==
2018:11:01-08:26:36 sophos-01 confd[30310]: I Role::authenticate:146() => id="3106" severity="info" sys="System" sub="confd" name="authentication successful" user="admin" srcip="10.10.10.22" sid="Wb000000000000000018gE" facility="webadmin" client="webadmin.plx" call="new"

==> /var/log/kernel.log <==
2018:11:01-08:26:36 sophos-01 kernel: [92726063.202567] confd.plx[30310]: segfault at 4367bcb4 ip b735ec80 sp bfd75780 error 4 in libperl.so[b72b4000+14d000]
2018:11:01-08:26:36 sophos-01 kernel: [92726063.378456] confd.plx[30318]: segfault at 4367bcb4 ip b735ec80 sp bfd75780 error 4 in libperl.so[b72b4000+14d000]
2018:11:01-08:26:38 sophos-01 kernel: [92726065.443752] confd.plx[30326]: segfault at 4367bcb4 ip b735ec80 sp bfd755f0 error 4 in libperl.so[b72b4000+14d000]
2018:11:01-08:26:38 sophos-01 kernel: [92726065.611573] confd.plx[30334]: segfault at 4367bcb4 ip b735ec80 sp bfd755f0 error 4 in libperl.so[b72b4000+14d000]

==> /var/log/webadmin.log <==
2018:11:01-08:26:38 sophos-01 webadmin[30244]: |=========================================================================
2018:11:01-08:26:38 sophos-01 webadmin[30244]: E [30244] DIED: hash- or arrayref expected (not a simple scalar, use allow_nonref to allow this) at /</var/webadmin/webadmin.plx>core/modules/core_tools.pm line 711.
2018:11:01-08:26:38 sophos-01 webadmin[30244]:
2018:11:01-08:26:38 sophos-01 webadmin[30244]: 1. main::__ANON__:73() asg.plx
2018:11:01-08:26:38 sophos-01 webadmin[30244]: 2. core::modules::core_tools::obj2json:711() /</var/webadmin/webadmin.plx>core/modules/core_tools.pm
2018:11:01-08:26:38 sophos-01 webadmin[30244]: 3. wfe::asg::modules::asg_cache::cache_objects:114() /</var/webadmin/webadmin.plx>wfe/asg/modules/asg_cache.pm
2018:11:01-08:26:38 sophos-01 webadmin[30244]: 4. main::top-level:265() asg.plx

 

This is the system information I collected

The "confd [master]" process has been running since system boot.

On Sept 26, the listener child process was started (restarted), though I cannot find any logs showing this.

root 7315 3110 0 Sep26 ? 00:13:21 confd [listener]

 

 

The sophos system started receiving segfault errors on Sept 25. Only confd started experiencing segfault. It was quite (no segfaults or other errors) before this.

==> /var/log/kernel <==
2018:09:25-00:13:09 sophos-01 kernel: [89499656.395398] confd.plx[28958]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in li
bperl.so[b72b4000+14d000]
... snip - 8 events removed ...
2018:09:25-00:13:50 sophos-01 kernel: [89499697.465577] confd.plx[29030]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in li
bperl.so[b72b4000+14d000]
2018:09:25-00:13:53 sophos-01 kernel: [89499700.734961] show_signal_msg: 8 callbacks suppressed

The segfault errors continued through the time the system reports "confd [listener]" started (restarted).

2018:09:26-00:12:26 sophos-01 kernel: [89586013.288675] confd.plx[27639]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in li
bperl.so[b72b4000+14d000]
2018:09:26-00:12:26 sophos-01 kernel: [89586013.394724] confd.plx[27698]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in li
bperl.so[b72b4000+14d000]
2018:09:26-00:12:26 sophos-01 kernel: [89586013.407172] confd.plx[27696]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in li
bperl.so[b72b4000+14d000]
2018:09:26-00:12:26 sophos-01 kernel: [89586013.452024] confd.plx[27702]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in li
bperl.so[b72b4000+14d000]
2018:09:26-00:12:26 sophos-01 kernel: [89586013.452679] confd.plx[27703]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in li
bperl.so[b72b4000+14d000]
2018:09:26-00:12:30 sophos-01 kernel: [89586017.723244] show_signal_msg: 3 callbacks suppressed
2018:09:26-00:12:30 sophos-01 kernel: [89586017.723249] confd.plx[27788]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in li
bperl.so[b72b4000+14d000]
2018:09:26-00:13:26 sophos-01 kernel: [89586073.660045] confd.plx[27804]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in li
bperl.so[b72b4000+14d000]
2018:09:26-00:13:26 sophos-01 kernel: [89586074.014605] confd.plx[27803]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in li
bperl.so[b72b4000+14d000]

After these segfaults began occurring, starting on Sept 28 (when backups started failing to be created), repeating once a day on Sept 29, 30, Oct 8, Nov 11, the logs recorded segfaults for mdw (the middleware)

2018:09:28-23:33:43 sophos-01 kernel: [89842890.235323] mdw.plx[29991]: segfault at 4367bcb4 ip 4367bcb4 sp bf89ea4c error 4
2018:09:28-23:34:00 sophos-01 kernel: [89842907.309718] 8021q: adding VLAN 0 to HW filter on device eth0
2018:09:28-23:34:00 sophos-01 kernel: [89842907.396024] 8021q: adding VLAN 0 to HW filter on device eth8
2018:09:28-23:34:00 sophos-01 kernel: [89842907.456909] 8021q: adding VLAN 0 to HW filter on device eth1

There were few other segfaults during Sept 01 - Nov 01:

2018:09:23-12:08:35 sophos-01 kernel: [89369782.747679] ulogd[21364]: segfault at 0 ip b76cb103 sp b7329fb0 error 4 in libtcmalloc.so.4.1.0[b76a4000+48000]
2018:09:25-00:38:05 sophos-01 kernel: [89501152.603706] netselector.plx[32729]: segfault at fffffffe ip b7623711 sp bfbe8448 error 5 in libc-2.11.3.so[b75af000+167000]
2018:09:25-03:40:32 sophos-01 kernel: [89512099.104025] smtpd.bin[31429]: segfault at 793e338f ip b6732c27 sp bff73db0 error 4 in libperl.so[b667c000+150000]
2018:10:11-23:14:24 sophos-01 kernel: [90964931.335532] exim[30048]: segfault at 10c67b5 ip b774bbbd sp bfe99990 error 4 in ld-2.11.3.so[b7742000+1f000]

 



This thread was automatically locked due to age.
  • I installed a temporary sophos with trial license and tested out my scenario and configuration.

    In the end, the issue appears to be resolved after having restarted confd.

    /etc/rc.d/confd restart

     

    The confd process loads the license, and other configuration of the Sophos UTM to be served.

    Shutting down confd does not affect the run-time operation of the "Network Protection > Firewall" that was previously committed before confd kernel errors started occurring. It does essentially make webadmin WebUI not work when it is stopped. I did not test any other functionality.

    When stopping the confd, it wants to clear out any existing sessions (I assume every caller starts a session with confd to interface with it). After that, it is killed. Essentially `killproc confd.plx` does this all, or `/etc/rc.d/confd stop`. Once started again, processes reattach to confd with new sessions.

    I tested stopping the confd process, and then starting, on the test system. On my problem system, I opted to restart.

    It did take longer to stop the confd process than on the test system. And I saw a lot of sessions had to be cleared. But it did end up restating without error.

    The kernel errors have stopped. The WebUI is accessible again. And the backup is now working.

  • I encountered a new libperl.so kernel error, on the same Sophos UTM unit.

    This time the unit reverted the webadmin UI back to the initialization page. The unit was still working as a firewall, as configured, but the initialization screen that is accessed the first time a Sophos UTM is install, was showing. Also backups stopped working too.

    I ssh into the Sophos UTM Virtual Machine and, as before, restarted confd. This time when I restarted confd (which does a stop and start), the whole box reverted to unconfigured, immediately upon confd's startup. Several of the interfaces either went offline, or the confd restart made it difficult to access them if it changed some configuration.

    I was lucky that one of the interfaces was still accessible. It was a route between this unit and another Sophos UTM unit. The unit was still showing the initialization Web UI page.

    So, reluctantly, I went through that wizard to "setup" the box. And I uploaded a backup when I got to the wizard page for restoring from a backup.

     

    So, the lesson learned is when the unit thinks it is unconfigured, do not restart confd.

    I don't know what the best resolution is for this situation. Especially if I do not want to reboot the box.

    Perhaps this error is different from the previous one.

    Anyone have any ideas why this occurred, or what the best resolution is?

     

    /var/log/kernel.log

    2018:11:12-01:00:46 sophos-1 kernel: [93653313.923355] confd.plx[3746]: segfault at 3ff00000 ip 403dcd38 sp bfd66e78 error 4 in libperl.so[4034e000+14d000]

    This set of errors were being logged 3 times per second in /vaqr/log/confd.log

    2018:11:11-14:30:32 sophos-1 confd[10893]: >=========================================================================
    2018:11:11-14:30:32 sophos-1 confd[10893]: E Use of uninitialized value $newlang in string eq at /</var/confd/confd.plx>Message.pm line 1026.
    2018:11:11-14:30:32 sophos-1 confd[10893]:
    2018:11:11-14:30:32 sophos-1 confd[10893]: 1. sys::_warn:54() /</var/confd/confd.plx>sys.pm
    2018:11:11-14:30:32 sophos-1 confd[10893]: 2. Message::err_set:1026() /</var/confd/confd.plx>Message.pm
    2018:11:11-14:30:32 sophos-1 confd[10893]: 3. Role::authenticate:133() /</var/confd/confd.plx>Role.pm
    2018:11:11-14:30:32 sophos-1 confd[10893]: 4. sys::new:188() /</var/confd/confd.plx>sys.pm
    2018:11:11-14:30:32 sophos-1 confd[10893]: 5. (eval):1() (eval 582)
    2018:11:11-14:30:32 sophos-1 confd[10893]: 6. Astaro::RPC::server_loop:185() /</var/confd/confd.plx>rpc.pm
    2018:11:11-14:30:32 sophos-1 confd[10893]: 7. rpc::launch:50() /</var/confd/confd.plx>rpc.pm
    2018:11:11-14:30:32 sophos-1 confd[10893]: 8. main::_rpc:2189() confd.pl
    2018:11:11-14:30:32 sophos-1 confd[10893]: 9. main::top-level:394() confd.pl
    2018:11:11-14:30:32 sophos-1 confd[10893]: |=========================================================================
    2018:11:11-14:30:32 sophos-1 confd[10893]: E Use of uninitialized value $newlang in string ne at /</var/confd/confd.plx>Message.pm line 1030.
    2018:11:11-14:30:32 sophos-1 confd[10893]:
    2018:11:11-14:30:32 sophos-1 confd[10893]: 1. sys::_warn:54() /</var/confd/confd.plx>sys.pm
    2018:11:11-14:30:32 sophos-1 confd[10893]: 2. Message::err_set:1026() /</var/confd/confd.plx>Message.pm
    2018:11:11-14:30:32 sophos-1 confd[10893]: 3. Role::authenticate:133() /</var/confd/confd.plx>Role.pm
    2018:11:11-14:30:32 sophos-1 confd[10893]: 4. sys::new:188() /</var/confd/confd.plx>sys.pm
    2018:11:11-14:30:32 sophos-1 confd[10893]: 5. (eval):1() (eval 582)
    2018:11:11-14:30:32 sophos-1 confd[10893]: 6. Astaro::RPC::server_loop:185() /</var/confd/confd.plx>rpc.pm
    2018:11:11-14:30:32 sophos-1 confd[10893]: 7. rpc::launch:50() /</var/confd/confd.plx>rpc.pm
    2018:11:11-14:30:32 sophos-1 confd[10893]: 8. main::_rpc:2189() confd.pl
    2018:11:11-14:30:32 sophos-1 confd[10893]: 9. main::top-level:394() confd.pl
    2018:11:11-14:30:32 sophos-1 confd[10893]: |=========================================================================
    2018:11:11-14:30:32 sophos-1 confd[10893]: E Use of uninitialized value $format_language in concatenation (.) or string at /</var/confd/confd.plx>Message.pm line 926.
    2018:11:11-14:30:32 sophos-1 confd[10893]:
    2018:11:11-14:30:32 sophos-1 confd[10893]: 1. sys::_warn:54() /</var/confd/confd.plx>sys.pm
    2018:11:11-14:30:32 sophos-1 confd[10893]: 2. Message::parse_formats:926() /</var/confd/confd.plx>Message.pm
    2018:11:11-14:30:32 sophos-1 confd[10893]: 3. Message::err_set:1026() /</var/confd/confd.plx>Message.pm
    2018:11:11-14:30:32 sophos-1 confd[10893]: 4. Role::authenticate:133() /</var/confd/confd.plx>Role.pm
    2018:11:11-14:30:32 sophos-1 confd[10893]: 5. sys::new:188() /</var/confd/confd.plx>sys.pm
    2018:11:11-14:30:32 sophos-1 confd[10893]: 6. (eval):1() (eval 582)
    2018:11:11-14:30:32 sophos-1 confd[10893]: 7. Astaro::RPC::server_loop:185() /</var/confd/confd.plx>rpc.pm
    2018:11:11-14:30:32 sophos-1 confd[10893]: 8. rpc::launch:50() /</var/confd/confd.plx>rpc.pm
    2018:11:11-14:30:32 sophos-1 confd[10893]: 9. main::_rpc:2189() confd.pl
    2018:11:11-14:30:32 sophos-1 confd[10893]: 10. main::top-level:394() confd.pl
    2018:11:11-14:30:32 sophos-1 confd[10893]: <=========================================================================
    2018:11:11-14:30:32 sophos-1 confd[10893]: W Message::err_set:1063() => id="3100" severity="warn" sys="System" sub="confd" name="ROLE_ACCESS_DENIED (Cannot authenticate user system, access denied.)" user="anonymous" srcip="127.0.0.1" facility="system" client="admin-reporter.pl" call="new" user_name="system"

     

     

     

  • This Sophos VM continued to have the same kernel error - 5 days later this issue occurred again. It appears the root issue may be related to a corrupted storage for confd. I saw this error complaining of corrupted storage during the first incident, and I thought going back through the setup would have recreated or reinitialized the storage area. But, apparently that is not the case.

    The final resolution for this problem was to install a new copy of the latest Sophos on a new Virtual Machine.

    This unit having issues is Sophos v9.2 on a VM. I created an identical VM with Sophos v9.5 to replace it. For my procedure, there was no downtime.

    Here were my steps:

    1. Upload a Sophos backup of the v9.2 VM to new v9.5 VM
    2. Restore the v9.2 backup on the new v9.5 VM
    3. Wait 15 seconds for interfaces to begin coming up on the v9.5 VM
    4. Pause the old v9.2 VM (using the Virtual Machine Pause)

     There might have been a small window of broadcast storms. I did see a very short spike in network latency during the transfer. But I did not notice any significant issues or service outages.