confd segfault error 4 in libperl.so

On Sophos UTM 9.2, running on a QEMU/libvirt virtual machine.

Question:

I'd like to know whether restarting confd will resolve my issue, or whether it will cause further problems.

/etc/rc.d/confd restart

The high-level error I get is a Perl error in webadmin, logged right after kernel.log records a segfault in libperl.so:

hash- or arrayref expected (not a simple scalar, use allow_nonref to allow this) at /</var/webadmin/webadmin.plx>core/modules/core_tools.pm line 711.

I would like to resolve this without rebooting the Sophos UTM. I cannot get a maintenance window scheduled soon enough, and need to add configuration to this Sophos UTM.

Summary:

The Sophos UTM has been running without issue for a long time. Then on Sept 25 there was a segfault event recorded in kernel.log. On Sept 26, confd [listener] appears to have started new (restarted). The parent confd [master] process has been running since system boot. Since Sept 28, backups fail to get created. Since Oct 31, logins to webadmin WebUI hang after successful authentication.

 

 

These are the two problems I am experiencing as a result of the confd.plx error.

(1)

A cron job makes nightly backups at 9:00pm. Sept 27 9:00pm was the last backup I received. 

Since Sept 28, I cannot create new backups. I get the following error:

# backup.plx -o "cfg_20181101_085100.abf"
$VAR1 = [];
$VAR1 = [];
$VAR1 = [];
$VAR1 = [];
$VAR1 = [];
Could not get new backup from storage.

(2)

Since Oct 31, I cannot log in to webadmin. I get the following error:

==> /var/log/webadmin.log <==
2018:11:01-08:26:34 sophos-01 webadmin[30244]: |=========================================================================
2018:11:01-08:26:34 sophos-01 webadmin[30244]: I JSON request processing started
2018:11:01-08:26:34 sophos-01 webadmin[30244]:
2018:11:01-08:26:34 sophos-01 webadmin[30244]: 1. main::top-level:184() asg.plx

==> /var/log/confd.log <==
2018:11:01-08:26:36 sophos-01 confd[30310]: I Role::authenticate:146() => id="3106" severity="info" sys="System" sub="confd" name="authentication successful" user="admin" srcip="10.10.10.22" sid="Wb000000000000000018gE" facility="webadmin" client="webadmin.plx" call="new"

==> /var/log/kernel.log <==
2018:11:01-08:26:36 sophos-01 kernel: [92726063.202567] confd.plx[30310]: segfault at 4367bcb4 ip b735ec80 sp bfd75780 error 4 in libperl.so[b72b4000+14d000]
2018:11:01-08:26:36 sophos-01 kernel: [92726063.378456] confd.plx[30318]: segfault at 4367bcb4 ip b735ec80 sp bfd75780 error 4 in libperl.so[b72b4000+14d000]
2018:11:01-08:26:38 sophos-01 kernel: [92726065.443752] confd.plx[30326]: segfault at 4367bcb4 ip b735ec80 sp bfd755f0 error 4 in libperl.so[b72b4000+14d000]
2018:11:01-08:26:38 sophos-01 kernel: [92726065.611573] confd.plx[30334]: segfault at 4367bcb4 ip b735ec80 sp bfd755f0 error 4 in libperl.so[b72b4000+14d000]

==> /var/log/webadmin.log <==
2018:11:01-08:26:38 sophos-01 webadmin[30244]: |=========================================================================
2018:11:01-08:26:38 sophos-01 webadmin[30244]: E [30244] DIED: hash- or arrayref expected (not a simple scalar, use allow_nonref to allow this) at /</var/webadmin/webadmin.plx>core/modules/core_tools.pm line 711.
2018:11:01-08:26:38 sophos-01 webadmin[30244]:
2018:11:01-08:26:38 sophos-01 webadmin[30244]: 1. main::__ANON__:73() asg.plx
2018:11:01-08:26:38 sophos-01 webadmin[30244]: 2. core::modules::core_tools::obj2json:711() /</var/webadmin/webadmin.plx>core/modules/core_tools.pm
2018:11:01-08:26:38 sophos-01 webadmin[30244]: 3. wfe::asg::modules::asg_cache::cache_objects:114() /</var/webadmin/webadmin.plx>wfe/asg/modules/asg_cache.pm
2018:11:01-08:26:38 sophos-01 webadmin[30244]: 4. main::top-level:265() asg.plx
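The kernel segfault lines in the excerpt above can be decoded a little further. On x86, the number after `error` is the page-fault error code: bit 0 is set for a protection violation (clear means the page was not mapped), bit 1 is set for a write (clear means a read), and bit 2 is set when the fault happened in user mode. Subtracting the mapping base in the square brackets from `ip` gives the offset of the faulting instruction inside the library. Here is a minimal Python sketch of that decoding; the function name and the dictionary keys are my own choices for illustration, not part of any Sophos tooling.

```python
import re

# Matches the kernel's segfault message; the trailing "in lib[base+size]"
# part is optional because some lines (e.g. mdw.plx) do not include it.
SEGFAULT_RE = re.compile(
    r"(?P<proc>[\w.]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
    r"(?: in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Return a dict describing one kernel segfault line, or None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    info = {
        "process": m.group("proc"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        # x86 page-fault error code, low three bits only
        "protection_violation": bool(err & 1),  # False: page not mapped
        "access": "write" if err & 2 else "read",
        "user_mode": bool(err & 4),
        "library": m.group("lib"),
        "offset_in_library": None,
    }
    if m.group("base") is not None:
        # Offset of the faulting instruction relative to the mapping base.
        info["offset_in_library"] = int(m.group("ip"), 16) - int(m.group("base"), 16)
    return info
```

Applied to the first confd.plx line above, this reports a user-mode read of an unmapped page, with the faulting instruction at offset 0xaac80 into libperl.so. That offset is only meaningful against the exact libperl.so build on the box, so treat it as a pointer for further digging rather than a diagnosis.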

 

This is the system information I collected:

The "confd [master]" process has been running since system boot.

On Sept 26, the listener child process was started (restarted), though I cannot find any logs showing this.

root 7315 3110 0 Sep26 ? 00:13:21 confd [listener]

 

 

The Sophos system started logging segfaults on Sept 25. Only confd started segfaulting. It was quiet (no segfaults or other errors) before this.

==> /var/log/kernel <==
2018:09:25-00:13:09 sophos-01 kernel: [89499656.395398] confd.plx[28958]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in libperl.so[b72b4000+14d000]
... snip - 8 events removed ...
2018:09:25-00:13:50 sophos-01 kernel: [89499697.465577] confd.plx[29030]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in libperl.so[b72b4000+14d000]
2018:09:25-00:13:53 sophos-01 kernel: [89499700.734961] show_signal_msg: 8 callbacks suppressed

The segfault errors continued through the time the system reports "confd [listener]" started (restarted).

2018:09:26-00:12:26 sophos-01 kernel: [89586013.288675] confd.plx[27639]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in libperl.so[b72b4000+14d000]
2018:09:26-00:12:26 sophos-01 kernel: [89586013.394724] confd.plx[27698]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in libperl.so[b72b4000+14d000]
2018:09:26-00:12:26 sophos-01 kernel: [89586013.407172] confd.plx[27696]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in libperl.so[b72b4000+14d000]
2018:09:26-00:12:26 sophos-01 kernel: [89586013.452024] confd.plx[27702]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in libperl.so[b72b4000+14d000]
2018:09:26-00:12:26 sophos-01 kernel: [89586013.452679] confd.plx[27703]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in libperl.so[b72b4000+14d000]
2018:09:26-00:12:30 sophos-01 kernel: [89586017.723244] show_signal_msg: 3 callbacks suppressed
2018:09:26-00:12:30 sophos-01 kernel: [89586017.723249] confd.plx[27788]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in libperl.so[b72b4000+14d000]
2018:09:26-00:13:26 sophos-01 kernel: [89586073.660045] confd.plx[27804]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in libperl.so[b72b4000+14d000]
2018:09:26-00:13:26 sophos-01 kernel: [89586074.014605] confd.plx[27803]: segfault at 64ec0aec ip b734d944 sp bfd75a20 error 4 in libperl.so[b72b4000+14d000]

After these confd segfaults began, the logs also recorded segfaults for mdw (the middleware), starting on Sept 28 (when backups started failing) and repeating once a day on Sept 29, Sept 30, Oct 8, and Nov 11:

2018:09:28-23:33:43 sophos-01 kernel: [89842890.235323] mdw.plx[29991]: segfault at 4367bcb4 ip 4367bcb4 sp bf89ea4c error 4
2018:09:28-23:34:00 sophos-01 kernel: [89842907.309718] 8021q: adding VLAN 0 to HW filter on device eth0
2018:09:28-23:34:00 sophos-01 kernel: [89842907.396024] 8021q: adding VLAN 0 to HW filter on device eth8
2018:09:28-23:34:00 sophos-01 kernel: [89842907.456909] 8021q: adding VLAN 0 to HW filter on device eth1

There were a few other segfaults between Sept 01 and Nov 01:

2018:09:23-12:08:35 sophos-01 kernel: [89369782.747679] ulogd[21364]: segfault at 0 ip b76cb103 sp b7329fb0 error 4 in libtcmalloc.so.4.1.0[b76a4000+48000]
2018:09:25-00:38:05 sophos-01 kernel: [89501152.603706] netselector.plx[32729]: segfault at fffffffe ip b7623711 sp bfbe8448 error 5 in libc-2.11.3.so[b75af000+167000]
2018:09:25-03:40:32 sophos-01 kernel: [89512099.104025] smtpd.bin[31429]: segfault at 793e338f ip b6732c27 sp bff73db0 error 4 in libperl.so[b667c000+150000]
2018:10:11-23:14:24 sophos-01 kernel: [90964931.335532] exim[30048]: segfault at 10c67b5 ip b774bbbd sp bfe99990 error 4 in ld-2.11.3.so[b7742000+1f000]
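To see at a glance which processes crashed on which days, the kernel log can be tallied per day and per process. The following is a rough Python sketch, assuming one log entry per line in the timestamp format shown above; `tally_segfaults` and the regular expressions are my own names, not anything provided by the UTM. Because the kernel rate-limits these messages ("callbacks suppressed"), the counts are a lower bound, so the sketch also sums the suppressed figure.

```python
import re
from collections import Counter

# "2018:09:25-00:13:09 host kernel: [89499656.395398] message..."
ENTRY_RE = re.compile(r"^(\d{4}):(\d{2}):(\d{2})-\S+ \S+ kernel: \[\s*[\d.]+\] (.*)$")
SEGV_RE = re.compile(r"^([\w.]+)\[\d+\]: segfault at ")
SUPPRESSED_RE = re.compile(r"^show_signal_msg: (\d+) callbacks suppressed")


def tally_segfaults(lines):
    """Count segfault lines per (date, process) and sum suppressed messages."""
    per_day = Counter()
    suppressed = 0
    for line in lines:
        entry = ENTRY_RE.match(line.rstrip("\n"))
        if entry is None:
            continue  # not a kernel log entry in the expected format
        year, month, day, message = entry.groups()
        segv = SEGV_RE.match(message)
        if segv:
            per_day[(f"{year}-{month}-{day}", segv.group(1))] += 1
            continue
        skipped = SUPPRESSED_RE.match(message)
        if skipped:
            suppressed += int(skipped.group(1))
    return per_day, suppressed
```

It could be fed an open file, for example `tally_segfaults(open("/var/log/kernel.log"))`, and the result sorted by date to line the crashes up against the Sept 28 backup failures and the Oct 31 login failures.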

 



  • I installed a temporary Sophos UTM with a trial license and tested my scenario and configuration on it.

    In the end, the issue appears to be resolved after having restarted confd.

    /etc/rc.d/confd restart

     

    The confd process loads the license and the rest of the Sophos UTM configuration, and serves it to the other processes.

    Shutting down confd does not affect the run-time operation of the "Network Protection > Firewall" rules that were committed before the confd kernel errors started occurring. The webadmin WebUI does stop working while confd is stopped. I did not test any other functionality.

    When confd is stopped, it first clears out any existing sessions (I assume every caller starts a session with confd to interface with it). After that, it is killed. Either `killproc confd.plx` or `/etc/rc.d/confd stop` does essentially all of this. Once confd is started again, processes reattach to it with new sessions.

    I tested stopping the confd process, and then starting, on the test system. On my problem system, I opted to restart.

    Stopping the confd process took longer than on the test system, and I saw that a lot of sessions had to be cleared, but it did end up restarting without error.

    The kernel errors have stopped. The WebUI is accessible again. And the backup is now working.
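One way to double-check that the kernel errors really stopped is to look for confd.plx segfault lines stamped after the restart. Below is a small Python sketch of that check. It assumes the kernel.log timestamp format shown in the question; `segfaults_since` is a name I made up, and the restart time is something you would supply yourself, since it is not recorded in this thread.

```python
import re
from datetime import datetime

# Leading timestamp as written by the UTM, e.g. "2018:11:01-08:26:36 "
STAMP_RE = re.compile(r"^(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) ")


def segfaults_since(lines, since, process="confd.plx"):
    """Return segfault lines for `process` with a timestamp at or after `since`."""
    hits = []
    for line in lines:
        m = STAMP_RE.match(line)
        if not m or f"{process}[" not in line or ": segfault at " not in line:
            continue
        stamp = datetime.strptime(m.group(1), "%Y:%m:%d-%H:%M:%S")
        if stamp >= since:
            hits.append(line.rstrip("\n"))
    return hits
```

An empty list for a `since` value just after the restart is consistent with the problem being gone, though it is worth rerunning the check after the next nightly backup as well.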
