
Sophos UTM hangs at random intervals: problem with notifier/postfix reloading?

Hello All,

I've been very happy with the home use version of Sophos UTM (currently version 9.409), installed on bare metal since July 2015. Thanks to all the experts participating here, until now I haven't had a problem I couldn't resolve by searching posts on the former Astaro site or here.

However, I haven't been able to resolve this one. In the last month or so (since around the time I upgraded from 9.3x to 9.4x), I've started to experience hangs. They seem to occur at random intervals, but in the last week or two they have become more frequent, with shorter intervals in between. Each time, internet access fails, and the UTM's web interface also stops responding when I try to log on. After rebooting the hardware with the power switch, everything works again and I can access the UTM through the web GUI and check the logs.

Nothing caught my attention except in the Admin Notifications log: the notifier and postfix/postfix-script seem to reload every 5 to 20 minutes, with the config version climbing as high as 894 before the hang; after the restart, the config version starts again at 5. Is this normal behavior? I saw one post about an earlier 8.x version of the UTM that mentioned postfix as part of the UTM's internal communications.

Thanks in advance for any help troubleshooting whether this points to a hardware problem, a software problem, or both, or whether it would be easier to reinstall and reconfigure the UTM and wait to see if the problem recurs.

Cheers,
PD

Excerpt from the Admin Notifications log:

2017:01:24-17:12:27 wall notifier[3606]: loading config version 892
2017:01:24-17:12:30 wall postfix/postfix-script[29316]: refreshing the Postfix mail system
2017:01:24-17:12:30 wall postfix/master[5740]: reload -- version 2.11.0, configuration /etc/postfix
2017:01:24-17:23:31 wall notifier[3606]: loading config version 893
2017:01:24-17:29:33 wall notifier[3606]: loading config version 894
2017:01:24-17:29:35 wall postfix/postfix-script[29967]: refreshing the Postfix mail system
2017:01:24-17:29:35 wall postfix/master[5740]: reload -- version 2.11.0, configuration /etc/postfix
2017:01:24-18:02:03 wall postfix/postfix-script[5530]: stopping the Postfix mail system
2017:01:24-18:02:03 wall postfix/master[3592]: terminating on signal 15
2017:01:24-18:02:04 wall postfix/postfix-script[5869]: starting the Postfix mail system
2017:01:24-18:02:04 wall postfix/master[5871]: daemon started -- version 2.11.0, configuration /etc/postfix
2017:01:24-18:02:10 wall notifier[5989]: processing notification request for INFO-000
<30>Jan 24 18:02:10 notifier[5989]: mail notifications for INFO-000 are disabled
2017:01:24-18:02:10 wall notifier[5989]: snmp traps for INFO-000 are disabled
2017:01:24-18:02:10 wall notifier[5989]: successfully processed request for notification
2017:01:24-18:02:45 wall notifier[3610]: loading config version 5
<30>Jan 24 18:08:56 notifier[6576]: processing notification request for INFO-005
2017:01:24-18:08:56 wall notifier[6576]: mail notifications for INFO-005 are disabled
2017:01:24-18:08:56 wall notifier[6576]: snmp traps for INFO-005 are disabled
2017:01:24-18:08:56 wall notifier[6576]: successfully processed request for notification
2017:01:24-18:16:48 wall notifier[3610]: loading config version 10
2017:01:24-18:16:50 wall postfix/postfix-script[7165]: refreshing the Postfix mail system
2017:01:24-18:16:50 wall postfix/master[5871]: reload -- version 2.11.0, configuration /etc/postfix
2017:01:24-18:26:51 wall notifier[3610]: loading config version 21
2017:01:24-18:26:53 wall postfix/postfix-script[8306]: refreshing the Postfix mail system
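
To check how regular the reloads really are, the "loading config version" lines can be tallied with a short script. This is a minimal Python sketch, assuming the log lines keep the format shown above (timestamp, hostname, process[pid], message); the function names are hypothetical and not part of any UTM tool. It reports the minutes between consecutive reloads and every point where the version number goes down.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2017:01:24-17:12:27 wall notifier[3606]: loading config version 892
CONFIG_RE = re.compile(
    r"^(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) \S+ notifier\[\d+\]: "
    r"loading config version (\d+)"
)


def config_reloads(lines):
    """Return (timestamp, version) pairs for every 'loading config version' line."""
    events = []
    for line in lines:
        m = CONFIG_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y:%m:%d-%H:%M:%S")
            events.append((ts, int(m.group(2))))
    return events


def reload_intervals(events):
    """Minutes between consecutive reloads, plus every point where the version dropped."""
    gaps, drops = [], []
    for (t0, v0), (t1, v1) in zip(events, events[1:]):
        gaps.append((t1 - t0).total_seconds() / 60)
        if v1 < v0:
            drops.append((t1, v0, v1))
    return gaps, drops
```

Fed the notifier lines from the excerpt above, this should show gaps of roughly 6 to 14 minutes while the UTM was running, and a single drop from 894 to 5 across the power-switch restart.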


  • Hi PD,

    Before taking any further steps, please upload the backup taken from the previous version again and monitor the behavior. Let us know the outcome. Meanwhile, I will take a look at the logs and check whether the issue is known internally.

    Thanks

    Sachin Gurung
    Team Lead | Sophos Technical Support
    Knowledge Base  |  @SophosSupport  |  Video tutorials
    Remember to like a post.  If a post (on a question thread) solves your question use the 'This helped me' link.

  • Thanks, Sachingurung. I will try that again with an earlier backup. I forgot to mention that I had already tried one restore from a backup, after which I made sure all settings related to email and notifications were disabled in the web GUI. However, the hangs continued. It may be a few days before I can try another restore; I'll monitor this thread for updates and let you know the results of the second restore attempt.
  • Sachingurung and others,

     

    I tried another restore, this time from a backup made on 11/26/16 while on Sophos UTM version 9.407-3. I then deselected or turned off all features in the web GUI related to notifications and email. Unfortunately, the same behavior continues, with the notifier and postfix-script reloading very frequently, as logged in the Admin Notifications log:

    Log Excerpt 1 from 2nd Restore Trial:

    2017:01:29-18:56:02 wall notifier[3616]: loading config version 229
    2017:01:29-19:05:04 wall notifier[3616]: loading config version 230
    2017:01:29-19:05:06 wall postfix/postfix-script[30274]: refreshing the Postfix mail system
    2017:01:29-19:05:06 wall postfix/master[5712]: reload -- version 2.11.0, configuration /etc/postfix
    2017:01:29-19:15:07 wall notifier[3616]: loading config version 231
    2017:01:29-19:15:09 wall postfix/postfix-script[30632]: refreshing the Postfix mail system
    2017:01:29-19:15:09 wall postfix/master[5712]: reload -- version 2.11.0, configuration /etc/postfix
    2017:01:29-19:16:08 wall notifier[3616]: loading config version 232

    Log Excerpt 2 from 2nd Restore Trial:

    2017:01:30-16:29:20 wall postfix/postfix-script[23150]: refreshing the Postfix mail system
    2017:01:30-16:29:20 wall postfix/master[5712]: reload -- version 2.11.0, configuration /etc/postfix
    2017:01:30-16:42:21 wall notifier[3616]: loading config version 483
    2017:01:30-16:44:22 wall notifier[3616]: loading config version 484
    2017:01:30-16:44:24 wall postfix/postfix-script[23758]: refreshing the Postfix mail system
    2017:01:30-16:44:24 wall postfix/master[5712]: reload -- version 2.11.0, configuration /etc/postfix
    2017:01:30-16:53:26 wall notifier[3616]: loading config version 485
    2017:01:30-16:53:28 wall postfix/postfix-script[24132]: refreshing the Postfix mail system
    2017:01:30-16:53:28 wall postfix/master[5712]: reload -- version 2.11.0, configuration /etc/postfix

    Thanks in advance for any updates on whether this behavior is normal. If it isn't, please let me know whether it would be best to reformat the drive, reinstall 9.409, and reconfigure.

  • Hi, and welcome to the UTM Community!

    What happens if you just restore the 11/26 configuration without changing it?  Does the UTM hang?

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Bob,

    Thanks for your input. I have not tried restoring the 11/26 configuration without any changes to notifications. However, my restore of the 11/26 configuration with all notifications deselected in the web GUI has not hung in 4 days 18 hours, even though the notifier/postfix config version has risen to 1051 (see log excerpt below). If it lasts more than a week, I'll start re-enabling a limited set of notifications. I hope Sachin is able to find out whether this is normal behavior for the notifier.

    Cheers,

    PD

    Admin Notification Log Excerpt:

    2017:02:02-15:44:26 wall notifier[3616]: loading config version 1050
    2017:02:02-21:44:29 wall postfix/postfix-script[10505]: refreshing the Postfix mail system
    2017:02:02-15:44:29 wall postfix/master[5712]: reload -- version 2.11.0, configuration /etc/postfix
    2017:02:02-15:48:00 wall notifier[10707]: processing notification request for WARN-005
    2017:02:02-15:48:00 wall notifier[10707]: mail notifications for WARN-005 are disabled
    2017:02:02-15:48:00 wall notifier[10707]: snmp traps for WARN-005 are disabled
    2017:02:02-15:48:00 wall notifier[10707]: successfully processed request for notification
    2017:02:02-15:48:17 wall notifier[10717]: processing notification request for INFO-005
    2017:02:02-15:48:17 wall notifier[10717]: mail notifications for INFO-005 are disabled
    2017:02:02-15:48:17 wall notifier[10717]: snmp traps for INFO-005 are disabled
    2017:02:02-15:48:17 wall notifier[10717]: successfully processed request for notification
    2017:02:02-15:48:28 wall notifier[3616]: loading config version 1051
    2017:02:02-21:48:30 wall postfix/postfix-script[10769]: refreshing the Postfix mail system
    2017:02:02-15:48:30 wall postfix/master[5712]: reload -- version 2.11.0, configuration /etc/postfix

  • The reason Sachin asked you to try the restore is that, occasionally, an Up2Date mangles the upgrade of a configuration. I'd guess it happens in far fewer than 1 in 1,000 cases, but it's not so rare that a restore isn't worth trying first if things seem odd after an Up2Date. The next thing to try is a reboot.

    Cheers - Bob

     
  • Sachin, Bob and Others,

     

    This problem has recurred even after completely wiping the SSD and reinstalling and reconfiguring UTM version 9.411-3.1. I still need to know whether this postfix behavior is normal, and I would appreciate any ideas or tips on what to troubleshoot next. Any hints as to whether this is an issue with the UTM software, or whether an underlying hardware issue could be causing it? Bob, could this be related to the past postfix bug referenced in this post from 2011: "[8.285][BUG][OPEN] Postfix fatal errors"?

    Here again is an excerpt of the Admin Notifications log showing the frequent reloading of postfix/postfix-script, with a hang after the reload at timestamp 12:58:38 that lasted until a power-switch restart:

    2017:03:05-12:56:42 wall notifier[3580]: loading config version 393
    2017:03:05-12:56:43 wall notifier[3580]: loading config version 394
    2017:03:05-12:56:45 wall postfix/postfix-script[25138]: refreshing the Postfix mail system
    2017:03:05-12:56:45 wall postfix/master[5783]: reload -- version 2.11.0, configuration /etc/postfix
    2017:03:05-12:58:14 wall postfix/postfix-script[25278]: refreshing the Postfix mail system
    2017:03:05-12:58:14 wall postfix/master[5783]: reload -- version 2.11.0, configuration /etc/postfix
    2017:03:05-12:58:22 wall notifier[3580]: loading config version 396
    2017:03:05-12:58:25 wall notifier[25390]: processing notification request for INFO-306
    2017:03:05-12:58:25 wall notifier[25390]: mail notifications for INFO-306 are disabled
    2017:03:05-12:58:25 wall notifier[25390]: snmp traps for INFO-306 are disabled
    2017:03:05-12:58:25 wall notifier[25390]: successfully processed request for notification
    2017:03:05-12:58:38 wall postfix/postfix-script[25481]: refreshing the Postfix mail system
    2017:03:05-12:58:38 wall postfix/master[5783]: reload -- version 2.11.0, configuration /etc/postfix
    [unreadable characters]
    2017:03:05-13:27:04 wall postfix/postfix-script[5407]: stopping the Postfix mail system
    2017:03:05-13:27:04 wall postfix/master[3603]: terminating on signal 15
    2017:03:05-13:27:05 wall postfix/postfix-script[5701]: starting the Postfix mail system
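
    One way to pin down when a hang started is to look for the last normal entry before a long gap or a run of junk characters. A run of NUL bytes (often displayed as unreadable characters) is commonly left in a log file when a machine is powered off while the file is being written, so it roughly marks where logging stopped. This is a minimal Python sketch assuming the timestamp format used in this thread; `find_hangs` and its 20-minute threshold are hypothetical choices, not a UTM feature.

```python
import re
from datetime import datetime

# Timestamp format used throughout the UTM logs in this thread, e.g. 2017:03:05-12:58:38
TS_RE = re.compile(r"\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}")


def find_hangs(lines, min_gap_minutes=20):
    """Return (last_before, first_after, gap_minutes, junk_seen) for each suspicious break.

    A break is suspicious if non-whitespace junk (e.g. NUL bytes) precedes the
    timestamp on a line, or if two consecutive entries are min_gap_minutes or more apart.
    """
    hangs = []
    prev_ts = None
    for line in lines:
        m = TS_RE.search(line)
        if not m:
            continue  # no timestamp at all: nothing to anchor on
        ts = datetime.strptime(m.group(0), "%Y:%m:%d-%H:%M:%S")
        junk = any(not ch.isspace() for ch in line[: m.start()])
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds() / 60
            if junk or gap >= min_gap_minutes:
                hangs.append((prev_ts, ts, gap, junk))
        prev_ts = ts
    return hangs
```

    On the excerpt above, this should report one break: last entry at 12:58:38, next entry at 13:27:04 (about 28 minutes later), with the unreadable characters in between.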

    Thanks in advance to anyone who can help with this issue. Cheers, PD
  • Over a period of 5 to 6 days, our lab UTM loads new config versions up to 1999 and then starts over at 11. It doesn't die, though.

    I don't think this is related to the problem of six years ago.
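
    Because the counter can also wrap around on a healthy system, a drop in the config version alone does not prove a reboot. A rough way to tell the two cases apart, assuming the notifier keeps the same PID for as long as it runs (an assumption, not confirmed in this thread), is to check whether the PID in `notifier[...]` changed across the drop. In PD's first excerpt, version 894 was logged by notifier[3606] and version 5 by notifier[3610]. The function name below is hypothetical.

```python
import re

# Captures the notifier PID and the config version from lines like
# 2017:01:24-17:29:33 wall notifier[3606]: loading config version 894
RELOAD_RE = re.compile(r"notifier\[(\d+)\]: loading config version (\d+)")


def classify_resets(lines):
    """For every drop in config version, report whether the notifier PID changed."""
    out = []
    prev = None  # (pid, version) of the previous reload line
    for line in lines:
        m = RELOAD_RE.search(line)
        if not m:
            continue
        pid, version = int(m.group(1)), int(m.group(2))
        if prev is not None and version < prev[1]:
            if pid == prev[0]:
                kind = "same process (counter wrap?)"
            else:
                kind = "new process (restart?)"
            out.append((prev[1], version, kind))
        prev = (pid, version)
    return out
```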

    Cheers - Bob

     
  • Hi,

    I think this is somehow associated with the postgres database. What happens when you rebuild it? Run the following command as root:

    /etc/init.d/postgresql92 rebuild

    Note: This will delete the current information in graphs and reporting, but it will not affect any of the archive logs.

    Hope that helps.

    Sachin Gurung

  • Bob,

    Thanks for that information. Your feedback that frequent config version reloads do not cause your lab UTM to hang, together with Sachin's suggestion about rebuilding the postgres database, got me thinking about why the hangs were becoming more frequent, with shorter uptime in between (even after a complete reinstall and reconfiguration of the UTM software).

    I'm making an educated guess that the SSD I was using was starting to have errors and it may have been corrupting the postgres database.

    I am currently trying a fresh install of 9.411 on a new SSD and I am not seeing the same rate of "postfix/postfix script" version reloads so far.  I also saw that the old SSD apparently had some errors in partitions 5 and 6.  If this new install goes without a hang for more than 10 days, then I'll be reasonably certain that a failing SSD may have been the problem.
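
    If a failing SSD is suspected, the kernel log can be checked for disk I/O errors per device or partition. This is a minimal Python sketch; the message patterns are assumptions based on common Linux kernel wording (for example `Buffer I/O error on dev sda5`), the exact text varies by kernel version, and which log file holds these messages on the UTM is not covered in this thread. `disk_errors_by_device` is a hypothetical helper.

```python
import re
from collections import Counter

# Common (but kernel-version-dependent) wordings of disk error messages:
#   Buffer I/O error on dev sda5, logical block 1024, async page read
#   end_request: I/O error, dev sda, sector 2048
#   EXT4-fs error (device sda6): ...
PATTERNS = [
    re.compile(r"Buffer I/O error on dev (\w+)"),
    re.compile(r"I/O error, dev (\w+)"),
    re.compile(r"EXT[234]-fs error \(device (\w+)\)"),
]


def disk_errors_by_device(lines):
    """Count disk error messages per device/partition name."""
    counts = Counter()
    for line in lines:
        for pat in PATTERNS:
            m = pat.search(line)
            if m:
                counts[m.group(1)] += 1
                break  # count each line at most once
    return counts
```

    Errors concentrated on particular partitions (such as the 5 and 6 mentioned above) would be consistent with, though not proof of, a failing drive.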

    Thanks again for your help.  Cheers, PD