This discussion has been locked.

Sophos UTM hangs at random intervals: problem with notifier/postfix reloading?

Hello All,

I've been very happy with the Home Use version of Sophos UTM (currently 9.409), which I've run on bare metal since July 2015. Thanks to all the experts participating here, until now I've been able to resolve every problem by searching the posts on the former Astaro site or here.

However, I haven't been able to resolve this one. In the last month or so (since around the time I upgraded from 9.3x to 9.4x), the UTM has started to hang. The hangs occur at seemingly random intervals, but over the last week or two they have become more frequent, with shorter uptime in between. Each time, internet access fails and the UTM's web interface stops responding. After rebooting the hardware with the power switch, everything works again and I can log in to the web GUI and check the logs.

Nothing has caught my attention except in the Admin Notifications log: the notifier and postfix (postfix/postfix-script) seem to reload every 5 to 20 minutes, with the config version climbing as high as 894 before the hang. After the restart, the config version starts again at 5. Is this normal behavior? I saw one post about an earlier 8.x version of the UTM that mentioned postfix as part of the UTM's internal communications.

Thanks in advance for any help troubleshooting this. Does it point to a hardware problem, a software problem, or a combination of both? Or would it be easier to reinstall and reconfigure the UTM and wait to see whether the problem recurs?

Cheers,
PD

Excerpt from the Admin Notifications log:

2017:01:24-17:12:27 wall notifier[3606]: loading config version 892
2017:01:24-17:12:30 wall postfix/postfix-script[29316]: refreshing the Postfix mail system
2017:01:24-17:12:30 wall postfix/master[5740]: reload -- version 2.11.0, configuration /etc/postfix
2017:01:24-17:23:31 wall notifier[3606]: loading config version 893
2017:01:24-17:29:33 wall notifier[3606]: loading config version 894
2017:01:24-17:29:35 wall postfix/postfix-script[29967]: refreshing the Postfix mail system
2017:01:24-17:29:35 wall postfix/master[5740]: reload -- version 2.11.0, configuration /etc/postfix
2017:01:24-18:02:03 wall postfix/postfix-script[5530]: stopping the Postfix mail system
2017:01:24-18:02:03 wall postfix/master[3592]: terminating on signal 15
2017:01:24-18:02:04 wall postfix/postfix-script[5869]: starting the Postfix mail system
2017:01:24-18:02:04 wall postfix/master[5871]: daemon started -- version 2.11.0, configuration /etc/postfix
2017:01:24-18:02:10 wall notifier[5989]: processing notification request for INFO-000<30>Jan 24 18:02:10 notifier[5989]: mail notifications for INFO-000 are disabled
2017:01:24-18:02:10 wall notifier[5989]: snmp traps for INFO-000 are disabled
2017:01:24-18:02:10 wall notifier[5989]: successfully processed request for notification
2017:01:24-18:02:45 wall notifier[3610]: loading config version 5<30>Jan 24 18:08:56 notifier[6576]: processing notification request for INFO-005
2017:01:24-18:08:56 wall notifier[6576]: mail notifications for INFO-005 are disabled
2017:01:24-18:08:56 wall notifier[6576]: snmp traps for INFO-005 are disabled
2017:01:24-18:08:56 wall notifier[6576]: successfully processed request for notification
2017:01:24-18:16:48 wall notifier[3610]: loading config version 10
2017:01:24-18:16:50 wall postfix/postfix-script[7165]: refreshing the Postfix mail system
2017:01:24-18:16:50 wall postfix/master[5871]: reload -- version 2.11.0, configuration /etc/postfix
2017:01:24-18:26:51 wall notifier[3610]: loading config version 21
2017:01:24-18:26:53 wall postfix/postfix-script[8306]: refreshing the Postfix mail system
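
To put numbers on the "is this normal?" question instead of eyeballing the excerpt, the reload pattern can be tallied with a short script. This is a minimal sketch, not a Sophos tool: it assumes every log line has the same layout as the excerpt above ("YYYY:MM:DD-HH:MM:SS host process[pid]: message"), and the function names config_reloads and summarize are hypothetical. It reports the minutes between consecutive "loading config version" entries and flags every point where the version number drops, which in this excerpt coincides with the power-switch restart.

```python
import re
import sys
from datetime import datetime

# Assumed line layout, based on the excerpt above:
#   "YYYY:MM:DD-HH:MM:SS <host> <process>[<pid>]: <message>"
LINE_RE = re.compile(r"^(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) \S+ ([^\[]+)\[\d+\]: (.*)$")
CONFIG_RE = re.compile(r"loading config version (\d+)")


def config_reloads(lines):
    """Return (timestamp, version) for each notifier 'loading config version' entry."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not match the assumed layout
        ts_text, process, message = m.groups()
        found = CONFIG_RE.search(message)
        if process == "notifier" and found:
            ts = datetime.strptime(ts_text, "%Y:%m:%d-%H:%M:%S")
            events.append((ts, int(found.group(1))))
    return events


def summarize(events):
    """Minutes between consecutive reloads, plus every point where the version drops."""
    gaps, resets = [], []
    for (t0, v0), (t1, v1) in zip(events, events[1:]):
        gaps.append((t1 - t0).total_seconds() / 60)
        if v1 < v0:
            resets.append((t1, v0, v1))  # version went backwards: likely a restart
    return gaps, resets


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python reloads.py saved-admin-notifications.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        gaps, resets = summarize(config_reloads(fh))
    if gaps:
        print(f"{len(gaps) + 1} reloads, gaps {min(gaps):.1f} to {max(gaps):.1f} min")
    for when, old, new in resets:
        print(f"{when}: config version dropped from {old} to {new}")
```

Only notifier lines are counted; lines fused with a raw syslog message (like the "<30>Jan 24 ..." lines above) still match, because the version number is taken from the start of the message.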


  • Sachin, Bob and others,

    This problem has recurred even after a complete wipe of the SSD and a reinstall and reconfiguration of UTM version 9.411-3.1. I still need to know whether this postfix behavior is normal or abnormal, and I would appreciate any ideas or tips on what to troubleshoot next. Is there any hint as to whether this is an issue with the UTM software, or whether an underlying hardware issue could be causing it? Bob, could this be related to the past postfix bug referenced in this post from 2011: "[8.285][BUG][OPEN] Postfix fatal errors"?

    Here again is an excerpt of the Admin Notifications log showing the frequent postfix (postfix/postfix-script) reloads. The UTM hung right after the reload at 12:58:38 and stayed hung until I restarted it with the power switch.

    2017:03:05-12:56:42 wall notifier[3580]: loading config version 393
    2017:03:05-12:56:43 wall notifier[3580]: loading config version 394
    2017:03:05-12:56:45 wall postfix/postfix-script[25138]: refreshing the Postfix mail system
    2017:03:05-12:56:45 wall postfix/master[5783]: reload -- version 2.11.0, configuration /etc/postfix
    2017:03:05-12:58:14 wall postfix/postfix-script[25278]: refreshing the Postfix mail system
    2017:03:05-12:58:14 wall postfix/master[5783]: reload -- version 2.11.0, configuration /etc/postfix
    2017:03:05-12:58:22 wall notifier[3580]: loading config version 396
    2017:03:05-12:58:25 wall notifier[25390]: processing notification request for INFO-306
    2017:03:05-12:58:25 wall notifier[25390]: mail notifications for INFO-306 are disabled
    2017:03:05-12:58:25 wall notifier[25390]: snmp traps for INFO-306 are disabled
    2017:03:05-12:58:25 wall notifier[25390]: successfully processed request for notification
    2017:03:05-12:58:38 wall postfix/postfix-script[25481]: refreshing the Postfix mail system
    2017:03:05-12:58:38 wall postfix/master[5783]: reload -- version 2.11.0, configuration /etc/postfix
    2017:03:05-13:27:04 wall postfix/postfix-script[5407]: stopping the Postfix mail system
    2017:03:05-13:27:04 wall postfix/master[3603]: terminating on signal 15
    2017:03:05-13:27:05 wall postfix/postfix-script[5701]: starting the Postfix mail system

    Thanks in advance to anyone who can help with this issue. Cheers, PD
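
The silence in the excerpt above (no entries between 12:58:38 and 13:27:04) and the run of unreadable characters just before 13:27:04 can also be located in a saved copy of the raw log. This sketch assumes the unreadable characters were NUL bytes in the file itself, which is what a log file often ends up containing when the system loses power before buffered writes reach the disk; that is an assumption, not something the excerpt proves. The function name find_hang is hypothetical. It counts and strips NUL bytes, then returns the largest time gap between consecutive log entries.

```python
import re
import sys
from datetime import datetime

# Assumed: each entry starts with a "YYYY:MM:DD-HH:MM:SS " timestamp, as in the excerpt.
TS_RE = re.compile(rb"^[ \t]*(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) ", re.MULTILINE)


def find_hang(raw: bytes):
    """Return (nul_count, largest_gap); largest_gap is (before, after, minutes) or None."""
    nul_count = raw.count(b"\x00")
    # Without this, a NUL run fuses onto the next entry and hides its timestamp.
    cleaned = raw.replace(b"\x00", b"")
    stamps = [
        datetime.strptime(m.group(1).decode("ascii"), "%Y:%m:%d-%H:%M:%S")
        for m in TS_RE.finditer(cleaned)
    ]
    if len(stamps) < 2:
        return nul_count, None
    before, after = max(zip(stamps, stamps[1:]), key=lambda pair: pair[1] - pair[0])
    return nul_count, (before, after, (after - before).total_seconds() / 60)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_hang.py saved-admin-notifications.log
    with open(sys.argv[1], "rb") as fh:
        nuls, gap = find_hang(fh.read())
    print(f"NUL bytes: {nuls}")
    if gap:
        before, after, minutes = gap
        print(f"largest gap: {before} -> {after} ({minutes:.1f} min)")
```

The file is read as bytes on purpose, so that the NUL run is counted exactly rather than being turned into replacement characters by a text decoder.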
  • Hi,

    I think this is somehow associated with the postgres database. What happens when you rebuild it? Run the following command as root:

    /etc/init.d/postgresql92 rebuild

    Note: This will delete the current graph and reporting data, but it will not affect any of the archived logs.

    Hope that helps.

    Sachin Gurung
    Team Lead | Sophos Technical Support

  • Sachin,

     

    Thanks for the suggestion on rebuilding the postgres database. Based on your feedback and Bob's, I started thinking about why the hangs were occurring more frequently, with shorter uptime in between, even after a complete reinstall and reconfiguration of the UTM software.

    My educated guess is that the SSD I was using was starting to develop errors and may have been corrupting the postgres database.

    I am currently trying a fresh install of 9.411 on a new SSD, and so far I am not seeing the same rate of postfix/postfix-script reloads. I also saw that the old SSD apparently had some errors in partitions 5 and 6, but I haven't yet figured out the best way to test the old SSD more thoroughly.

    If this new install runs for more than 10 days without a hang, I'll be reasonably confident that a failing SSD was the problem. If the UTM does start to hang again, I'll try rebuilding the postgres database using your instructions.

    Thanks again for your help.  Cheers, PD
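
On the question of testing the old SSD: a low-effort first step is to tally which partitions the kernel has actually reported errors for, using a saved copy of its messages (for example the output of the dmesg command). This is only a count of logged errors, not a drive test. It assumes message shapes such as "Buffer I/O error on device sda5, ..." and "EXT4-fs error (device sda6): ..."; the exact wording differs between kernel versions and filesystems, so the pattern is an assumption to adjust. The function name io_errors_by_device is hypothetical.

```python
import re
import sys
from collections import Counter

# Assumed kernel message shapes (wording varies by kernel version), e.g.
#   "Buffer I/O error on device sda5, logical block 1234"
#   "EXT4-fs error (device sda6): ..."
#   "end_request: I/O error, dev sda, sector 5678"
DEVICE_RE = re.compile(r"\b(?:device|dev)\s+(sd[a-z]+\d*)")


def io_errors_by_device(lines):
    """Count error lines per reported block device or partition name."""
    counts = Counter()
    for line in lines:
        if "error" not in line.lower():
            continue  # only consider lines that report an error
        found = DEVICE_RE.search(line)
        if found:
            counts[found.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: dmesg > kernel.txt; python io_errors.py kernel.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for device, n in io_errors_by_device(fh).most_common():
            print(f"{device}: {n} error line(s)")
```

Errors that the kernel reports only against the whole disk (such as "dev sda") are counted under the disk name rather than a partition, so a clean count for sda5 and sda6 does not by itself mean the drive is healthy.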
