Sophos UTM Hangs at random intervals --problem with notifier/postfix reloading?

Hello All,

I've been very happy with the home-use version of the Sophos UTM (current version 9.409), installed on bare metal since July 2015. Thanks to all the experts participating here, I've had no previous problems I couldn't resolve by searching the posts at the former Astaro site or here.

However, I haven't been able to resolve this problem. In the last month or so (since around the time of upgrading from 9.3x to 9.4x), I've started to experience hangs. They seem to occur at random intervals, but have become more frequent, with shorter intervals between them, in the last week or two. So far, each time, internet access fails and the UTM's web interface also fails to respond when I try to log on. After a power-switch reboot of the hardware, everything functions well again, and I'm able to access the UTM through the web GUI and check the logs.

Nothing has caught my attention except in the Admin Notifications log: the notifier and postfix/postfix-script seem to reload every 5 to 20 minutes, with the config version climbing as high as 894 before the hang; after the restart, the config version starts again at 5. Is this normal behavior? I saw one post, related to an earlier 8.x version of the UTM, that mentioned postfix as part of the UTM's internal communications.

Thanks in advance for any help troubleshooting whether this points to a hardware problem, a software problem, or a combination. Or would it be easier to reinstall and reconfigure the UTM and wait to see if the problem recurs?

Cheers,
PD

Excerpt from the Admin Notifications log:

2017:01:24-17:12:27 wall notifier[3606]: loading config version 892
2017:01:24-17:12:30 wall postfix/postfix-script[29316]: refreshing the Postfix mail system
2017:01:24-17:12:30 wall postfix/master[5740]: reload -- version 2.11.0, configuration /etc/postfix
2017:01:24-17:23:31 wall notifier[3606]: loading config version 893
2017:01:24-17:29:33 wall notifier[3606]: loading config version 894
2017:01:24-17:29:35 wall postfix/postfix-script[29967]: refreshing the Postfix mail system
2017:01:24-17:29:35 wall postfix/master[5740]: reload -- version 2.11.0, configuration /etc/postfix
2017:01:24-18:02:03 wall postfix/postfix-script[5530]: stopping the Postfix mail system
2017:01:24-18:02:03 wall postfix/master[3592]: terminating on signal 15
2017:01:24-18:02:04 wall postfix/postfix-script[5869]: starting the Postfix mail system
2017:01:24-18:02:04 wall postfix/master[5871]: daemon started -- version 2.11.0, configuration /etc/postfix
2017:01:24-18:02:10 wall notifier[5989]: processing notification request for INFO-000
<30>Jan 24 18:02:10 notifier[5989]: mail notifications for INFO-000 are disabled
2017:01:24-18:02:10 wall notifier[5989]: snmp traps for INFO-000 are disabled
2017:01:24-18:02:10 wall notifier[5989]: successfully processed request for notification
2017:01:24-18:02:45 wall notifier[3610]: loading config version 5
<30>Jan 24 18:08:56 notifier[6576]: processing notification request for INFO-005
2017:01:24-18:08:56 wall notifier[6576]: mail notifications for INFO-005 are disabled
2017:01:24-18:08:56 wall notifier[6576]: snmp traps for INFO-005 are disabled
2017:01:24-18:08:56 wall notifier[6576]: successfully processed request for notification
2017:01:24-18:16:48 wall notifier[3610]: loading config version 10
2017:01:24-18:16:50 wall postfix/postfix-script[7165]: refreshing the Postfix mail system
2017:01:24-18:16:50 wall postfix/master[5871]: reload -- version 2.11.0, configuration /etc/postfix
2017:01:24-18:26:51 wall notifier[3610]: loading config version 21
2017:01:24-18:26:53 wall postfix/postfix-script[8306]: refreshing the Postfix mail system
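For anyone wanting to measure how often the notifier reloads, here is a minimal Python sketch that pulls the "loading config version" entries out of a log like the excerpt above and reports the gap between consecutive reloads. The timestamp format and line layout are inferred from this excerpt only, and the names `parse_loads`, `summarize_reloads`, and `SAMPLE` are made up for illustration. A drop in the version number (894 to 5 in the excerpt) is treated as a sign that the notifier restarted, as it did here after the reboot.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the excerpt in this thread, e.g.:
# 2017:01:24-17:12:27 wall notifier[3606]: loading config version 892
LOAD_RE = re.compile(
    r"^\s*(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) \S+ notifier\[\d+\]: "
    r"loading config version (\d+)"
)

def parse_loads(lines):
    """Return a list of (timestamp, config_version) for each reload entry."""
    loads = []
    for line in lines:
        match = LOAD_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y:%m:%d-%H:%M:%S")
            loads.append((stamp, int(match.group(2))))
    return loads

def summarize_reloads(loads):
    """Pair consecutive reloads: (old_version, new_version, minutes, reset)."""
    rows = []
    for (t0, v0), (t1, v1) in zip(loads, loads[1:]):
        minutes = (t1 - t0).total_seconds() / 60
        # A lower version than the previous entry suggests a restart.
        rows.append((v0, v1, minutes, v1 < v0))
    return rows

# A few lines copied from the excerpt above; non-matching lines are skipped.
SAMPLE = """\
2017:01:24-17:12:27 wall notifier[3606]: loading config version 892
2017:01:24-17:12:30 wall postfix/postfix-script[29316]: refreshing the Postfix mail system
2017:01:24-17:23:31 wall notifier[3606]: loading config version 893
2017:01:24-17:29:33 wall notifier[3606]: loading config version 894
2017:01:24-18:02:45 wall notifier[3610]: loading config version 5
2017:01:24-18:16:48 wall notifier[3610]: loading config version 10
2017:01:24-18:26:51 wall notifier[3610]: loading config version 21
"""

for old, new, minutes, reset in summarize_reloads(parse_loads(SAMPLE.splitlines())):
    note = "  <-- version reset (restart?)" if reset else ""
    print(f"{old:>5} -> {new:<5} {minutes:6.1f} min{note}")
```

To run it against a full log, pass the lines of the exported log file to `parse_loads` instead of `SAMPLE`.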


  • Sachin,

    Thanks for the suggestion on rebuilding the postgres database.  Based on your feedback and Bob's, I started to think about the hangs occurring more frequently with shorter uptime in between (even after a complete reinstall and configuration of the UTM software).

    I've made an educated guess that the SSD I was using was starting to have errors and it may have been corrupting the postgres database.

    I am currently trying a fresh install of 9.411 on a new SSD, and so far I am not seeing the same rate of notifier and postfix/postfix-script reloads. I also saw that the old SSD apparently had some errors in partitions 5 and 6, but I haven't yet figured out the best way to test the old SSD more thoroughly.
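
    One simple, read-only way to look for unreadable areas on a suspect drive is to read it from start to finish and note where the reads fail. The Python sketch below is only an illustration under several assumptions: it expects a Unix-like system, needs enough privileges to open the device, and the function name `scan_for_read_errors` and the path in the usage note are placeholders. It does not replace the drive's own SMART diagnostics (for example via smartmontools), which are likely the more thorough check.

```python
import os

def scan_for_read_errors(path, chunk_size=1024 * 1024):
    """Read `path` sequentially without writing anything.

    Returns (bytes_read, bad_offsets), where bad_offsets lists the start
    offset of every chunk whose read raised an I/O error.
    """
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file; on Linux it
        # also gives the size of a block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # Record the failing chunk and move on to the next one.
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of data
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

    For example, `scan_for_read_errors("/dev/sdX5")` would scan one partition, where `/dev/sdX5` is a placeholder for the real device name. A smaller `chunk_size` narrows down where an error sits at the cost of a slower scan.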

    If this new install goes without a hang for more than 10 days, I'll be reasonably certain that a failing SSD was the problem. If the UTM does start to hang again, I'll try to rebuild the postgres database using your instructions.

    Thanks again for your help.  Cheers, PD

  • Hello All,

    I think I can safely say that these hangs were due to a failing solid state drive.  I replaced the suspect drive, reinstalled and reconfigured with UTM Software 9.411 and I have not experienced any hangs in the last 16 days.  Apparently, frequent reloads of the postfix configuration are normal.  Below is an excerpt of how high the version number has incremented so far without any hangs:

    2017:03:23-22:25:30 wall notifier[3580]: loading config version 5558
    2017:03:23-22:27:31 wall notifier[3580]: loading config version 5559
    2017:03:23-22:27:33 wall postfix/postfix-script[4626]: refreshing the Postfix mail system
    2017:03:23-22:27:33 wall postfix/master[5658]: reload -- version 2.11.0, configuration /etc/postfix
    2017:03:23-22:27:47 wall notifier[4637]: processing notification request for INFO-005
    2017:03:23-22:27:47 wall notifier[4637]: snmp traps for INFO-005 are disabled
    2017:03:23-22:27:47 wall notifier[4637]: successfully processed request for notification



    Thanks to everyone for your help in trying to resolve these hangs; it seems that the root cause was a failing drive.

    Cheers, PD