Hello,
Yes, it's always the same mail:
Content:
-------------------------------------------------
Network Monitor not running - restarted
--
System Uptime : 133 days 6 hours 1 minute
System Load : 0.23
System Version : Sophos UTM 9.408-4
Please refer to the manual for detailed instructions.
-------------------------------------------------
I haven't tried that update yet, but since we have been facing this problem for about 5 months now, I don't think the "latest" update
will change anything...
best,
daniel.
The first thing I would try is to restore a configuration backup made before this problem started. If that stops the issue, then you can conclude that some process, maybe an Up2Date, damaged the configuration database. You will then need to decide whether to modify the old configuration or to restore the new one and wait for Sophos Support to fix the issue. I don't expect that to be the problem, though, and I suspect you'll need to follow Scott Klassen's prescription...
If that didn't work, Sophos Support will surely ask you to re-image your SG, but you will want them to have a look at your SG first. The following process will delete all of your logs, reporting and graphs, so be sure to get what you need before trying this:
Please let us know your result.
Cheers - Bob
Thanks for your reply.
Restoring a backup from before this problem is not an option. It would remove about 5 months of configuration work done on the unit,
while the outcome is uncertain.
I just noticed that the appliance has been running for 130 days, which almost matches the date when the problems started.
So I'll try a "simple" reboot first and post back tomorrow.
Make a new backup before you try the restore. After the old version gives you your answer, restore the new backup you just made.
Trying the reboot is a great idea!
Cheers - Bob
Sorry, I'm not going to try the "restore".
My customer is NOT paying to throw away about 40 hours of work that have been put into the configuration overall, just to get rid of some unwanted emails!
There IS a predictable reason why emails are sent by the system (at least I hope so), so there MUST be a reliable way to fix the problem without doing some childish reset (or rollback) of the whole system!
Remember: We are not talking about some sort of "freeware" here, we are talking about an appliance worth a serious amount of bucks.
But let's wait and see what the reboot does. :-)
best,
daniel.
Thanks for being the guinea pig in the above, Daniel. In fact, the very same thing happened for no reason on our UTM. Beginning at 2017-04-11 04:33, the UTM began sending the notification every hour or so. A reboot first thing this AM resolved the issue.
Cheers - Bob
Service Monitor not running - restarted
I had this problem appear on version 9.506-2, pattern version 135858.
It was also present on the previous version. A reboot and a disk check have failed to
resolve the issue. The "service monitor" service is stopping around every 30 minutes.
The log indicates that the service stops when the system attempts a reverse DNS lookup
or checks that the target responds to an ICMP ping. The service then terminates.
This is an excerpt from my "service monitor" log.
..............................................................................................................................
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Starting real server checker with 17 threads"
2017:11:29-00:02:05 router service_monitor[30184]: id="4002" severity="info" sys="System" sub="loadbalancing" name="Open ICMPv4 socket"
2017:11:29-00:02:05 router service_monitor[30184]: id="4002" severity="info" sys="System" sub="loadbalancing" name="Open ICMPv6 socket"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaTrsSecurTime ICMP 46.101.55.10 changed state to ONLINE"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaTrsSecurTime to 46.101.55.10"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaSecurDnsResol ICMP 90.207.238.97 changed state to ONLINE"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaSecurDnsResol to 90.207.238.97"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaSecurDnsResol ICMP 8.8.8.8 changed state to ONLINE"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaSecurDnsResol ICMP 208.67.222.222 changed state to ONLINE"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaSecurDnsResol ICMP 208.67.222.123 changed state to ONLINE"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaMultiDnsResol ICMP 8.8.8.8 changed state to ONLINE"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaMultiDnsResol to 8.8.8.8"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaMultiDnsResol ICMP 90.207.238.97 changed state to ONLINE"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaSecurDnsResol ICMP 90.207.238.99 changed state to ONLINE"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaTrsSecurTime ICMP 192.146.137.13 changed state to ONLINE"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaSecurDnsResol ICMP 8.8.4.4 changed state to ONLINE"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaSecurDnsResol ICMP 208.67.220.220 changed state to ONLINE"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaSecurDnsResol ICMP 208.67.220.123 changed state to ONLINE"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaMultiDnsResol ICMP 8.8.4.4 changed state to ONLINE"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaMultiDnsResol ICMP 90.207.238.99 changed state to ONLINE"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaTrsSecurTime to 192.146.137.13"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaSecurDnsResol to 8.8.8.8"
2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaMultiDnsResol to 90.207.238.97"
2017:11:29-00:02:06 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaSecurDnsResol to 208.67.222.222"
2017:11:29-00:02:06 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaMultiDnsResol to 90.207.238.97"
2017:11:29-00:02:06 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaSecurDnsResol to 208.67.222.123"
2017:11:29-00:02:06 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaMultiDnsResol to 90.207.238.97"
2017:11:29-00:02:06 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaSecurDnsResol to 208.67.222.123"
2017:11:29-00:02:07 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaSecurDnsResol to 208.67.222.123"
2017:11:29-00:02:07 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaSecurDnsResol to 208.67.222.123"
2017:11:29-00:02:08 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaSecurDnsResol to 208.67.222.123"
2017:11:29-00:02:10 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaTrsSecurTime ICMP 95.215.175.2 changed state to OFFLINE"
2017:11:29-00:02:10 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaTrsSecurTime to 192.146.137.13"
2017:11:29-00:02:10 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaTrsSecurTime ICMP 139.143.5.31 changed state to OFFLINE"
2017:11:29-00:02:10 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaTrsSecurTime ICMP 139.143.5.30 changed state to OFFLINE"
2017:11:29-00:02:10 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaTrsSecurTime to 192.146.137.13"
2017:11:29-00:02:11 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaTrsSecurTime to 192.146.137.13"
2017:11:29-00:07:05 router service_monitor[30184]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Exiting..."
2017:11:29-00:07:05 router service_monitor[30184]: id="4002" severity="info" sys="System" sub="loadbalancing" name="Waiting for thread 3599"
2017:11:29-00:07:06 router service_monitor[30184]: id="4002" severity="info" sys="System" sub="loadbalancing" name="Waiting for thread 3587"
2017:11:29-00:07:07 router service_monitor[30184]: id="4002" severity="info" sys="System" sub="loadbalancing" name="Waiting for thread 3587"
2017:11:29-00:07:07 router service_monitor[30184]: id="4002" severity="info" sys="System" sub="loadbalancing" name="Waiting for thread 3587"
2017:11:29-00:07:08 router service_monitor[30184]: id="4002" severity="info" sys="System" sub="loadbalancing" name="Waiting for thread 3587"
2017:11:29-00:07:09 router service_monitor[30184]: id="4002" severity="info" sys="System" sub="loadbalancing" name="Waiting for thread 3587"
The next entry is the service restarting...
2017:11:29-00:07:10 router service_monitor[30526]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Starting real server checker with 17 threads"
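For anyone who wants to check the timing in their own log, the following is a minimal Python sketch (not a Sophos tool) that measures how long each service_monitor process ran between its "Starting real server checker" entry and its "Exiting..." entry. The function name `lifetimes` and the regular expression are my own assumptions based only on the entry layout shown in this excerpt; the sample reuses three entries from the log above.

```python
import re
from datetime import datetime

# Assumed entry layout, taken from the excerpt above:
# <YYYY:MM:DD-HH:MM:SS> <host> service_monitor[<pid>]: ... name="<message>"
ENTRY = re.compile(
    r'(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) \S+ service_monitor\[(\d+)\]: '
    r'.*?name="([^"]*)"',
    re.S,  # tolerate entries that were wrapped across lines when pasted
)

def lifetimes(log_text):
    """Map each PID to the seconds between its start entry and its exit entry."""
    started = {}
    result = {}
    for stamp, pid, name in ENTRY.findall(log_text):
        when = datetime.strptime(stamp, "%Y:%m:%d-%H:%M:%S")
        if name.startswith("Starting real server checker"):
            started[pid] = when
        elif name == "Exiting..." and pid in started:
            result[pid] = (when - started[pid]).total_seconds()
    return result

sample = (
    '2017:11:29-00:02:05 router service_monitor[30184]: id="4000" severity="info" '
    'sys="System" sub="loadbalancing" name="Starting real server checker with 17 threads" '
    '2017:11:29-00:07:05 router service_monitor[30184]: id="4000" severity="info" '
    'sys="System" sub="loadbalancing" name="Exiting..." '
    '2017:11:29-00:07:10 router service_monitor[30526]: id="4000" severity="info" '
    'sys="System" sub="loadbalancing" name="Starting real server checker with 17 threads"'
)

print(lifetimes(sample))
```

In this excerpt the process with PID 30184 starts at 00:02:05 and logs "Exiting..." at 00:07:05. Running the sketch over a longer log would show whether the run time before each exit is constant or varies.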
I am going to try changing the NTP and DNS clients but I doubt that will make any difference.
A rather irritating problem and it appears I am not alone in having this issue. Has anyone found a solution to this problem?
Stuart.
Update.....
I changed the hdd in the system and reloaded from the latest image. System status is...
Firmware version:9.506-2
Pattern version:136417
Configured with one lan interface and one wan interface.
Standard NAT configured and firewall rule of Any ----> Any ----> Any
No other services are active, not even DHCP.
Also, I only just realised I posted the wrong log file entries. Oops.
This is the correct log AFAIK....
2017:12:11-08:34:35 router selfmonng[3935]: I check Failed increment service_monitor_running counter 1 - 3
2017:12:11-08:34:40 router selfmonng[3935]: I check Failed increment service_monitor_running counter 2 - 3
2017:12:11-08:34:45 router selfmonng[3935]: W check Failed increment service_monitor_running counter 3 - 3
2017:12:11-08:34:45 router selfmonng[3935]: [INFO-181] Service Monitor not running - restarted
2017:12:11-08:34:45 router selfmonng[3935]: W NOTIFYEVENT Name=service_monitor_running Level=INFO Id=181 sent
2017:12:11-08:34:45 router selfmonng[3935]: W triggerAction: 'cmd'
2017:12:11-08:34:45 router selfmonng[3935]: W actionCmd(+): '/var/mdw/scripts/service_monitor restart'
2017:12:11-08:34:45 router selfmonng[3935]: W child returned status: exit='0' signal='0'
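To see how regularly these restarts happen, a small Python sketch like the one below could pull the timestamp of every "[INFO-181] Service Monitor not running - restarted" entry out of the self-monitoring log and report the gaps between them. It is only an illustration: the helper names `restart_times` and `gaps_in_minutes` are hypothetical, and the pattern assumes the entry layout shown in the excerpt above.

```python
import re
from datetime import datetime

# Assumed layout of the notification entry, copied from the excerpt above.
RESTART = re.compile(
    r"(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) \S+ selfmonng\[\d+\]: "
    r"\[INFO-181\] Service Monitor not running - restarted"
)

def restart_times(log_text):
    """Return the timestamps of all restart notifications, in log order."""
    return [
        datetime.strptime(stamp, "%Y:%m:%d-%H:%M:%S")
        for stamp in RESTART.findall(log_text)
    ]

def gaps_in_minutes(times):
    """Return the minutes elapsed between consecutive restart notifications."""
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]

sample = (
    "2017:12:11-08:34:45 router selfmonng[3935]: W check Failed increment "
    "service_monitor_running counter 3 - 3 "
    "2017:12:11-08:34:45 router selfmonng[3935]: [INFO-181] Service Monitor "
    "not running - restarted"
)

times = restart_times(sample)
print(times)
print(gaps_in_minutes(times))
```

The excerpt contains only one notification, so the gap list is empty here; fed a full day of the log, it would show whether the restarts really come every 30 to 60 minutes.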
I noticed this in the "Up2Date" log...
2017:12:11-08:28:11 router audld[2722]: Could not connect to Authentication Server 79.125.21.244 (code=500 500 Internal Server Error).
2017:12:11-08:28:20 router audld[2722]: id="3701" severity="info" sys="system" sub="up2date" name="Authentication successful"
2017:12:11-08:43:01 router audld[4029]: no HA system or cluster node
2017:12:11-08:43:01 router audld[4029]: Starting Up2Date Package Downloader
2017:12:11-08:43:02 router audld[4029]: patch up2date possible
2017:12:11-08:43:18 router audld[4029]: id="3701" severity="info" sys="system" sub="up2date" name="Authentication successful"
I don't know if it's related, but this coincides with an NTP update?
These emails are still coming in every 30 minutes to an hour and are starting to bear a close relationship with the mother-in-law.
Any ideas, anyone?
!! I used to think I was indecisive but now I am not so sure !!
After the latest update it stopped until... a week later, I updated the second machine (HA).
Now it's all the same again, and I'm getting 5-6 "Network Monitor not running – restarted" notifications every day.
I have to say that the Sophos team has not been much help here.
I've had this bug for nearly a year now.
In my opinion, it's unacceptable that a big name like Sophos, in all those months, couldn't find and fix this issue, or at least send me an RPM to resolve it.
That is really annoying, and all I get from Sophos is thanks for my patience.
Come on guys, I'm sure you could do better?
Goldy, out of the dozens of UTMs I've worked on and from which I still receive notifications, I'm not seeing this from any of them. I checked the logs since 1/1/2018.
Cheers - Bob