
Firewall lost connection to SUM for no apparent reason. Does anyone have troubleshooting advice?

SUM version:  4.305-7
SG125 version: 9.505-4

I've been using SUM for about 2 years now and I've never seen it do this. We have about 20 Sophos SGs in our SUM portal. Both devices have been running these firmware versions for some time without any hiccups. However, as of this morning one of the firewalls (just one) is reported as offline. It isn't offline; it is up, running, and functioning fine. When I log into the device I see this error in the 'Central Management' area:

[1] SUM SSL-connect: 'IO::Socket::INET6 configuration failed'.

Under Interfaces & Routing, IPv6 is disabled globally.

In the 'Device Agent' log, the following has been repeating over and over (public addresses have been replaced with XYZ.XYZ.XYZ.XYZ):

2018:02:28-00:00:05 xyzINC device-agent[5477]:   1 is not connected. Trying to connect
2018:02:28-00:00:05 xyzINC device-agent[5477]:   Updating SUM IP address for path: acc/server1/server
2018:02:28-00:00:05 xyzINC device-agent[5477]:   [1] Connecting to SUM (ip=XYZ.XYZ.XYZ.XYZ), port=4433).
2018:02:28-00:00:05 xyzINC device-agent[5477]:   [1] Using SUM SSL connection.
2018:02:28-00:00:08 xyzINC device-agent[5477]:   [1] SUM connection failure, retrying (ip=XYZ.XYZ.XYZ.XYZ), port=4433). SSL-connect: 'IO::Socket::INET6 configuration failed'
2018:02:28-00:00:11 xyzINC device-agent[5477]:   [1] SUM connection failure, retrying (ip=XYZ.XYZ.XYZ.XYZ), port=4433). SSL-connect: 'IO::Socket::INET6 configuration failed'
2018:02:28-00:00:12 xyzINC device-agent[5477]:   [1] Connection failed (ip=XYZ.XYZ.XYZ.XYZ), port=4433).
2018:02:28-00:00:12 xyzINC device-agent[5477]:   Not reporting inotify: no role
2018:02:28-00:00:12 xyzINC device-agent[5477]:   timer2 -> module 1 not executing: denied by role
2018:02:28-00:00:12 xyzINC device-agent[5477]:   timer2 -> module 2 not executing: denied by role
2018:02:28-00:00:12 xyzINC device-agent[5477]:   timer2 -> module 3 not executing: denied by role
2018:02:28-00:00:12 xyzINC device-agent[5477]:   timer2 -> module 4 not executing: denied by role
2018:02:28-00:00:12 xyzINC device-agent[5477]:   timer2 -> module 5 not executing: denied by role
2018:02:28-00:00:12 xyzINC device-agent[5477]:   timer2 -> module 6 not executing: denied by role
2018:02:28-00:00:12 xyzINC device-agent[5477]:   timer2 -> module 7 not executing: denied by role
2018:02:28-00:00:17 xyzINC device-agent[5477]:   timer2 -> module 1 not executing: denied by role
2018:02:28-00:00:17 xyzINC device-agent[5477]:   timer2 -> module 2 not executing: denied by role
2018:02:28-00:00:17 xyzINC device-agent[5477]:   timer2 -> module 3 not executing: denied by role
2018:02:28-00:00:17 xyzINC device-agent[5477]:   timer2 -> module 4 not executing: denied by role
2018:02:28-00:00:17 xyzINC device-agent[5477]:   timer2 -> module 5 not executing: denied by role
2018:02:28-00:00:17 xyzINC device-agent[5477]:   timer2 -> module 6 not executing: denied by role
2018:02:28-00:00:17 xyzINC device-agent[5477]:   timer2 -> module 7 not executing: denied by role
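To pin down when the failures started, a saved copy of this log could be scanned with a short script. This is only a rough sketch, assuming the log lines always follow the format shown above; `parse_line`, `first_failure`, and `count_failures` are hypothetical helper names, not part of any Sophos tooling.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2018:02:28-00:00:08 xyzINC device-agent[5477]:   [1] SUM connection failure, ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"device-agent\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)

FAILURE_MARKER = "SUM connection failure"


def parse_line(line):
    """Return (timestamp, message) for a device-agent line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("ts"), "%Y:%m:%d-%H:%M:%S")
    return timestamp, match.group("msg")


def first_failure(lines):
    """Return the timestamp of the first logged SUM connection failure, or None."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and FAILURE_MARKER in parsed[1]:
            return parsed[0]
    return None


def count_failures(lines):
    """Count the lines that report a SUM connection failure."""
    total = 0
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and FAILURE_MARKER in parsed[1]:
            total += 1
    return total
```

Feeding the lines of the log file to `first_failure` would give a timestamp that can be compared against the time the VPN change was made.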

I've tried disabling and re-enabling Central Management on the firewall, but it produces the same errors.

I'm pretty certain a reboot is going to fix it. I strongly suspect the cause is an IPsec VPN tunnel I was testing between this firewall and our offices (the same public IP that SUM is located at). My reasons: I cannot ping the public IP used by SUM from the SG125, but I can ping other IPs in that same block, all of which are configured on a Sophos at our offices. I cannot even ping that address from a machine behind the firewall. I can, however, reach the WebAdmin listening on that IP if I browse to https://XYZ.XYZ.XYZ.XYZ:4444. Also, the time at which SUM reports the firewall going 'offline' is about the same time I was working on this VPN tunnel test.
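Since ping fails while WebAdmin on 4444 still answers, it may be worth checking whether port 4433 (the one the device agent log shows it connecting to) accepts TCP connections at all. Something like the following could do that from a machine behind the firewall. It is a minimal sketch, assuming Python 3 is available there; `tcp_reachable` and `check_ports` are hypothetical helper names, and the check only tests the TCP handshake, not the SSL layer that the error message complains about.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports, timeout=3.0):
    """Return a dict mapping each port to whether it accepted a TCP connection."""
    return {port: tcp_reachable(host, port, timeout) for port in ports}
```

Calling `check_ports("XYZ.XYZ.XYZ.XYZ", [4433, 4444])` with the real SUM address in place of the placeholder would show whether 4433 behaves differently from 4444. If 4444 connects but 4433 does not, that would at least be consistent with the traffic being caught by a route or policy left over from the tunnel test, though it would not prove it.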

This is a production network and it is very difficult for us to coordinate downtime. Does anyone have any pointers that may help restore connectivity without having to bring the entire firewall down?

 

Thanks


