Internet uplink monitoring


We have a primary and a secondary ISP connection at the office and have configured uplink monitoring to switch between them automatically. In general the primary ISP line has been working well for several months (since Jan 2021), but during the last few days we had several short "uplink is down/up" states (for ~1-2 minutes), which was obviously long enough for the UTM monitoring function to notice an outage and engage the backup line.

Our ISP is monitoring the connection from their side and said their system didn't notice any outages. They have now even installed "smokeping" for our connection to get finer monitoring granularity. However, I want to give them one or two days to gather some statistics before I ask them about the last few days' outages.

On the other hand, I'm a bit uncertain whether the automatic UTM monitoring could be faulty right now and generate false positives, indicating a "down state" just because some of its reference servers cannot be reached within a certain time (due to internet congestion).

Has anyone else experienced similar uplink outages while using the automatic monitoring?

Does anyone know which servers & services are used for automatic monitoring?

Is there any recommendation for which servers to use for uplink monitoring in case I decide to use my own?

How does the uplink monitoring decide on a down state? When all servers become unreachable, or when a certain percentage can't be reached?

In order to get a third opinion I also thought about installing a RasPi/PC with smokeping outside of our Sophos gateway, directly on the ISP router/modem (public IP addresses are available), and comparing the smokeping statistics with the detected uplink downs - or does that sound like overkill?
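
To make the comparison idea concrete, here is a minimal Python sketch of what I have in mind. It assumes I can export two lists of timestamps: the "uplink down" events reported by the UTM and the moments where the external probe saw packet loss. The function name, the 2-minute tolerance and the example timestamps are all hypothetical; nothing here is taken from the UTM or from smokeping itself.

```python
from datetime import datetime, timedelta


def unmatched_downs(utm_downs, probe_losses, tolerance=timedelta(minutes=2)):
    """Return UTM 'down' timestamps with no external probe loss within tolerance.

    Such events are candidates for false positives of the uplink monitoring.
    """
    return [
        down
        for down in utm_downs
        if not any(abs(down - loss) <= tolerance for loss in probe_losses)
    ]


# Hypothetical example data, not real measurements.
utm_downs = [datetime(2021, 6, 1, 10, 15), datetime(2021, 6, 1, 14, 40)]
probe_losses = [datetime(2021, 6, 1, 10, 16)]

# Only the 14:40 event has no matching loss on the external probe.
suspicious = unmatched_downs(utm_downs, probe_losses)
print(suspicious)
```

If most UTM "down" events end up in the unmatched list, that would point at the monitoring targets rather than at the line itself.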



  • Option 1: Uplink monitoring monitors the firewall's connection to the root DNS servers. When that is interrupted, it considers the link down. However, this might not actually be true: there could be an upstream routing issue that causes the connectivity to fail, so the problem is not the link itself but something upstream. This is a good way to do things but also presents you with the issue you have.

    Option 2: The other way to do it is to force monitoring of the carrier's first hop. This will give you a "true" link-down indication but will NOT tell you if there is an upstream issue.

    Now the reality is that what you want to prevent is users not being able to get to the web, so option 1 above is really best for that. Option 2 is best for seeing whether the actual link is down, but functionally it doesn't help when the link is not down but users can't get to sites...

    Hope this helps...

  • Hallo Chris,

    I think it's when all servers are unreachable.  At one client site, I had to use 8.8.4.4 for a while as their location seemed to have an upstream issue that made the root name servers too slow to respond.  If you want to continue after getting Lee's and my comments, ...

    On the 'Uplink Balancing' tab, click on the wrench icon and show us a picture of the Scheduler settings.  Also, a picture of any relevant Multipath rule(s).

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
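
The two decision rules raised in the question (all monitoring hosts unreachable versus a certain percentage unreachable) can be compared with a small Python sketch. This is only an illustration of the logic, not how the UTM implements it internally; the function name `link_is_down`, the `min_failed_fraction` parameter and the example probe results are assumptions made up for this example.

```python
def link_is_down(results, min_failed_fraction=1.0):
    """Decide on a down state from per-host reachability results.

    results: mapping of monitoring host -> True if it answered in time.
    min_failed_fraction: 1.0 means "down only when ALL hosts fail";
    a lower value such as 0.5 means "down when at least half fail".
    """
    if not results:
        raise ValueError("at least one monitoring host is required")
    failed = sum(1 for reachable in results.values() if not reachable)
    return failed / len(results) >= min_failed_fraction


# Hypothetical probe results: two root servers too slow, 8.8.4.4 answers.
results = {
    "a.root-servers.net": False,
    "b.root-servers.net": False,
    "8.8.4.4": True,
}

print(link_is_down(results))                           # all-hosts rule
print(link_is_down(results, min_failed_fraction=0.5))  # percentage rule
```

With the all-hosts rule a single reachable target keeps the link "up", so mixing in a target outside the root name servers makes a false "down" caused by slow root servers less likely. A percentage rule would react sooner, at the cost of more false positives during congestion.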