This discussion has been locked.

Weekly Complete Outage

Hoping someone has experienced this and found a fix. We have a single customer with a recurring problem that follows no apparent pattern. All traffic stops passing through the UTM, the web interface becomes inaccessible, and their VoIP phones and everything else go down. Their ISP's router still shows as up and remains remotely accessible. A physical reboot of the Sophos unit resolves the issue.

We've:

1. RMA'd the UTM
2. Updated to latest firmware

Sophos can't seem to figure out the issue. They think the new unit may have a memory problem (they aren't sure, and want us to go onsite to run a memtest). We're doubtful, since the unit was just RMA'd and the replacement shows the same issue. Was hoping someone may have seen this before.



  • Ideas

    - A time sync problem sets the wrong time, which causes the license to be treated as invalid.

    - Disk partition filling up.
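A minimal sketch of how the "disk partition filling up" idea could be checked from a script, assuming you have shell access to a machine where Python is available. The 90% threshold and the mount point list are assumptions for illustration, not values from Sophos documentation.

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Return used space as a percentage of total (0.0 if total is 0)."""
    return 100.0 * used / total if total else 0.0


def full_partitions(paths, threshold=90.0, usage_fn=shutil.disk_usage):
    """Return (path, percent) pairs for paths at or above the threshold.

    usage_fn is injectable so the logic can be tested without real disks.
    """
    flagged = []
    for path in paths:
        usage = usage_fn(path)  # named tuple with total, used, free (bytes)
        pct = percent_used(usage.total, usage.used)
        if pct >= threshold:
            flagged.append((path, round(pct, 1)))
    return flagged


if __name__ == "__main__":
    # The mount point list is an assumption; use whatever the device has.
    for path, pct in full_partitions(["/"], threshold=90.0):
        print(f"WARNING: {path} is {pct}% full")
```

Running this on a schedule and logging the result would show whether a partition fills up gradually before each outage.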

  • One more idea:

    I wonder if the UTM needs to check in with Sophos on a regular basis to re-validate its license, and there is a configuration rule that prevents that check-in from happening. I don't know much about the license enforcement logic, so this may be nonsense. In this scenario, the device would stop working because the license grace period expires after a week.

  • Similar story.  Might help.

    Slightly different circumstances, but we had a similar issue with a new install for a client, also doing Internet and VoIP. Failures came anywhere from 5 to 45 days apart, at random times: weekdays and weekends, daytime and nighttime. We found a "symptom" in the Sophos logs pretty quickly. We could see the upstream DNS fail and the UTM would drop offline, no matter the DNS source, and soon enough the WAN side wouldn't work. Similar to your situation, a restart of the WAN side, or a reboot of the UTM, would bring it back up immediately. We thought about a script to restart automatically, but instead taught a couple of people how to fix it quickly.

    This install is in a 20-story high-rise building. We had the ISP swap out their en-suite router. In our case, we mostly build our own UTM boxes. We went through three of them, including (intentionally) different hardware the third time. Nothing helped. Sophos couldn't find a problem either. We did lots of logging and discussing with Sophos and the ISP.

    We became convinced that it was something on the ISP side, perhaps other electronics in the building? There were too many subscribers for our client to be the only one suffering this problem. And then, one day, we checked the logs and noticed that the client hadn't had the problem in quite some time. After many months, it had magically gone away. Part of the issue with this large, national ISP is that we were never sure whether the higher-level changes we wanted (all new circuits, etc.) were ever installed. So, to this day we're not exactly sure what was done to fix it, but obviously something was, and it wasn't on our side.

    Ultimately we re-installed the first UTM we built for this client.  It's been working fine ever since with no problems.

    I'd keep after your ISP.
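For anyone who does want the automatic restart script mentioned above, here is a rough sketch of the watchdog logic, assuming upstream DNS failure is the trigger as described in this post. The hostname, the failure count, the interval, and the `on_failure` callback are all hypothetical; how the WAN side is actually restarted is not covered in this thread, so that part is left as a placeholder.

```python
import socket
import time


def dns_ok(hostname="example.com", resolver=socket.gethostbyname):
    """Return True if the hostname resolves, False on any resolver error."""
    try:
        resolver(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def watch(check, on_failure, max_failures=3, interval=60.0,
          sleep=time.sleep, max_cycles=None):
    """Call on_failure after max_failures consecutive failed checks.

    Requiring several failures in a row avoids restarting on one lost query.
    max_cycles=None runs forever; a number bounds the loop (useful in tests).
    """
    failures = 0
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        if check():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                on_failure()  # hypothetical: restart the WAN side here
                failures = 0
        cycles += 1
        sleep(interval)


# Example use (runs until interrupted):
#   watch(dns_ok, lambda: print("DNS down: restart WAN side"))
```

The script only masks the symptom, so logging each trigger with a timestamp is worth adding as evidence for the ISP.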

  • Welcome to the new UTM Community!

    I agree with eganders that it's an ISP issue, but you might be able to "outsmart" them.  I've seen this situation several times, and the workaround you've used confirms that you might be able to use my "standard" solution.  See #7.7 in Rulz.  If that doesn't solve your problem, you will definitely want to hammer on your ISP.

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA